Senior MLOps Engineer - Machine Learning Infrastructure

Jobgether, United States
Remote
This job is no longer active and is not accepting applications.
AI Summary

Design, build, and scale robust machine learning infrastructure for advanced AI systems. Lead development of end-to-end ML pipelines and collaborate with AI researchers and ML engineers. Implement best practices for CI/CD, model monitoring, and distributed training.

Key Highlights
Design and develop end-to-end ML pipelines for model training, evaluation, and deployment
Collaborate with AI researchers and ML engineers to productionize models
Implement CI/CD best practices for ML systems and monitor model health and performance
Technical Skills Required
Python, C, C++, Bash, Docker, Kubernetes, AWS, GCP, Azure, GitHub Actions, Jenkins, MLflow, Weights & Biases, Kubeflow

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior MLOps Engineer in the United States.

This role offers a unique opportunity to design, build, and scale robust machine learning infrastructure that powers advanced AI systems. You will lead the development of end-to-end ML pipelines, from research to production, ensuring models are reliable, observable, and performant at scale. Working at the intersection of software engineering, cloud infrastructure, and machine learning, you will collaborate closely with AI researchers, ML engineers, and software teams to productionize models efficiently. This position provides autonomy, ownership of critical ML systems, and the chance to implement best practices for CI/CD, model monitoring, and distributed training. You'll contribute to a culture of innovation, growth, and continuous improvement, with opportunities to influence how AI capabilities are deployed across large-scale systems.

Accountabilities:

  • Design, develop, and maintain end-to-end ML pipelines for model training, evaluation, deployment, and agentic AI workflows
  • Build and optimize infrastructure for distributed training and scalable model serving across GPU and cloud environments
  • Implement tools for data creation, model versioning, experiment tracking, and automated retraining
  • Collaborate with AI researchers and ML engineers to productionize POCs and ensure model reproducibility and scalability
  • Apply CI/CD best practices for ML systems, including continuous integration, automated testing, and deployment workflows
  • Monitor model health, performance, drift, and data quality in production environments
  • Partner with engineering teams to streamline infrastructure provisioning, data access, and cost optimization for large-scale model training
  • Contribute to internal documentation, best practices, and mentorship for ML operations and infrastructure standards


Requirements

  • 6-10+ years of experience in software or ML engineering, including at least 3 years in MLOps or ML infrastructure
  • Strong proficiency in Python, C, C++, Bash, or similar programming languages
  • Proven experience deploying and managing ML models in production environments
  • Expertise with Docker, Kubernetes, and scalable ML system design
  • Experience with cloud platforms such as AWS, GCP, or Azure and GPU orchestration
  • Hands-on knowledge of CI/CD pipelines (GitHub Actions, Jenkins, or similar)
  • Familiarity with MLflow, Weights & Biases, Kubeflow, or other experiment tracking and pipeline automation tools
  • Solid understanding of data versioning, model reproducibility, and monitoring strategies
  • Excellent problem-solving skills and a collaborative, team-oriented mindset
  • Bonus: experience training models from scratch, model optimization techniques, infrastructure-as-code tools (Terraform, CloudFormation), distributed systems, or contributions to open-source projects


Benefits

  • Competitive base salary range: $190,000-$230,000 annually, based on skills and experience
  • Fully remote work with occasional travel 1-2 times per year for company-wide or departmental meetings
  • Comprehensive health, dental, vision, life, and disability insurance
  • Generous paid time off and company holidays
  • Flexible work environment and opportunities for professional growth

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.

🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.

📊 It compares your profile to the job's core requirements and past success factors to determine your match score.

🎯 Based on this analysis, we automatically shortlist the 3 candidates with the highest match to the role.

🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.

The process is transparent, skills-based, and free of bias, focusing solely on your fit for the role.

Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.

Thank you for your interest!

