Senior MLOps Engineer - ML Infrastructure & Deployment

silversearch, inc. • United States
Remote
AI Summary

Design, deploy, and operate production ML infrastructure across Dev, QA, and Prod environments. Manage ML deployment pipelines and runtime operations in AWS SageMaker. Implement monitoring, observability, and governance for large-scale multimodal AI workloads.

Key Highlights
Production ML infrastructure deployment and scaling
AWS SageMaker pipelines and endpoints
GPU/CPU infrastructure optimization for inference
Monitoring, alerting, and drift detection
Fully remote with East Coast collaboration hours
US work authorization required
Technical Skills Required
AWS SageMaker
ML deployment pipelines
ML endpoints
Multi-environment deployments
Containerized ML deployment
PyTorch inference
TensorFlow inference
Autoscaling
Infrastructure optimization
Runtime reliability
Monitoring and observability frameworks
Distributed ML workloads
NLP systems
Computer vision systems
Semantic/vector search infrastructure
ANN/vector indexing approaches
Large-scale text, image, and video processing
Benefits & Perks
Fully remote
East Coast collaboration hours

Job Description


About the Opportunity

Our client, a globally recognized media and information organization, is building out the operational foundation for large-scale production machine learning systems supporting enterprise intelligence products.


This role focuses on deploying, operating, scaling, and governing ML infrastructure and inference services across multimodal AI workloads involving text, image, and video processing. The environment is highly technical and collaborative, with a strong emphasis on production reliability, scalability, observability, and AWS-based ML infrastructure.


What You’ll Be Doing

  • Design, deploy, and operate production ML infrastructure across Dev, QA, and Prod environments
  • Manage ML deployment pipelines and runtime operations in AWS SageMaker
  • Configure and optimize GPU/CPU infrastructure for large-scale inference workloads
  • Implement monitoring, alerting, drift detection, and observability for ML systems
  • Build deployment governance processes including rollout, rollback, and recovery strategies
  • Support high-throughput ML workloads across text, image, and video pipelines
  • Optimize infrastructure scalability, cost efficiency, and operational reliability
  • Partner with ML Engineers and Data Scientists to operationalize new models and workflows
  • Implement A/B testing and controlled rollout strategies for production ML systems


Required Qualifications

  • Hands-on experience deploying and operating ML systems in production
  • Strong AWS SageMaker experience, including:
      • Pipelines
      • Endpoints
      • Monitoring
      • Multi-environment deployments
  • Experience with containerized ML deployment and orchestration
  • Experience operating PyTorch and TensorFlow inference systems
  • Strong understanding of autoscaling, infrastructure optimization, and runtime reliability
  • Experience implementing monitoring and observability frameworks for ML systems
  • Experience supporting distributed ML workloads in cloud environments


Strongly Preferred

  • Experience supporting NLP and computer vision ML systems
  • Familiarity with semantic/vector search infrastructure
  • Experience with ranking/reranking systems
  • Familiarity with ANN/vector indexing approaches
  • Experience supporting large-scale text, image, and video processing pipelines
  • Experience optimizing GPU-based infrastructure


What This Role Is

  • Production MLOps Engineering
  • ML Infrastructure & Deployment
  • Runtime Reliability & Scalability
  • AWS ML Operations
  • Monitoring & Operational Governance


What This Role Is Not

  • Pure DevOps
  • Model Architecture Design
  • Data Science Ownership
  • Research ML Engineering


Additional Details

  • Fully remote
  • Preference for East Coast collaboration hours


Applicants must be legally authorized to work in the United States and must not require employer sponsorship now or in the future.

