Implement and optimize enterprise-grade GenAI and MLOps infrastructure using AWS and Azure. Develop automated logging and observability tools. Collaborate with cross-functional teams to deliver cloud AI solutions.
Job Description
Senior Cloud ML Engineer – GenAI & MLOps
Contract-to-Hire (W-2 with Benefits)
100% Remote (CST Work Hours)
Our Fortune 50 healthcare client is seeking a Senior Cloud ML Engineer to implement and optimize enterprise-grade GenAI and MLOps infrastructure. The role sits within a new AI Shared Services team, working directly with the Lead Cloud ML Engineer to build secure, scalable access to cloud AI/ML services on AWS and Azure. You will focus on hands-on development, integration, and deployment of AI platform components that enable standardized MLOps/LLMOps capabilities across the organization.
Responsibilities:
- Build and configure components of the AI Shared Services platform supporting cloud AI/ML/GenAI services from AWS and Azure.
- Implement features for the AI Gateway to standardize MLOps/LLMOps frameworks, centralize model access, enforce governance, and enable usage tracking and cost optimization.
- Develop automated logging of model inputs/outputs to Databricks Delta tables via Unity Catalog for observability and compliance (a minimal sketch follows this list).
- Apply guardrails for PII protection, prompt injection defense, harmful content filtering, and rate limiting to ensure security, compliance, and cost control.
- Configure observability, logging, and monitoring tools to ensure reliability and auditability for ML, LLM, and GenAI workloads.
- Deploy and manage cloud-native inference infrastructure on Amazon SageMaker, including containerized services that serve ML/LLM models through scalable APIs.
- Integrate AI infrastructure components into CI/CD pipelines (GitLab CI, GitHub Actions, CodePipeline) for automated deployments.
- Collaborate with Cloud Platform Engineering, Data Engineering, Data Science, and Security teams to deliver cloud AI solutions aligned with enterprise standards.
- Contribute to technical documentation and to the execution of MLOps and LLMOps best practices.
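
To illustrate the observability responsibility above, here is a minimal sketch of appending model inputs/outputs to a Unity Catalog-governed Delta table with PySpark. The catalog, schema, table, and column names and the `log_invocation` helper are illustrative assumptions, not the client's actual implementation.

```python
# Minimal sketch: append model invocation records to a Delta table for audit/observability.
# Assumptions: a Databricks workspace with Unity Catalog enabled, an active Spark session,
# and write access to the (hypothetical) catalog/schema below.
from datetime import datetime, timezone

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical three-level Unity Catalog name: <catalog>.<schema>.<table>
LOG_TABLE = "ai_shared_services.observability.model_invocations"


def log_invocation(model_id: str, prompt: str, completion: str, latency_ms: int) -> None:
    """Append one model input/output record to the Delta table."""
    record = Row(
        model_id=model_id,
        prompt=prompt,
        completion=completion,
        latency_ms=latency_ms,
        logged_at=datetime.now(timezone.utc).isoformat(),
    )
    (
        spark.createDataFrame([record])
        .write.format("delta")
        .mode("append")
        .saveAsTable(LOG_TABLE)
    )


# Example: record a single gateway call.
log_invocation(
    model_id="example-llm",
    prompt="Summarize the claims policy.",
    completion="The policy covers ...",
    latency_ms=420,
)
```

Because the table is registered in Unity Catalog, downstream governance, lineage, and access controls apply to the logged prompts and completions without extra plumbing.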
Requirements:
- Bachelor’s degree in Computer Science or Data Science required; Master’s preferred.
- 10+ years of experience in cloud platform engineering and ML engineering.
- Hands-on experience implementing MLOps/LLMOps frameworks and integrating AI services into enterprise cloud infrastructure environments.
- Experience provisioning, configuring, and integrating cloud AI/ML services into an enterprise environment, specifically Amazon Bedrock or Azure AI Foundry.
- Experience building cloud-native AI/ML services, including SageMaker pipelines and inference endpoints, and integrating them into CI/CD pipelines (see the sketch after this list).
- Experience deploying and scaling LLM and GenAI workloads as cloud infrastructure components within MLOps pipelines.
- Experience building and deploying Infrastructure as Code using Terraform or CloudFormation.
- Experience with Databricks, Delta Lake, and Unity Catalog for data governance and observability.
- Experience implementing security and compliance controls to protect PII and regulated data within AI/ML workloads.
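
To illustrate the inference-infrastructure experience listed above, here is a minimal sketch of standing up a SageMaker real-time endpoint with boto3. The model name, container image URI, S3 artifact path, role ARN, and instance type are placeholder assumptions, not the client's configuration; in practice this would typically be expressed as Infrastructure as Code and driven from a CI/CD pipeline.

```python
# Minimal sketch: register a model and expose it behind a SageMaker real-time endpoint.
# Assumptions: AWS credentials with SageMaker permissions; the image, S3 path, and role
# ARN below are placeholders only.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

MODEL_NAME = "example-llm-model"
ENDPOINT_CONFIG = "example-llm-endpoint-config"
ENDPOINT_NAME = "example-llm-endpoint"

# Register the container image and model artifacts as a SageMaker model.
sm.create_model(
    ModelName=MODEL_NAME,
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.us-east-1.amazonaws.com/example-llm:latest",
        "ModelDataUrl": "s3://example-bucket/models/example-llm/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/ExampleSageMakerExecutionRole",
)

# Define how the endpoint scales: instance type and count per production variant.
sm.create_endpoint_config(
    EndpointConfigName=ENDPOINT_CONFIG,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": MODEL_NAME,
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# Create the managed HTTPS endpoint that serves the model through a scalable API.
sm.create_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=ENDPOINT_CONFIG)
```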