Senior ML Engineer, Ad Tech Productionization

yugen.ai • India
Remote
AI Summary

Seeking an experienced ML Engineer to productionize recommendation and ranking models for a leading ad tech client. This role involves owning infrastructure, low-latency APIs, data pipelines, and MLOps for high-availability services. Requires strong Python, cloud (GCP/AWS), data engineering, and MLOps skills, with a focus on async collaboration.

Key Highlights
Productionize ML models (Python) into low-latency, high-availability services.
Build and maintain feature pipelines (batch/streaming) and potentially an online feature store.
Implement MLOps practices including CI/CD, containerization (Docker/Kubernetes), and monitoring.
Collaborate closely with Data Scientists and client teams, focusing on async communication.
Support experimentation platforms and ensure robust monitoring and reliability.
Technical Skills Required
Python, GCP, AWS, Airflow, dbt, Beam, Spark, Kafka, Pub/Sub, Docker, Kubernetes, GitHub Actions, GitLab CI, Cloud Build, Prometheus, Grafana, Cloud Monitoring, CloudWatch, SageMaker, Vertex AI, MLflow
Benefits & Perks
Remote work (async-first environment)
Work with a leading client in the ad tech domain

Job Description


We're hiring an ML Engineer to work alongside Data Scientist(s) and support a leading client in the ad tech domain. You will own the infrastructure, low-latency APIs, data pipelines, deployment, and reliability of recommendation and ranking models in production. You'll be the bridge between data science and engineering: taking prototypes from the Data Scientist and turning them into robust, low-latency, high-availability services that operate at ad-tech scale. You should be comfortable with asynchronous communication (written updates, docs, Slack-style collaboration) with both the client and our internal team across time zones.

The Candidate Will Have Responsibilities Across The Following Functions

Model Productionization and Serving:

  • Design, build, and maintain low-latency APIs for serving recommendation and ranking models.
  • Take Data Scientist-built models (in Python) and productionize them for real-time or near-real-time serving.
  • Implement and maintain model serving endpoints (e.g., using SageMaker, Vertex AI, custom Docker/Kubernetes-based services, or similar); a minimal sketch follows this list.
  • Optimise for low latency and high throughput, suitable for ad-serving workloads.
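
For illustration only, a minimal sketch of such a real-time ranking endpoint, assuming FastAPI and an in-process Python model (neither the framework nor the model interface is specified in this posting, and all names are hypothetical):

    from typing import Dict, List

    from fastapi import FastAPI
    from pydantic import BaseModel


    class DummyRanker:
        """Stand-in for the Data Scientist's trained model (e.g. a pickled ranker)."""

        def predict(self, features: List[Dict[str, float]]) -> List[float]:
            return [f.get("ctr_7d", 0.0) for f in features]


    app = FastAPI()
    model = DummyRanker()  # loaded once at startup, never per request, to protect tail latency


    class RankRequest(BaseModel):
        user_id: str
        candidate_ids: List[str]
        features: List[Dict[str, float]]  # one feature dict per candidate, fetched upstream


    class RankResponse(BaseModel):
        ranked_ids: List[str]


    @app.post("/rank", response_model=RankResponse)
    def rank(req: RankRequest) -> RankResponse:
        scores = model.predict(req.features)
        order = sorted(zip(scores, req.candidate_ids), key=lambda p: p[0], reverse=True)
        return RankResponse(ranked_ids=[cid for _, cid in order])

The same request/response contract and load-once pattern could equally sit behind a SageMaker or Vertex AI endpoint; the sketch only shows the shape of the service.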

Feature Pipelines And Data Engineering

  • Design and build feature pipelines for training and inference:
  • Batch pipelines using tools like Airflow, dbt, Beam, or Spark.
  • Streaming / real-time features using Kafka, Pub/Sub, etc.
  • Design, integrate with, or operate an online feature store to serve low-latency features for real-time scoring.
  • Ensure training-serving skew is minimised; maintain clear contracts for feature definitions and data schemas (see the sketch after this list).
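
One common pattern for keeping training-serving skew in check is to define each feature transformation once and import it from both the batch pipeline and the online path. A minimal, hypothetical sketch (feature names and the schema contract are illustrative, not taken from this posting):

    from datetime import datetime, timezone
    from typing import Dict, Optional

    # Hypothetical shared feature module: the batch job (e.g. an Airflow/Spark task
    # writing training data or the feature store) and the online serving path both
    # import these functions, so training and serving see identical logic.

    FEATURE_SCHEMA = {"days_since_last_click": float, "ctr_7d": float}  # lightweight contract


    def days_since_last_click(last_click_ts: float, now_ts: float) -> float:
        """Recency in days, capped so stale users do not produce extreme values."""
        return min((now_ts - last_click_ts) / 86_400.0, 30.0)


    def ctr_7d(clicks: int, impressions: int) -> float:
        """Smoothed 7-day click-through rate; avoids divide-by-zero for new entities."""
        return (clicks + 1.0) / (impressions + 100.0)


    def build_features(raw: Dict[str, float], now: Optional[datetime] = None) -> Dict[str, float]:
        now_ts = (now or datetime.now(timezone.utc)).timestamp()
        feats = {
            "days_since_last_click": days_since_last_click(raw["last_click_ts"], now_ts),
            "ctr_7d": ctr_7d(raw["clicks_7d"], raw["impressions_7d"]),
        }
        assert set(feats) == set(FEATURE_SCHEMA), "feature/schema contract drift"
        return feats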

Infrastructure And MLOps

  • Implement CI/CD for ML models and pipelines (e.g., GitHub Actions, GitLab CI, Cloud Build, etc.).
  • Manage containerization and deployment using Docker and Kubernetes (or managed equivalents).
  • Set up and maintain model versioning, configuration management, and rollback strategies (see the sketch after this list).
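
As a hypothetical sketch of model versioning and rollback, using MLflow's model registry (MLflow is one of the tools named in this posting; the model name, stand-in model, and stages below are illustrative):

    import mlflow
    import mlflow.sklearn
    from mlflow.tracking import MlflowClient
    from sklearn.linear_model import LogisticRegression

    # Stand-in model; in practice this is the Data Scientist's trained artifact.
    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(model, artifact_path="model")
        mv = mlflow.register_model(f"runs:/{run.info.run_id}/model", "ad-ranker")

    client = MlflowClient()
    # Promote the new version; rolling back is simply re-promoting the previous
    # Production version, since every version stays in the registry.
    client.transition_model_version_stage(
        name="ad-ranker", version=mv.version, stage="Production"
    )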

Monitoring, Observability And Reliability

  • Work with the Data Scientist to define metrics and implement monitoring for:
  • Model performance (prediction distribution, drift, business KPIs).
  • System performance (latency, error rates, resource utilisation).
  • Data quality (schema checks, nulls, outliers, volume anomalies).
  • Build alerting and logging using the client's stack (e.g., Prometheus, Grafana, Cloud Monitoring, CloudWatch, etc.); a minimal sketch follows this list.
  • Investigate and resolve production issues, from infrastructure to data to model-related problems.
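
For illustration, a minimal monitoring sketch with the official Prometheus Python client (Prometheus and Grafana are listed in the stack above; metric names and the placeholder scoring function are illustrative):

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    REQUEST_LATENCY = Histogram("rank_request_latency_seconds", "End-to-end scoring latency")
    REQUEST_ERRORS = Counter("rank_request_errors_total", "Failed scoring requests")
    PREDICTION_SCORE = Histogram(
        "rank_prediction_score", "Distribution of model scores",
        buckets=[i / 10 for i in range(11)],
    )


    def score(features) -> float:
        return random.random()  # placeholder for the real model call


    def handle_request(features) -> float:
        with REQUEST_LATENCY.time():  # records the duration into the histogram
            try:
                pred = score(features)
            except Exception:
                REQUEST_ERRORS.inc()
                raise
            PREDICTION_SCORE.observe(pred)  # lets dashboards and alerts catch score drift
            return pred


    if __name__ == "__main__":
        start_http_server(9100)  # exposes /metrics for Prometheus to scrape
        while True:
            handle_request({})
            time.sleep(1)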

Experimentation Platform Support

  • Integrate models with the client's AB testing/experimentation framework.
  • Implement traffic splits, routing logic, and variant toggles (feature flags); see the sketch after this list.
  • Ensure metrics and logs needed for experiment analysis are correctly captured and accessible.
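
A minimal, hypothetical sketch of sticky traffic splitting for an AB test, using only the standard library (variant names and weights are illustrative; in practice the client's experimentation framework would own this logic):

    import hashlib

    # Weights must sum to 1.0; the hash keeps assignment deterministic per user.
    VARIANTS = [("control_ranker_v1", 0.9), ("candidate_ranker_v2", 0.1)]


    def assign_variant(user_id: str, experiment: str = "ranker_ab_test") -> str:
        """Deterministic, sticky assignment: a user always gets the same variant."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
        cumulative = 0.0
        for name, weight in VARIANTS:
            cumulative += weight
            if bucket <= cumulative:
                return name
        return VARIANTS[-1][0]


    if __name__ == "__main__":
        for uid in ("u-101", "u-102", "u-103"):
            print(uid, assign_variant(uid))  # log the assignment for experiment analysis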

Collaboration And Client Interaction

  • Work closely with the Data Scientist to understand modelling assumptions and requirements.
  • Collaborate with the client Product and Engineering teams to align on SLAs, integration points, and architectural choices.
  • Participate in technical discussions with client partners; communicate trade-offs and propose pragmatic solutions.
  • Provide clear async updates (tickets, comments, design docs, status summaries) so both the client and internal teams stay aligned without needing constant meetings.

Requirements

  • Experience: 2-5 years as an ML Engineer / Data Engineer / Software Engineer working on ML-heavy systems.
  • Programming: Strong skills in Python.
  • Cloud: Hands-on experience with GCP or AWS.
  • Data Engineering: Experience building and operating data pipelines (batch and/or streaming) using tools like Airflow, dbt, Beam, Spark, or similar.
  • MLOps / Infra: Experience with:
  • Containerization (Docker) and orchestration (Kubernetes or managed alternatives).
  • CI/CD for services or ML workflows.
  • Monitoring/logging tools (Prometheus, Grafana, CloudWatch, Stackdriver, etc.).
  • Collaboration and Communication:
  • Comfortable working in a remote, async-first environment: writing good design docs, giving structured written updates, and collaborating over Slack/email/tickets with distributed teams.

Nice-to-Have

  • Experience with real-time / low-latency systems, especially in ad tech, recommendation, ranking, or search.
  • Familiarity with feature stores and online feature serving.
  • Familiarity with online experimentation frameworks and traffic routing for AB tests.
  • Familiarity with model registries and ML platforms (e.g., MLflow, SageMaker, Vertex AI pipelines).
  • Comfort reading Data Scientist code/notebooks and refactoring them into clean, production-ready modules.

This job was posted by Akshay Singh from Yugen.ai.
