Senior MLOps Engineer (GCP Focus)

Oakwell Hampton Group • Portugal
Remote
This position is no longer accepting applications.
AI Summary

Design and implement best-in-class MLOps pipelines on GCP, enabling seamless and efficient model deployment, in collaboration with data scientists and engineers.

Key Highlights
Design and evolve existing pipelines for the machine learning lifecycle
Implement robust CI/CD/CT pipelines for ML models
Create a standardized and efficient path to production using GCP ecosystem
Develop a best-in-class observability framework for ML models
Collaborate with data scientists and engineers to accelerate workflow
Technical Skills Required
GCP, Vertex AI Pipelines, Vertex AI Endpoints, Vertex AI Model Registry, BigQuery, Pub/Sub, GKE, Prometheus, Grafana, ELK stack, Terraform, Ansible, Docker, Kubernetes, Python, Infrastructure as Code
Benefits & Perks
Remote work
6-12 month contract
Flexible start date

Job Description


Senior MLOps Engineer

Fully Remote (Spain or Portugal)

Start Date ASAP

6-12 month contract

GCP Focus

Strong focus on Personalisation/Recommendation


Job Description:

  • You will take existing pipelines and evolve them to be best-in-class, and will be responsible for operationalising new models (like NBA, ranking, and LLM-based solutions) with agility and efficiency. Your primary goal is to create a seamless, reliable, and highly observable environment on GCP that empowers our Data Scientists and ML Engineers to iterate and deploy models faster. You will be expected to have created or significantly evolved MLOps frameworks in the past and to be able to quantify the improvements you deliver (e.g., in deployment frequency, model performance monitoring, or system reliability).


What You'll Do:

  • Take ownership of and evolve our end-to-end ML lifecycle, from data ingestion and feature engineering pipelines to model training, deployment, and real-time serving.
  • Design, build, and manage robust, automated CI/CD/CT (Continuous Integration / Continuous Delivery / Continuous Training) pipelines specifically for ML models, integrating with existing CI/CD patterns.
  • Leverage the GCP ecosystem, especially Vertex AI Pipelines, Vertex AI Endpoints, and Vertex AI Model Registry, to create a standardised and efficient path to production.
  • Design and own a best-in-class observability framework for ML models in production. This includes implementing granular monitoring for model performance (accuracy, bias), data and concept drift, and operational health (latency, throughput, error rates).
  • Collaborate closely with Data Scientists and ML Engineers to understand their needs, building the tools and abstractions that create a seamless environment and accelerate their workflow.
  • Optimise ML serving infrastructure for low-latency, real-time personalisation requirements.
  • Partner with data engineering to ensure robust integration with feature stores and data sources (like BigQuery and Oracle).
  • Define and track key MLOps metrics to quantify and communicate improvements in system performance, model quality, and team velocity.
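As one concrete illustration of the drift monitoring mentioned above, here is a minimal sketch of the Population Stability Index, a common data-drift metric, in plain Python. The bin counts and the ~0.2 alert threshold are illustrative rule-of-thumb assumptions, not specifics of this role's stack:

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned feature distributions.

    `expected` holds per-bin counts from the training (reference) data,
    `actual` holds per-bin counts from live serving traffic.
    A PSI above ~0.2 is a common rule-of-thumb signal of significant drift.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        # Clamp to eps so empty bins do not produce log(0).
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score
```

In a production observability framework a metric like this would typically be computed on a schedule per feature and exported to a monitoring backend (e.g. Prometheus/Grafana) for alerting.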


What We're Looking For:

  • 7+ years of deep, hands-on experience in a dedicated MLOps or DevOps role with a strong focus on machine learning systems.
  • Proven experience building or evolving MLOps frameworks from the ground up, with clear examples of the improvements you delivered.
  • Expert-level knowledge of the GCP cloud stack, particularly Vertex AI (Pipelines, Endpoints, Training), BigQuery, Pub/Sub, and GKE.
  • Deep expertise in building and managing observability stacks for real-time ML systems (e.g., using tools like Prometheus, Grafana, ELK stack, or specialised platforms).
  • Proven experience operationalising LLM-based systems, including managing embedding generation pipelines, vector databases, and fine-tuning/deployment workflows.
  • Strong practical experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible).
  • Demonstrable expertise in building and managing complex CI/CD pipelines.
  • Proficiency in Python and experience with scripting for automation, infrastructure management, and building tooling for ML teams.
  • Strong understanding of containerisation (Docker, Kubernetes) and microservices architecture as it applies to ML model serving.
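The CI/CD/CT experience asked for above usually includes building automated retraining triggers. A minimal, hypothetical sketch of such a gate in Python follows; the metric names and thresholds are illustrative assumptions, not part of this role's actual stack:

```python
from dataclasses import dataclass

@dataclass
class ModelHealth:
    drift_score: float        # e.g. PSI on key input features
    days_since_training: int  # staleness of the currently deployed model
    new_labeled_rows: int     # fresh labeled data available for training

def should_retrain(h: ModelHealth,
                   drift_threshold: float = 0.2,
                   max_age_days: int = 30,
                   min_new_rows: int = 10_000) -> bool:
    """Gate a continuous-training pipeline run.

    Retrain when drift is significant or the model is stale,
    but only if enough new labeled data has accumulated.
    """
    needs_refresh = (h.drift_score >= drift_threshold
                     or h.days_since_training >= max_age_days)
    return needs_refresh and h.new_labeled_rows >= min_new_rows
```

In practice a gate like this would run on a schedule and, when it returns True, kick off the training pipeline (for example, a Vertex AI Pipelines run that ends with registering the new model version).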

