Senior AI/ML Platform Engineer

team.blue • Germany

Visa Sponsorship

This job is no longer active and is not accepting applications.

AI Summary

Design, build, and maintain a machine learning and AI infrastructure platform. Create robust, scalable platforms for ML model deployment and inference. Collaborate with data science teams to optimize deployment workflows.

Key Highlights

Design and implement scalable ML/AI platforms

Manage and optimize GPU cluster resources

Implement monitoring, logging, and alerting systems

Collaborate with data science teams to optimize deployment workflows

Technical Skills Required

Python, Docker, Kubernetes, Terraform, Cloud Platforms (AWS, Azure, GCP), GPU-enabled services, PyTorch, TensorFlow, TorchServe, TensorFlow Serving, CUDA, multi-GPU distributed computing

Benefits & Perks

Right to Work in the country of application

Diversity & Inclusion

Respect, openness, and trusted collaboration

Job Description

Company

team.blue is an ecosystem of 60+ successful brands working together across 22 European countries to provide its 3.5 million SMB customers with everything they need to succeed online by offering best-in-class expertise and services.

team.blue's brands are a mix of traditional hosting businesses, which offer services ranging from domain names, email, and shared hosting to e-commerce and server hosting solutions, and specialist SaaS providers, which offer adjacent products such as compliance, marketing tools, and team collaboration products. This broad product offering makes it a one-stop partner for online businesses and entrepreneurs across Europe.

Position

We are looking for an experienced Senior AI/ML Platform Engineer to design, build, and maintain our machine learning and AI infrastructure platform. This role is critical to enabling our data science and AI teams to deploy, scale, and manage ML models efficiently across multi-GPU environments. You'll be responsible for creating robust, scalable platforms that support the full ML lifecycle from model training to inference, with a particular focus on LLM deployment and management.

Key Responsibilities

Platform Development & Management

Design and implement scalable ML/AI platforms supporting model deployment across multi-GPU nodes
Build and maintain infrastructure for LLM inference serving, including optimization for latency and throughput
Develop automated deployment pipelines for machine learning models using containerization and orchestration technologies
Create self-service tools and APIs that enable data scientists to deploy models independently

Infrastructure & Operations

Manage and optimize GPU cluster resources, ensuring efficient utilization and cost management
Implement monitoring, logging, and alerting systems for ML workloads and model performance
Design disaster recovery and backup strategies for critical ML infrastructure
Maintain high availability and reliability standards for production ML services

DevOps & Automation

Build CI/CD pipelines specifically tailored for ML model deployment and updates
Automate infrastructure provisioning using Infrastructure as Code (IaC) principles
Implement model versioning, rollback capabilities, and A/B testing frameworks
Develop automated scaling solutions for varying inference workloads

Collaboration & Support

Work closely with data science teams to understand requirements and optimize deployment workflows
Provide technical guidance on best practices for model deployment and infrastructure usage
Collaborate with security teams to implement secure ML model serving practices
Document platform capabilities, procedures, and troubleshooting guides

Profile

Professional Experience

4+ years of experience in platform engineering, DevOps, or infrastructure roles
2+ years of experience specifically with ML/AI infrastructure or platforms

Technical Skills

Cloud Platforms: 4+ years of experience with AWS, Azure, or GCP, particularly GPU-enabled services
Containerization: Proficiency with Docker and Kubernetes, including GPU scheduling and resource management
Infrastructure as Code: Experience with Terraform, CloudFormation, or similar tools
Programming: Strong skills in Python and at least one additional language (Go, Java, or Rust)
ML Frameworks: Familiarity with PyTorch, TensorFlow, and model serving frameworks (TorchServe, TensorFlow Serving, etc.)

Platform & Operations Experience

Experience building and maintaining production ML platforms or similar infrastructure (Kubeflow, MLflow, SageMaker, etc.)
Knowledge of GPU computing, CUDA, and multi-GPU distributed computing
Understanding of ML model lifecycle management and MLOps practices
Experience with monitoring tools (Prometheus, Grafana, ELK stack)
Experience with streaming data processing (Kafka, Kinesis, Pulsar)
Familiarity with service mesh technologies and API gateways

AI/ML Knowledge

Understanding of large language models (LLMs) and inference optimization techniques
Knowledge of model quantization, pruning, and other optimization methods
Experience with distributed training and inference across multiple GPUs/nodes
Familiarity with vector databases and embedding storage solutions

Right to Work

At any stage, please be prepared to provide proof of eligibility to work in the country where you are applying. Unfortunately, we are unable to support relocation packages or visa sponsorship.

ESG

“At team.blue, our commitment to caring for the environment and each other is at the heart of everything we do. Our latest impact report showcases our ongoing ESG efforts and ambitious sustainability goals. Interested in learning more about our dedication to making a positive impact? Check it out here.”

"Come as you are"

Everyone is welcome here. Diversity & Inclusion are at our core. Far above any technical competence, we value respect, openness, and trusted collaboration. We do not tolerate intolerance.

The most trusted digital enabler

team.blue is a leading digital enabler for companies and entrepreneurs. It serves over 3.3 million customers in Europe and has more than 3,000 experts to support them. Its goal is to shape technology and to empower businesses with innovative digital services.


Job Overview

Posted Date: Nov 07, 2025

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: Germany

Category: Machine Learning

Company: team.blue
