Kubernetes Infrastructure Engineer

THRYVE Germany
Remote
AI Summary

We are seeking a skilled Kubernetes Infrastructure Engineer to join our team. The successful candidate will be responsible for running and operating a large-scale AI platform used in complex customer environments. This is a challenging role that requires strong troubleshooting and analytical skills, as well as experience with Kubernetes and distributed systems.

Key Highlights
Running and operating a large-scale AI platform
Troubleshooting under pressure
Improving deployment processes
Key Responsibilities
Troubleshooting under pressure
Improving deployment processes
Working directly with customer-side infrastructure teams
Owning the operational reality of a production AI platform end-to-end
Technical Skills Required
Kubernetes
Distributed systems
CI/CD tooling (Jenkins, GitHub Actions, Ansible, ArgoCD)
Monitoring and observability tooling (Grafana, Loki, Prometheus, OpenTelemetry, Dynatrace, Instana)
Redis, Postgres, MariaDB, Kafka, Elastic, Minio
Benefits & Perks
Remote work anywhere in Germany, with the option to work from anywhere for up to 180 days per year
Flexible work arrangement

Job Description


Infrastructure Engineer — Kubernetes / Distributed Systems / AI Platform

Remote anywhere in Germany | HQ in NRW | Work from anywhere for up to 180 days per year


This is not a “keep the lights on” DevOps role...


You’ll be part of the team responsible for running and operating a large-scale AI platform used in complex customer environments — including highly customised on-premise infrastructure deployments.


The challenge here isn’t just Kubernetes.

It’s making a highly distributed, containerised system run reliably in environments you don’t fully control.


That means troubleshooting under pressure, improving deployment processes, working directly with customer-side infrastructure teams, and owning the operational reality of a production AI platform end-to-end.


You’ll be working on systems running more than 1,000 containers in production across a large microservice architecture, helping improve everything from CI/CD pipelines and observability to runtime stability and deployment reliability.


This role is heavily focused on runtime operations, incident handling, and delivery infrastructure — not feature development.


The Engineering Muscle You Bring

  • Experience with Kubernetes or OpenShift
  • Strong understanding of distributed systems and microservice architectures
  • Experience with CI/CD tooling such as Jenkins, GitHub Actions, Ansible, or ArgoCD
  • Experience with monitoring and observability tooling such as Grafana, Loki, Prometheus, OpenTelemetry, Dynatrace, or Instana
  • Knowledge of technologies like Redis, Postgres, MariaDB, Kafka, Elastic, or Minio
  • Strong troubleshooting and analytical skills
  • Hands-on engineering mindset with a strong sense of ownership
  • Fluent German and English communication skills


Why This Role Appeals to People Who Like Complexity

  • Large-scale production systems with 1,000+ containers running live
  • Complex Kubernetes and OpenShift environments
  • Real operational ownership instead of pure maintenance work
  • Challenging on-premise customer deployments
  • Exposure to modern AI platforms and distributed architectures
  • High-impact work with lots of technical depth and learning potential


Khalifa@thryvetalent.com

