Senior DevOps Engineer - Developer Experience

Replika, United States
Remote
AI Summary

We are seeking a Senior DevOps Engineer to own and enhance developer experience end-to-end, including deployments, CI/CD pipelines, and dev environments. You will work alongside an existing DevOps engineer to migrate legacy services and ensure every engineer ships faster with less friction. The role requires strong coding skills, deep Kubernetes and cloud experience, and a focus on making smart trade-offs for a team serving millions of users.

Technical Skills Required
Kubernetes, Docker, AWS, GCP, Python, CI/CD, TeamCity, GitHub Actions, Prometheus, Grafana, Datadog, Terraform, Pulumi

Job Description


About Replika

Ten years ago, we built the world’s first AI companion — before LLMs, before ChatGPT, and before anyone knew what AI could be.

What started as a way to patch a hole in our hearts became something unexpected: a catalyst. Millions of people told us Replika helped them reconnect with the world. They texted that old friend. They picked up that forgotten hobby. They took that walk around town. While today’s tech keeps people scrolling, we discovered AIs can push us outward, if we let them.

Now we’re being reborn with a new vision: the first AI with heart, for making the most out of life. Gentle nudges to meet friends when you’re feeling shy. Ideas for exploring new places when it’s easier to stay home. Daily conversation about whatever moves you (ballet, philosophy, aliens, or gossip) at 2am when no one else is awake. We’re not building an AI to validate or pacify. We’re building something modeled after the relationships that transform us. Someone who asks the right questions at the right time. Someone who helps you look inward, and pushes you outward. We’ve been featured in TED Talks, Stanford and Harvard studies, and the Lex Fridman podcast, because 40M+ people connect with Replika in a deeply human way. And we’re just getting started.

About The Role

We need a senior DevOps engineer who treats developer experience as a first-class product. You’ll work alongside our existing DevOps engineer, who’s focused on migrating legacy services into a clean deployment story. Your job is different: make every engineer at Replika ship faster, with less friction, fewer footguns, and more confidence.

This isn't a hyperscaler role - we serve millions, not billions, so the work looks more like smart trade-offs across a small senior team than mega-scale architecture. We need someone who understands how engineering actually works, writes real code, and genuinely cares about the daily experience of the people building the product.

You'll also need product fluency — not at a backend engineer's depth, but enough to know what runs on which cluster and why. If image moderation can run as a batch job on cheaper preemptible GPUs with a few seconds of latency, you should spot the cost optimization and propose it, not wait for someone else to.

Responsibilities

  • Own developer experience end-to-end: deployments, staging, dev environments, CI/CD pipelines. Find what’s painful, fix it, then make it stay fixed.
  • Be a force multiplier for the engineering team. Every hour you save the team compounds. You think in those terms.
  • Maintain, improve, and sometimes rebuild our CI/CD pipelines. Deploys should be safe by default — it should be hard to break things and easy to recover when something does break.
  • Work alongside our existing DevOps engineer on the broader infra picture: cluster management, observability, secrets, networking, cost. You split the work based on what each of you is best placed to own.
  • Build and maintain templates and tooling for spinning up new services and dev environments. The 47th time someone needs a new service, it should take 10 minutes, not 3 days.
  • Understand what runs on our clusters and why. Recommend and implement efficiency wins where you see them — right-sizing, autoscaling, scheduling, GPU utilization. You don’t need permission to make sensible cost calls.
  • Write real code when the situation calls for it. Internal tooling, glue services, automation scripts — you don’t hand it off to a backend engineer.
  • Care about the small stuff: clear runbooks, sane defaults, good error messages, deploys that fail safely. The kind of work that nobody notices when it’s working well — which is the point.

Requirements

  • 5+ years in DevOps, SRE, or platform engineering at companies where you actually owned developer experience, not just kept the lights on.
  • Strong coding skills. You write production-quality Python (or similar), not just shell scripts and YAML. You can read backend code well enough to debug a deploy issue without escalating.
  • Deep experience with Kubernetes and Docker. You can debug a pod that’s misbehaving, design sensible deployment configs, and reason about resource limits without guessing.
  • Hands-on with cloud platforms (AWS or GCP). You’ve managed real workloads in production, not just done a certification.
  • CI/CD chops. TeamCity experience is a plus, but more important is that you’ve owned a pipeline end-to-end and made it fast, reliable, and easy to understand. We use TeamCity and GitHub Actions.
  • You think in terms of developer experience. You’ve made other engineers’ lives meaningfully better and can talk specifically about how.
  • Product instincts. You’re comfortable asking “what does this service actually do” and using the answer to make infra decisions.
  • Strong English (B2+). We’re distributed across time zones — clear written communication matters.
  • Familiarity with observability stacks (Prometheus, Grafana, Datadog, or similar) and a view on what’s worth alerting on versus what’s noise. You find observability gaps and make sure they are filled.

Nice to Have

  • Experience with GPU clusters and ML workloads — model serving, autoscaling inference, cost optimization for GPU-heavy services.
  • Demonstrated interest, backed by projects, in AI-assisted engineering workflows.
  • Experience with Terraform, Pulumi, or another IaC tool. You version your infrastructure.
  • You’ve introduced a developer experience improvement at a previous company that you’re genuinely proud of — something you can describe in detail and quantify the impact of.
  • Experience at a B2C company with real users on the line, where downtime translates directly to user trust.

Why Join Us?

  • Competitive compensation.
  • Flexible job schedule and generous vacation policy.
  • Unlimited access to the latest AI coding assistants.
  • Direct impact on millions of users worldwide within months.
  • Push the boundaries of applied AI in a consumer setting.
  • Fully remote.
