Platform Engineer

HUD • United States

Visa Sponsorship • Relocation

AI Summary

We're looking for a platform engineer to own the reliability, scale, performance, and developer experience of HUD's core infrastructure and backend systems. The role requires strong production infra experience, backend engineering judgment, and the ability to reason about service architecture, APIs, databases, and async systems. The ideal candidate has experience with AWS infrastructure and containerized systems, as well as CI/CD, environment management, release automation, observability, alerting, and incident response systems.

Key Highlights

Own production uptime, latency, provisioning speed, infrastructure cost, and incident response for core platform services

Design and improve backend and platform systems for scale, including capacity planning, autoscaling, queueing, backpressure, cleanup jobs, retries, and rollback paths

Build reliable CI/CD, release automation, environment management, and deployment workflows that improve developer productivity and reduce production risk

Technical Skills Required

Amazon Web Services, Kubernetes, Terraform

Job Description

About HUD

HUD is building infrastructure to create RL training data and evals for frontier AI agents, as well as the HUD marketplace, where these are sold to frontier labs. Our platform is used by frontier labs, Fortune 500 companies, and startups. We’ve raised $16M from top VCs and were YC W25.

About The Role

We’re looking for a platform engineer who can own the reliability, scale, performance, and developer experience of HUD’s core infrastructure and backend systems.

This is not a pure infrastructure role. The right person has strong production infra experience, but also thinks like a backend engineer: they can reason about service architecture, queues, databases, APIs, deployment safety, performance bottlenecks, and how product requirements translate into resilient systems. You’ll work across AWS, Kubernetes, Terraform, CI/CD, observability, and backend services to make HUD faster, more reliable, cheaper to run, and easier for engineers to build on.

Responsibilities

Own production uptime, latency, provisioning speed, infrastructure cost, and incident response for core platform services
Build and maintain AWS infrastructure with Terraform, Kubernetes/EKS, Helm, Docker, EC2, CodeBuild, ECR, S3, IAM, networking, and secrets management
Design and improve backend and platform systems for scale, including capacity planning, autoscaling, queueing, backpressure, cleanup jobs, retries, and rollback paths
Define and improve dashboards, alerts, logs, traces, SLOs, runbooks, and on-call workflows so failures are detected, debugged, and resolved quickly
Build reliable CI/CD, release automation, environment management, and deployment workflows that improve developer productivity and reduce production risk
Write clean, maintainable code where needed to automate systems, improve backend services, and create internal tooling

Experience

You may be a good fit if you:

Have owned production cloud infrastructure for a high-availability, user-facing platform, with responsibility for uptime, performance, deployment safety, and cost
Have deep experience with AWS infrastructure and containerized systems; experience with tools like Terraform, Kubernetes/EKS, Docker, EC2, CodeBuild, ECR, S3, IAM, load balancers, networking, and secrets management is strongly preferred
Have built or operated CI/CD, environment management, release automation, observability, alerting, and incident response systems
Have strong backend engineering judgment and can reason about service architecture, APIs, databases, async systems, queues, scaling limits, and production failure modes
Can write clean, maintainable code and apply strong software engineering judgment across product architecture, infrastructure, backend systems, and developer workflows

Strong Candidates May Also Have

Experience operating infrastructure for data-heavy, ML/AI, workflow, marketplace, developer-tools, or enterprise platforms
Experience designing systems for bursty workloads, long-running jobs, sandboxed execution, distributed workers, or high-concurrency services
Experience reducing cloud spend through better architecture, autoscaling, workload placement, caching, cleanup systems, or observability

Experience building internal platforms or tools that make engineers faster without hiding too much complexity

We prioritize technical aptitude, ownership, and learning potential over years of experience.

Team & Company Details

Team size: ~15 people currently, mostly full-time in-person, with some remote.
Our team: Includes 4 International Olympiad medalists (IOI, ILO, IPhO), serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.
Company stage: We have 8 figures in funding and high revenue growth. We’re scaling profitably and quickly to meet very strong demand.

Logistics

Employment: Full-time.
Location: On-site in the San Francisco Bay Area.

Visa sponsorship: We provide relocation and visa support to the US for strong full-time candidates.
Timeline: Applications are rolling. The process is 2 technical interviews and a 1-week work trial.

What We Offer

Competitive compensation based on experience and location
100% covered top-of-the-line medical, dental, and vision from Blue Shield of CA
Lunch and dinner when you’re in the office
Company-wide holiday break (Christmas Eve to New Year’s Day) on top of PTO and paid holidays
Other perks including an Equinox membership, 401k, and commuter benefits
Unlimited* access to tokens for ChatGPT, Claude Code, Cursor, etc. *By unlimited, we mean no one on our token usage leaderboard has ever hit a limit. So we have no idea what the limit is.

Due to high volume, we may not actively respond to every application, but feel free to contact us at [email removed] or elsewhere if we missed your application!

Job Overview

Posted Date: Jul 01, 2026
Employment Type: Full-time
Experience Level: Not Applicable
Location: United States
Annual Salary: 138,550 USD
Category: DevOps
Company: HUD
