Machine Learning Systems Engineer

Aurora • United States

Visa Sponsorship

AI Summary

Build and optimize production infrastructure for diffusion-based language models, focusing on serving, compilation, and deployment reliability. Own end-to-end systems from model execution to cloud rollout across AWS and Azure. Requires deep technical ownership and cross-functional collaboration with researchers and engineering leadership.

Key Highlights

Production ML infrastructure ownership

Diffusion LLM serving and optimization

Cross-functional systems debugging

Kubernetes and cloud deployment pipelines

Key Responsibilities

Build and improve model serving infrastructure with attention to latency, throughput, and stability

Optimize performance across CUDA, TensorRT, ONNX Runtime, vLLM, and SGLang to reduce bottlenecks

Create reproducible deployment pipelines across Kubernetes and cloud environments with safe release and rollback mechanisms

Develop benchmarking and evaluation systems to separate real model gains from runtime noise

Debug failures across Python, containerized services, GPU execution, and orchestration layers

Scale infrastructure to support growing customer load and evolving model requirements

Collaborate directly with researchers to translate model changes into production-ready performance improvements

Technical Skills Required

PyTorch, CUDA, Kubernetes, Python

Benefits & Perks

Base salary: $200K–$300K

Competitive equity

On-site work in Palo Alto, CA

Visa support available

Job Description

Machine Learning Systems Engineer

Palo Alto, CA · On-site · Full-time

$200K–$300K base + competitive equity

The company

The company is building diffusion-based language models that generate tokens in parallel instead of one at a time.

That architecture is designed to reduce latency and cost while preserving quality, and it has already moved beyond research.

The team launched Mercury, the first commercially available diffusion LLM (dLLM), in early 2025 and is now deploying large-scale diffusion LLMs at Fortune 500 companies.

The company has raised $56M, employs about 20 people, and operates as a small, deeply technical team in Palo Alto.

This is not a lab demo. The product is already being used in enterprise settings, which makes production performance, deployment reliability, and systems quality first-order problems.

The role

This is a machine learning systems role for someone who wants ownership of the infrastructure around model performance: serving, compilation, optimization, benchmarking, deployment, and reliability.

You will work directly with researchers and engineering leadership to move models from implementation to production systems that are measurable, reproducible, and fast enough for real customers.

The scope is broad enough to feel staff-level in practice. The hard problems are the ones that decide whether the model is usable in the real world: throughput, latency, memory use, hardware efficiency, rollout safety, and operational stability.

The technical problem

Diffusion LLMs change the inference problem.

You are not just serving a model. You are making a new architecture run efficiently across GPUs, runtimes, and cloud environments while preserving output quality and deployment reliability.

The challenge is to connect research code to production systems without losing the performance characteristics that make the architecture valuable in the first place.

That means the work spans model execution, runtime optimization, infrastructure, and evaluation rather than only training or only serving.

What you'll own

• Model serving infrastructure: build and improve the systems that serve diffusion LLMs in production, with attention to latency, throughput, and stability.

• Performance optimization: work across CUDA, TensorRT, ONNX Runtime, vLLM, and SGLang to reduce bottlenecks and improve hardware utilization.

• Deployment pipelines: make model rollout reproducible across Kubernetes and cloud environments, with safe release and rollback mechanisms.

• Benchmarking and evaluation: build measurement systems that separate real model gains from runtime noise and infrastructure effects.

• Systems debugging: trace failures across Python, containerized services, GPU execution, and orchestration layers.

• Scaling infrastructure: help adapt the stack as customer load grows and model requirements evolve across AWS and Azure.

• Cross-functional execution: work closely with researchers to turn model changes into production-ready performance improvements.
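
As an illustration of the benchmarking item above, here is a minimal sketch of one way to separate a real latency gain from run-to-run noise. It is not the team's actual method. It assumes latency samples (in milliseconds) have been collected for a baseline and a candidate build, and it bootstraps a 95% interval on the difference in medians. The function names, the resample count, and the 95% threshold are all hypothetical choices made for this example.

```python
import random
import statistics


def bootstrap_median_diff(baseline, candidate, n_resamples=2000, seed=0):
    """Return an approximate 95% bootstrap interval for
    median(baseline) - median(candidate).

    A positive value means the candidate is faster (lower latency).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    diffs = []
    for _ in range(n_resamples):
        # Resample each set with replacement, keeping the original size.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.median(b) - statistics.median(c))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


def is_real_gain(baseline, candidate, **kwargs):
    """Treat the change as a real gain only if the whole interval is above zero."""
    lower, _ = bootstrap_median_diff(baseline, candidate, **kwargs)
    return lower > 0


if __name__ == "__main__":
    # Made-up latency samples, purely for illustration.
    baseline_ms = [100 + i % 5 for i in range(50)]
    candidate_ms = [90 + i % 5 for i in range(50)]
    print(bootstrap_median_diff(baseline_ms, candidate_ms))
    print(is_real_gain(baseline_ms, candidate_ms))
```

Medians are used instead of means because serving latency tends to have a long tail. A production benchmark would also need to control for warm-up, batch composition, and hardware state, which this sketch ignores.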

Who this is for

You are likely a strong fit if you have:

• Built production ML infrastructure or inference systems where latency, throughput, and cost are explicit design constraints.

• Strong judgment around GPU utilization, memory pressure, batching, and runtime tradeoffs.

• Experience with PyTorch, CUDA, serving runtimes, or deployment stacks that sit between model code and production traffic.

• Comfort reading profiles, tracing bottlenecks, and turning ambiguous performance issues into concrete fixes.

• Shipped systems where correctness, reproducibility, and operational reliability mattered as much as raw speed.

• The ability to work directly with researchers and translate model behavior into systems decisions.

• Experience operating in environments where requirements change as quickly as the model stack.

• Enough range to move from code-level debugging to infrastructure design without handoff overhead.

Tech stack

• Serving and optimization: vLLM, TensorRT, ONNX Runtime, SGLang

• Modeling and training: PyTorch, TensorFlow

• GPU and systems: CUDA, Docker, Kubernetes

• Infrastructure: Python, AWS, Azure, Kubeflow

The stack is broad because the work sits across research, inference, deployment, and cloud infrastructure. The best candidates will understand where each layer creates leverage and where it becomes a bottleneck.

Why now

The company has already proven the core idea with a commercial product and enterprise deployments.

The next problem is not whether the model works in principle. It is whether the system can serve real demand with predictable performance, stable rollouts, and a runtime stack that keeps up with model progress.

This is the point where systems engineering matters most: the architecture decisions made now will shape how efficiently the product can scale across customers and hardware generations.

This role is not for you if

• You want a narrowly scoped feature role with clean handoffs.

• You prefer working only on model research and do not want systems ownership.

• You are uncomfortable debugging across GPU, runtime, container, and orchestration layers.

• You do not want to work on-site five days per week in Palo Alto.

• You need strict process separation between research, infrastructure, and product execution.

Compensation and logistics

• Base salary: $200K–$300K

• Equity: competitive

• Location: Palo Alto, CA

• Work model: on-site, 5 days per week in Palo Alto

• Visa support: available

• Employment: full-time

Interview process

Typical process:

• Intro call — 20 min: background, scope, and fit.

• Technical coding rounds: engineering depth and problem-solving.

• Onsite-style panel with founders: usually remote.

• References: final stage.

About Aurora

Aurora helps exceptional engineers find the right role at some of the most ambitious startups worldwide.

We work with teams that expect high ownership, technical depth, and direct accountability.

Job Overview

Posted Date: Jun 30, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: United States

Annual Salary: 200,000 - 300,000 USD

Category: Machine Learning

Company: Aurora
