Senior ML Inference Optimization Engineer

coffeespace • San Francisco Bay Area

Visa Sponsorship • Relocation

AI Summary

Optimize large multimodal models for real-time video conversation. Collaborate with founders and researchers to ship the company's first major product. Work on frontier AI involving video, speech, and language.

Key Highlights

Optimize models for inference latency, throughput, and cost efficiency

Collaborate with founders and researchers to ship the first major product

Work on frontier multimodal AI involving video, speech, and language

Technical Skills Required

PyTorch, CUDA, GPU systems, inference optimization, quantization, pruning, distillation, TensorRT, ONNX, Triton, vLLM

Benefits & Perks

$250K-$450K base + equity

Visa sponsorship available

Relocation support available

Job Description

Job Title: ML Inference Optimization Engineer

Salary: $250K-$450K base + equity

Location: Seattle, WA (5 days in-office)

Visa sponsorship available

Relocation support available

Company Description

Venture-backed AI startup building real-time visual conversational AI.

The team is tackling one of the hardest problems in modern AI: creating systems that can understand and respond to human emotion, behavior, and conversation in real time through video.

Backed by over $60M in funding, the company is moving from research to production and assembling a small team of engineers who can bridge cutting-edge models with real-world deployment.

Job Description

Join a team of researchers and engineers building multimodal AI systems capable of real-time video conversation.

This role sits at the intersection of machine learning, systems engineering, and performance optimization. You'll own the infrastructure and tooling that allows large multimodal models to run efficiently in production, working across video diffusion models, LLMs, speech systems, and future foundation models.

This is not a model training role.

The focus is on making state-of-the-art models faster, cheaper, and production-ready through deep profiling, inference optimization, and GPU systems work.

You'll work directly with founders, researchers, and infrastructure engineers to help ship the company's first major product release.

Why this role is remarkable

Work on frontier multimodal AI involving video, speech, and language
Optimize systems where milliseconds directly impact user experience
Join a highly technical 14-person team backed by over $60M in funding
Own performance across the full stack from model architecture to deployment
Influence core technical decisions in a flat, founder-led organization

What you will do

Profile and optimize large multimodal models for inference latency, throughput, and cost efficiency
Identify bottlenecks using tools such as NVIDIA Nsight, the PyTorch Profiler, CUDA debugging tools, and production observability systems
Apply acceleration techniques such as quantization, pruning, and distillation, using tools and frameworks such as TensorRT, ONNX, Triton, and vLLM
Build and maintain infrastructure that supports researchers from experimentation through deployment
Develop evaluation frameworks that measure performance, quality, and operational reliability
Collaborate with research teams on model architecture decisions that impact production performance

The ideal candidate

2+ years of hands-on ML engineering experience working on production systems

Strong experience profiling and optimizing LLMs, diffusion models, or other large neural networks
Deep familiarity with PyTorch, CUDA, GPU systems, and modern inference tooling
Experience deploying and operating ML systems at scale rather than exclusively training models
Startup experience preferred, though exceptional candidates from larger companies are welcome
Strong ownership mentality and ability to move between systems, infrastructure, and modeling challenges

Less likely to be a fit

Candidates whose primary ML experience is fine-tuning models
Research-focused profiles without production ownership
Recent management or director-level candidates no longer working hands-on
Generalists without demonstrated depth in inference optimization or GPU systems

Next steps

Apply through this LinkedIn posting.
If there is a strong fit, we'll reach out directly with additional information and introductions to the hiring team.
If this specific role isn't the best match, we may also suggest other high-signal startup opportunities aligned with your background and interests.

A quick note on authenticity

This is a real, active role that we are recruiting for in close partnership with the hiring team. We work directly with founders on their hiring needs and only represent active opportunities.

Job Overview

Posted Date: Jun 12, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: San Francisco Bay Area

Annual Salary: 250,000 - 450,000 USD

Category: Machine Learning

Company: coffeespace

Similar Jobs

Explore other opportunities that match your interests

Senior Deep Learning Researcher

river ai • San Francisco Bay Area

Machine Learning • 1d ago • Premium Job

Founding Machine Learning Engineer

Stealth Startup • San Francisco Bay Area

Machine Learning • 1d ago • Visa Sponsorship • Relocation • Remote

Job Type: Full-time

Experience Level: Mid-Senior level

Founding Senior Machine Learning Engineer (Voice AI)

kadence • San Francisco Bay Area

Machine Learning • 6d ago • Visa Sponsorship • Relocation • Remote

Job Type: Full-time

Experience Level: Mid-Senior level
