Senior ML Inference Optimization Engineer
Optimize large multimodal models for real-time video conversation. Collaborate with founders and researchers to ship the company's first major product. Work on frontier AI involving video, speech, and language.
Job Description
Job Title: Senior ML Inference Optimization Engineer
Salary: $250K-$450K base + equity
Location: Seattle, WA (5 days in-office)
Visa sponsorship available
Relocation support available
Company Description
Venture-backed AI startup building real-time visual conversational AI.
The team is tackling one of the hardest problems in modern AI: creating systems that can understand and respond to human emotion, behavior, and conversation in real time through video.
Backed by over $60M in funding, the company is moving from research to production and assembling a small team of engineers who can bridge cutting-edge models with real-world deployment.
Job Description
Join a team of researchers and engineers building multimodal AI systems capable of real-time video conversation.
This role sits at the intersection of machine learning, systems engineering, and performance optimization. You'll own the infrastructure and tooling that allows large multimodal models to run efficiently in production, working across video diffusion models, LLMs, speech systems, and future foundation models.
This is not a model training role.
The focus is on making state-of-the-art models faster, cheaper, and production-ready through deep profiling, inference optimization, and GPU systems work.
You'll work directly with founders, researchers, and infrastructure engineers to help ship the company's first major product release.
Why this role is remarkable
- Work on frontier multimodal AI involving video, speech, and language
- Optimize systems where milliseconds directly impact user experience
- Join a highly technical 14-person team backed by over $60M in funding
- Own performance across the full stack from model architecture to deployment
- Influence core technical decisions in a flat, founder-led organization
What you will do
- Profile and optimize large multimodal models for inference latency, throughput, and cost efficiency
- Identify bottlenecks using tools such as NVIDIA Nsight, PyTorch Profiler, CUDA debugging tools, and production observability systems
- Apply acceleration techniques such as quantization, pruning, and distillation, using tooling such as TensorRT, ONNX, Triton, and vLLM
- Build and maintain infrastructure that supports researchers from experimentation through deployment
- Develop evaluation frameworks that measure performance, quality, and operational reliability
- Collaborate with research teams on model architecture decisions that impact production performance
The ideal candidate
- 2+ years of hands-on ML engineering experience working on production systems
- Strong experience profiling and optimizing LLMs, diffusion models, or other large neural networks
- Deep familiarity with PyTorch, CUDA, GPU systems, and modern inference tooling
- Experience deploying and operating ML systems at scale rather than exclusively training models
- Startup experience preferred, though exceptional candidates from larger companies are welcome
- Strong ownership mentality and ability to move between systems, infrastructure, and modeling challenges
Less likely to be a fit
- Candidates whose primary ML experience is fine-tuning models
- Research-focused profiles without production ownership
- Recent management or director-level candidates no longer working hands-on
- Generalists without demonstrated depth in inference optimization or GPU systems
Next steps
- Apply through this LinkedIn posting.
- If there is a strong fit, we'll reach out directly with additional information and introductions to the hiring team.
- If this specific role isn't the best match, we may also suggest other high-signal startup opportunities aligned with your background and interests.
A quick note on authenticity
This is a real, active role that we are recruiting for in close partnership with the hiring team. We work directly with founders on their hiring needs and only represent active opportunities.