Senior AI Agent Engineer - Coding Agent Runtime

Aurora • United States

Remote · Visa Sponsorship

AI Summary

Own the core agent runtime that turns user intent into interactive mini-apps. Design and implement end-to-end orchestration, evaluation, and reliability systems for production AI workflows. Requires strong technical ownership, debugging skills, and experience shipping coding agents at scale.

Key Highlights

Owns agent runtime and orchestration for multi-step workflows

Builds evaluation frameworks and observability for quality measurement

Makes architecture decisions balancing quality, latency, and cost

Requires production experience with coding agents and long-horizon systems

Key Responsibilities

Own the agent runtime and orchestration layer coordinating planning, tool use, generation, validation, repair, and publishing

Design control flow for multi-step tasks with model failure recovery

Build evaluation harnesses, regression suites, failure taxonomies, and release gates

Develop model strategy for routing, retrying, benchmarking, and swapping models

Implement observability and debugging tools with logs, metrics, alerts, and replayable paths

Improve reliability by reducing failure rates and building graceful degradation systems

Enhance product quality at scale for millions of users

Technical Skills Required

Agentic workflows

Evaluation systems

Python

Observability

Benefits & Perks

Full-time employment

Remote work

Competitive equity

H-1B visa sponsorship

Job Description

Senior AI Agent Engineer — Coding Agent Runtime

Remote (US) · Pacific or Central Time Zone · Full-time

$150K–$250K base + competitive equity

The company

The company is building a consumer social platform around interactive mini-apps: users browse a feed of playable experiences and can create their own by describing what they want.

The creation flow is AI-native. It turns natural language into shareable, interactive content that can be used immediately and published.

The product already has real scale, with 1M+ monthly active users.

The company is backed by a16z, Mayfield, and Khosla, and has raised $30M.

The team is small, around 30 people, and the founder/operator background is in consumer social at ByteDance.

The role

This is the engineer who owns the core engine behind creation: the agent runtime that turns intent into working mini-apps.

You will set the technical bar for the system end to end: architecture, orchestration, evaluation, reliability, and model strategy.

This is a greenfield-scope role. The team built its own agent framework from scratch, so you are shaping the platform rather than inheriting a large legacy system.

The output is user-facing product quality, not a demo. The bar is whether the system can repeatedly produce correct, shareable experiences under real consumer usage.

The technical problem

The hard part is not getting a model to generate code once.

The hard part is building a system that can reliably handle ambiguous user intent, long-horizon reasoning, execution failures, validation loops, and publication, while staying debuggable, measurable, and cost-aware.

It also has to know when to trust the model and when to fall back to deterministic validation or rerouting.

The core workflow looks like this: prompt → plan → generate → run/validate → repair → publish.

That workflow has to work across messy edge cases: incomplete specs, failing builds, bad intermediate outputs, model regressions, and user-level quality problems that are hard to diagnose without strong tracing and evaluation infrastructure.
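As a rough illustration of that workflow, here is a minimal Python sketch of a pipeline with a bounded validate/repair loop and a step trace. It is not the company's runtime: `run_pipeline`, `RunResult`, `max_repairs`, and the stage callables are hypothetical names chosen for this example, and the stubbed stages stand in for real model and build calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    published: bool
    artifact: Optional[str]
    trace: List[str] = field(default_factory=list)  # ordered step log for debugging


def run_pipeline(
    prompt: str,
    plan: Callable[[str], str],
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],  # returns errors; empty list means valid
    repair: Callable[[str, List[str]], str],
    max_repairs: int = 2,  # hypothetical budget; bounds cost and latency
) -> RunResult:
    trace = ["prompt"]
    spec = plan(prompt)
    trace.append("plan")
    artifact = generate(spec)
    trace.append("generate")

    # Validate, then repair and re-validate until the repair budget is spent.
    for attempt in range(max_repairs + 1):
        errors = validate(artifact)
        trace.append("validate:ok" if not errors else "validate:fail")
        if not errors:
            trace.append("publish")
            return RunResult(True, artifact, trace)
        if attempt < max_repairs:
            artifact = repair(artifact, errors)
            trace.append("repair")

    # Never publish an artifact that failed deterministic validation.
    trace.append("abort")
    return RunResult(False, None, trace)


# Stubbed stages: the first build fails, one repair fixes it.
result = run_pipeline(
    "a tip calculator",
    plan=lambda p: f"spec for {p}",
    generate=lambda s: "broken app",
    validate=lambda a: [] if a == "fixed app" else ["build failed"],
    repair=lambda a, errs: "fixed app",
)
print(result.published, result.trace)
```

The trace is the point of the sketch: recording every step, including failed validations, is what would let someone see where a run went wrong when the visible failure appears several steps after the root cause.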

What you'll own

• Agent runtime and orchestration: own the execution layer that coordinates planning, tool use, generation, validation, repair, and publishing.

• Long-horizon workflows: design the control flow for multi-step tasks where the model has to recover from failure and still produce a usable result.

• Evaluation and quality loops: build eval harnesses, regression suites, failure taxonomies, and release gates that make quality measurable.

• Model strategy: choose when to route, retry, benchmark, or swap models based on reliability, task type, latency, and cost.

• Observability and debugging: make agent behavior traceable with logs, metrics, alerts, and replayable debugging paths.

• Reliability improvements: reduce failure rates, tighten feedback loops, and build systems that degrade gracefully when models or dependencies fail.

• Product quality at scale: improve the creation experience for a user base already in the millions, where small quality regressions become visible quickly.

Who this is for

You are likely a strong fit if you have:

• Owned production AI or ML systems end to end, not just a component of one.

• Built or operated coding agents, agentic workflows, or similar long-horizon systems in production.

• Strong judgment about when to rely on the model and when to enforce deterministic checks, validation, or fallback paths.

• Designed evaluation systems where quality is measured continuously, not debated qualitatively.

• Experience debugging failures from traces, logs, and metrics rather than from user anecdotes alone.

• Comfort making architecture calls that trade off quality, latency, and cost.

• Enough product instinct to reason about the user experience of creation, not just the internal machinery.

• The ability to move from vague product intent to a concrete technical plan without waiting for someone else to define the system.

Research experience is fine, but shipping production systems matters more here.

This role is not for you if

• You want narrowly scoped tasks with clear upstream specs.

• You prefer working on models in isolation rather than owning the runtime around them.

• You are uncomfortable being accountable for quality, latency, and cost at the same time.

• You do not want to build evals, instrumentation, and debugging tools as part of the job.

• You want a role where the core architecture is already settled.

Compensation and logistics

• Base salary: $150K–$250K

• Equity: competitive

• Employment: full-time

• Workplace: remote, US only

• Timezone: Pacific or Central Time Zone

• Cadence: daily 6pm standups, plus a Sunday evening standup timed to coincide with Monday morning in China

• Visa support: H-1B transfers, with some new H-1B petitions considered case by case

What you should be able to explain

• Have you built a coding agent in production? Walk through the architecture.

• Do you have an eval framework for agentic systems? Show how it catches regressions.

• How do you debug a long-horizon workflow when the failure appears three steps after the root cause?

• How do you decide between model quality, latency, and cost when routing traffic or changing models?

About Aurora

Aurora helps exceptional engineers find the right role at some of the most ambitious startups worldwide.

We work with teams that care about high ownership, technical rigor, and clear scope.

Job Overview

Posted Date: Jul 01, 2026

Employment Type: Full-time

Experience Level: Entry level

Location: United States

Annual Salary: 150,000 - 250,000 USD

Category: Machine Learning

Company: Aurora
