Own the core agent runtime that turns user intent into interactive mini-apps. Design and implement end-to-end orchestration, evaluation, and reliability systems for production AI workflows. Requires strong technical ownership, debugging skills, and experience shipping coding agents at scale.
Job Description
Senior AI Agent Engineer — Coding Agent Runtime
Remote (US) · Pacific or Central Time Zone · Full-time
$150K–$250K base + competitive equity
The company
The company is building a consumer social platform around interactive mini-apps: users browse a feed of playable experiences and can create their own by describing what they want.
The creation flow is AI-native. It turns natural language into shareable, interactive content that can be used immediately and published.
The product already has real scale, with 1M+ monthly active users.
The company is backed by a16z, Mayfield, and Khosla, and has raised $30M.
The team is small (around 30 people), and its founder/operator background is in consumer social at ByteDance.
The role
This is the engineer who owns the core engine behind creation: the agent runtime that turns intent into working mini-apps.
You will set the technical bar for the system end to end: architecture, orchestration, evaluation, reliability, and model strategy.
This is a greenfield-scope role. The team built its own agent framework from scratch, so you are shaping the platform rather than inheriting a large legacy system.
The output is user-facing product quality, not a demo. The bar is whether the system can repeatedly produce correct, shareable experiences under real consumer usage.
The technical problem
The hard part is not getting a model to generate code once.
The hard part is building a system that can reliably handle ambiguous user intent, long-horizon reasoning, execution failures, validation loops, and publication, while staying debuggable, measurable, and cost-aware.
It also has to know when to trust the model and when to fall back to deterministic validation or rerouting.
The core workflow looks like this: prompt → plan → generate → run/validate → repair → publish.
That workflow has to work across messy edge cases: incomplete specs, failing builds, bad intermediate outputs, model regressions, and user-level quality problems that are hard to diagnose without strong tracing and evaluation infrastructure.
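As a rough illustration of that workflow, here is a minimal Python sketch of a bounded validate-and-repair loop. Every name in it (`create_mini_app`, `RunResult`, `Outcome`, `max_repairs`, and the injected `plan`, `generate`, `validate`, `repair`, and `publish` callables) is hypothetical and chosen for illustration only; it does not describe the company's actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    ok: bool
    error: Optional[str] = None


@dataclass
class Outcome:
    published: bool
    attempts: int  # number of validation runs performed
    trace: List[str] = field(default_factory=list)  # ordered step names, for debugging


def create_mini_app(
    prompt: str,
    plan: Callable[[str], str],
    generate: Callable[[str], str],
    validate: Callable[[str], RunResult],
    repair: Callable[[str, str], str],
    publish: Callable[[str], None],
    max_repairs: int = 2,  # hypothetical budget that keeps cost and latency bounded
) -> Outcome:
    """prompt -> plan -> generate -> run/validate -> repair -> publish."""
    trace = ["plan"]
    spec = plan(prompt)
    trace.append("generate")
    code = generate(spec)

    attempts = 0
    while True:
        attempts += 1
        trace.append("validate")
        result = validate(code)  # deterministic check, not a model judgment
        if result.ok:
            trace.append("publish")
            publish(code)
            return Outcome(True, attempts, trace)
        if attempts > max_repairs:
            # Repair budget exhausted: stop without publishing a broken result.
            trace.append("give_up")
            return Outcome(False, attempts, trace)
        trace.append("repair")
        code = repair(code, result.error or "unknown error")
```

The recorded trace is the piece that matters for debugging: when a failure surfaces several steps after its root cause, an ordered record of steps is what makes the run replayable.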
What you'll own
• Agent runtime and orchestration: own the execution layer that coordinates planning, tool use, generation, validation, repair, and publishing.
• Long-horizon workflows: design the control flow for multi-step tasks where the model has to recover from failure and still produce a usable result.
• Evaluation and quality loops: build eval harnesses, regression suites, failure taxonomies, and release gates that make quality measurable.
• Model strategy: choose when to route, retry, benchmark, or swap models based on reliability, task type, latency, and cost.
• Observability and debugging: make agent behavior traceable with logs, metrics, alerts, and replayable debugging paths.
• Reliability improvements: reduce failure rates, tighten feedback loops, and build systems that degrade gracefully when models or dependencies fail.
• Product quality at scale: improve the creation experience for a user base of 1M+ monthly active users, where small quality regressions become visible quickly.
Who this is for
You are likely a strong fit if you have:
• Owned production AI or ML systems end to end, not just a component of one.
• Built or operated coding agents, agentic workflows, or similar long-horizon systems in production.
• Strong judgment about when to rely on the model and when to enforce deterministic checks, validation, or fallback paths.
• Designed evaluation systems where quality is measured continuously, not debated qualitatively.
• Experience debugging failures from traces, logs, and metrics rather than from user anecdotes alone.
• Comfort making architecture calls that trade off quality, latency, and cost.
• Enough product instinct to reason about the user experience of creation, not just the internal machinery.
• The ability to move from vague product intent to a concrete technical plan without waiting for someone else to define the system.
Research experience is fine, but shipping production systems matters more here.
This role is not for you if
• You want narrowly scoped tasks with clear upstream specs.
• You prefer working on models in isolation rather than owning the runtime around them.
• You are uncomfortable being accountable for quality, latency, and cost at the same time.
• You do not want to build evals, instrumentation, and debugging tools as part of the job.
• You want a role where the core architecture is already settled.
Compensation and logistics
• Base salary: $150K–$250K
• Equity: competitive
• Employment: full-time
• Workplace: remote, US only
• Timezone: Pacific or Central Time Zone
• Cadence: daily 6pm standups, plus a Sunday evening standup timed to coincide with Monday morning in China
• Visa support: H-1B transfers, with some new H-1B petitions considered case by case
What you should be able to explain
• Have you built a coding agent in production? Walk through the architecture.
• Do you have an eval framework for agentic systems? Show how it catches regressions.
• How do you debug a long-horizon workflow when the failure appears three steps after the root cause?
• How do you decide between model quality, latency, and cost when routing traffic or changing models?
About Aurora
Aurora helps exceptional engineers find the right role at some of the most ambitious startups worldwide.
We work with teams that care about high ownership, technical rigor, and clear scope.