Frontier AI Model Evaluation UI Developer

jetbridge ai, United States
Relocation
AI Summary

Build UI for exploring LLM evaluation results and experiment outputs. Design data visualizations and implement end-to-end traceability of LLM runs. Partner with researchers to iterate quickly while balancing clarity, accuracy, and performance.

Key Highlights
Build UI for exploring LLM evaluation results
Design data visualizations
Implement end-to-end traceability of LLM runs
Technical Skills Required
React, TypeScript, D3, Plotly, Vega/Vega-Lite, Visx, Three.js, Highcharts, ECharts
Benefits & Perks
Relocation sponsored
On-site in San Francisco

Job Description


Our client is a well-funded nonprofit research organization focused on measuring frontier AI capabilities, especially agentic and autonomous capabilities and the ability of models to conduct AI R&D. These capabilities can create outsized societal and security risk if they scale faster than our ability to evaluate and govern them.


Their work is unusually “real-world” compared to typical benchmarks: they build evaluations with high realism, measure performance against skilled-human baselines (often multi-hour tasks), and publish research on how quickly models are improving at completing long tasks.


You’d be building the UI that turns messy LLM evaluation outputs into clear, explorable artifacts that researchers can trust.


What you’ll do

- Build React + TypeScript interfaces for exploring LLM evaluation results and experiment outputs.

- Design and implement data visualizations that make model behavior, metrics, and results easy to inspect.

- Build workflows that support end-to-end traceability of LLM runs (prompts → intermediate steps → decisions → outputs).

- Partner closely with researchers; iterate quickly while balancing clarity, accuracy, and performance.


Tech stack / must-haves

- React + TypeScript

- Hands-on experience with at least one major visualization library: D3, Plotly, Vega/Vega-Lite, Visx, Three.js, Highcharts, or ECharts


Why this matters

- Their mission is to give society and AI labs grounded answers to: “What can frontier models actually do?” and “When do capabilities become dangerous?”

- The team includes researchers and engineers with backgrounds across top AI orgs and programs (e.g., OpenAI and DeepMind, plus alumni of Oxford, Caltech, MIRI, and ML interpretability programs).


Location

- On-site in San Francisco (relocation sponsored).


