Senior Software Engineer - AI Terminal Agent Evaluator

yo hr consultancy • United States

Remote

This job is no longer active; the position is no longer accepting applications.

AI Summary

Collaborate with a top academic research lab to evaluate and improve terminal-based AI agents. Analyze, solve, and document benchmark tasks involving Docker, shell scripting, and Linux system administration. Contribute high-quality reference solutions and diagnostic insights to improve agent performance metrics.

Technical Skills Required

Docker

Shell scripting

Linux system administration

Python

Distributed systems

Benefits & Perks

Fully remote

Short-term, high-intensity contract

Independent contractor

Job Description

Role Overview

We are collaborating with a top academic research lab focused on advancing AI agents in real-world system environments. We're seeking high-performing software engineers based in Five Eyes countries to rigorously evaluate and improve terminal-based agents through the Terminal-Bench 2.0 benchmark suite. This is a short-term, high-intensity contract, ideal for engineers with deep systems-level expertise and a passion for hands-on problem-solving. Due to the complexity of the tasks, high engagement and consistent weekly availability are critical.

Key Responsibilities

Systematically analyze, solve, and document benchmark tasks involving Docker, shell scripting, and Linux system administration
Evaluate agent outputs for correctness, reproducibility, and reliability across complex multi-step CLI workflows
Provide detailed, evidence-based reasoning grounded in code structure and terminal behavior
Synthesize information across files and configurations to assess end-to-end architecture
Contribute high-quality reference solutions and diagnostic insights to improve agent performance metrics

Ideal Qualifications

2+ years of hands-on experience at top-tier tech companies, quant firms, or elite startups
Bachelor’s or Master’s degree in Computer Science or a related field from a university ranked in the global top 50–100
Deep familiarity with terminal workflows, Linux environments, and shell scripting
Strong knowledge of Docker, Git, Python, and distributed systems concepts
Demonstrated ability to trace, debug, and explain complex system behaviors across multiple files
Commitment to intellectual honesty, clarity, and rigorous methodology

Application Process

Submit your resume and a brief summary of your experience
Qualified applicants will be invited to complete a short-form technical assessment
We typically follow up within 3–5 business days with next steps

Contract and Payment Terms

You will be engaged as an independent contractor.
This is a fully remote role that can be completed on your own schedule.
Projects can be extended, shortened, or concluded early depending on needs and performance.

Skills: Docker, scripting, Linux, Bash, shell scripting

Job Overview

Posted Date: Feb 12, 2026

Employment Type: Contract

Experience Level: Mid-Senior level

Location: United States

Category: DevOps

Company: yo hr consultancy
