We are hiring a Junior ML Engineer to own LLM orchestration, latency, and scaling for workflows already live with customers. The role involves building and operating end-to-end LLM pipelines for full-document analysis and optimizing latency, throughput, and cost for long-context inference. The ideal candidate has strong Python and async programming fundamentals and experience with schema-constrained LLM outputs.
Job Description
Junior ML Engineer — LLM Infrastructure & Orchestration
About Us
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).
We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows.
We're hiring a Junior ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. Candidates should be available to join immediately or within one month.
This role is production ML systems engineering, not model training.
Key Responsibilities
- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
- Productionize report generation (DOCX/XLSX) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes
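The retry-and-fallback orchestration listed above can be illustrated with a short sketch. Both provider calls are stubs standing in for, e.g., a Bedrock (Claude) client and a Vertex AI (Gemini) client; the function names and backoff values are assumptions for illustration only.

```python
import asyncio

async def call_primary(prompt: str) -> str:
    # Stub: simulates the primary provider timing out.
    raise TimeoutError("primary model timed out")

async def call_fallback(prompt: str) -> str:
    # Stub: simulates a healthy secondary provider.
    return '{"status": "ok"}'

async def infer_with_fallback(prompt: str, retries: int = 2) -> str:
    """Retry the primary model with exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return await call_primary(prompt)
        except TimeoutError:
            await asyncio.sleep(0.01 * 2 ** attempt)  # backoff between retries
    return await call_fallback(prompt)  # last resort: second provider

result = asyncio.run(infer_with_fallback("Summarize clause 7"))
```

The same wrapper is where schema-validation failures would also be caught and converted into retries or partial re-execution.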
Key Highlights
- Pydantic-based schemas for all LLM outputs
- Prompt → schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale
Technical Skills Required
- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services requiring validation and retries
- Individual contributor mindset in a lean startup
- Available to join immediately or within 1 month
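For illustration, the idempotency concern in the Celery requirement above can be sketched with the standard library alone. The helper names are hypothetical, and a real system would store completion state in Redis or a database rather than a process-local set; the point is that a redelivered job runs its side effect exactly once.

```python
import hashlib

_completed: set[str] = set()  # production: external store shared across workers

def task_id(document_id: str, step: str) -> str:
    """Derive a stable id from the work's identity, not the delivery attempt."""
    return hashlib.sha256(f"{document_id}:{step}".encode()).hexdigest()

def run_once(document_id: str, step: str, work) -> bool:
    """Execute work only if this (document, step) pair hasn't completed yet."""
    tid = task_id(document_id, step)
    if tid in _completed:
        return False  # duplicate delivery: skip the side effect
    work()
    _completed.add(tid)
    return True

calls = []
first = run_once("doc-42", "ocr", lambda: calls.append(1))
second = run_once("doc-42", "ocr", lambda: calls.append(1))  # redelivered job
```

Keying on the work's identity (document + step) rather than the message id is what makes retries and queue redeliveries safe.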
Nice to Have
- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference
Benefits & Perks
- Work on real production ML systems, not demos
- Own core LLM infrastructure end-to-end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup
- CTC: ₹9,00,000 – ₹12,00,000 (based on experience & impact)