We are hiring a Junior ML Engineer to own LLM orchestration, latency, and scaling for workflows already live with customers. The role involves building and operating end-to-end LLM pipelines for full-document analysis and optimizing latency, throughput, and cost for long-context inference. The ideal candidate has strong Python and async programming fundamentals and experience with schema-constrained LLM outputs.
Job Description
Junior ML Engineer — LLM Infrastructure & Orchestration
About Us
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).
We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows.
We're hiring a Junior ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. Candidates should be available to join immediately or within one month.
This role is production ML systems engineering, not model training.
Key Responsibilities
- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
- Productionize report generation (DOCX/XLSX) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes
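The retry-and-fallback orchestration listed above can be illustrated with a short sketch. Both provider calls are stubs standing in for, e.g., a Bedrock (Claude) client and a Vertex AI (Gemini) client; the function names and backoff values are assumptions for illustration only.

```python
import asyncio

async def call_primary(prompt: str) -> str:
    # Stub: simulates the primary provider timing out.
    raise TimeoutError("primary model timed out")

async def call_fallback(prompt: str) -> str:
    # Stub: simulates a healthy secondary provider.
    return '{"status": "ok"}'

async def infer_with_fallback(prompt: str, retries: int = 2) -> str:
    """Retry the primary model with exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return await call_primary(prompt)
        except TimeoutError:
            await asyncio.sleep(0.01 * 2 ** attempt)  # backoff between retries
    return await call_fallback(prompt)  # last resort: second provider

result = asyncio.run(infer_with_fallback("Summarize clause 7"))
```

The same wrapper is where schema-validation failures would also be caught and converted into retries or partial re-execution.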
Key Highlights
- Pydantic-based schemas for all LLM outputs
- Prompt → schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale
Technical Skills Required
- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services requiring validation and retries
- Individual contributor mindset in a lean startup
- Available to join immediately or within 1 month
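For illustration, the idempotency concern in the Celery requirement above can be sketched with the standard library alone. The helper names are hypothetical, and a real system would store completion state in Redis or a database rather than a process-local set; the point is that a redelivered job runs its side effect exactly once.

```python
import hashlib

_completed: set[str] = set()  # production: external store shared across workers

def task_id(document_id: str, step: str) -> str:
    """Derive a stable id from the work's identity, not the delivery attempt."""
    return hashlib.sha256(f"{document_id}:{step}".encode()).hexdigest()

def run_once(document_id: str, step: str, work) -> bool:
    """Execute work only if this (document, step) pair hasn't completed yet."""
    tid = task_id(document_id, step)
    if tid in _completed:
        return False  # duplicate delivery: skip the side effect
    work()
    _completed.add(tid)
    return True

calls = []
first = run_once("doc-42", "ocr", lambda: calls.append(1))
second = run_once("doc-42", "ocr", lambda: calls.append(1))  # redelivered job
```

Keying on the work's identity (document + step) rather than the message id is what makes retries and queue redeliveries safe.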
Nice to Have
- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference
Benefits & Perks
- Work on real production ML systems, not demos
- Own core LLM infrastructure end-to-end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup
- CTC: ₹9,00,000 – ₹12,00,000 (based on experience & impact)