Junior ML Engineer for LLM Infrastructure and Orchestration

bryckel ai • India
Remote

Job Description


Junior ML Engineer – LLM Infrastructure & Orchestration
About Us

We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).


We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows.
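
As a concrete illustration of this pattern, here is a minimal sketch in which a Pydantic model is the contract for an LLM response (the ClauseExtraction model and its fields are hypothetical, not our production schemas):

```python
# Minimal sketch: a Pydantic model as the typed contract for one LLM task.
# ClauseExtraction and its fields are illustrative assumptions.
from pydantic import BaseModel, Field


class ClauseExtraction(BaseModel):
    """Typed contract for a single clause-extraction response."""
    clause_type: str = Field(description="e.g. 'termination', 'indemnity'")
    page_number: int = Field(ge=1)
    verbatim_text: str
    confidence: float = Field(ge=0.0, le=1.0)


def parse_llm_output(raw_json: str) -> ClauseExtraction:
    # model_validate_json (Pydantic v2) raises ValidationError on any
    # missing field or type mismatch; orchestration treats that as a
    # retry signal rather than passing bad data downstream.
    return ClauseExtraction.model_validate_json(raw_json)
```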


We're hiring an ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. Candidates should be available to join immediately or within one month.


This role is production ML systems engineering, not model training.


What You'll Do
  • Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
  • Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs
  • Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution (a sketch of this loop follows the list)
  • Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
  • Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
  • Develop streaming and async APIs using FastAPI
  • Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
  • Productionize report generation (DOCX/XLSX) as deterministic pipeline outputs
  • Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
  • Debug production issues: timeouts, schema failures, partial extractions, cost spikes
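
A minimal async sketch of the validate → retry → fallback loop from the orchestration bullet above; the Summary schema is illustrative, and each backend stands in for a real client call (e.g. Bedrock Claude first, Vertex AI Gemini as fallback):

```python
# Sketch only: retries schema failures per backend, then falls back.
import asyncio
from collections.abc import Awaitable, Callable

from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    title: str
    key_points: list[str]


async def infer_with_fallback(
    prompt: str,
    backends: list[Callable[[str], Awaitable[str]]],
    attempts_per_backend: int = 2,
) -> Summary:
    """Try each backend in order; retry schema failures before falling back."""
    last_error: Exception | None = None
    for call_model in backends:
        for attempt in range(attempts_per_backend):
            raw = await call_model(prompt)
            try:
                # Validation failure is the retry signal, mirroring the
                # schema-first approach described above.
                return Summary.model_validate_json(raw)
            except ValidationError as err:
                last_error = err
                await asyncio.sleep(2 ** attempt)  # brief backoff before re-prompting
    raise RuntimeError("all backends failed schema validation") from last_error
```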


What You'll Own Technically
  • Pydantic-based schemas for all LLM outputs
  • Prompt ↔ schema contracts and versioning
  • Validation, retry, and fallback mechanisms
  • Latency and cost optimization for long-context inference
  • Reliability of OCR + LLM pipelines at scale (see the Celery sketch after this list)
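
A minimal Celery sketch of the retry-plus-idempotency pattern behind that reliability bullet; the broker URL, task body, and in-memory completion set are illustrative stand-ins for real infrastructure:

```python
# Sketch only: duplicate deliveries become no-ops; transient timeouts
# are retried with exponential backoff via Celery's autoretry options.
from celery import Celery

app = Celery("pipelines", broker="redis://localhost:6379/0")  # illustrative broker

# Stand-in for a durable completion record (Redis set, DB table, ...);
# an in-memory set is per-worker and shown only for shape.
_completed: set[str] = set()


def process_document(document_id: str) -> None:
    """Placeholder for the real OCR → parsing → LLM inference work."""


@app.task(autoretry_for=(TimeoutError,), retry_backoff=True, max_retries=3)
def run_extraction(document_id: str) -> None:
    # Idempotency guard: a redelivered or duplicate message exits early
    # instead of re-running OCR and inference for the same document.
    if document_id in _completed:
        return
    process_document(document_id)
    _completed.add(document_id)
```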


Must Have
  • Strong Python and async programming fundamentals
  • ~1 year of experience working on production ML or LLM systems
  • Hands-on experience with Claude, Gemini, and AWS Bedrock
  • Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
  • Experience with OCR and document-heavy pipelines
  • Experience with Celery or distributed async job systems
  • Comfort treating LLMs as non-deterministic services requiring validation and retries
  • Individual contributor mindset in a lean startup
  • Available to join immediately or within 1 month


Nice to Have (Strong ML Signals)
  • Experience with streaming LLM responses (a FastAPI sketch follows this list)
  • Familiarity with long-context failure modes and truncation issues
  • Experience with LLM output evaluation or regression testing
  • Cost monitoring and optimization for LLM inference
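
On the streaming point above, a minimal FastAPI sketch (the token generator is a placeholder for a real provider's streaming call, not an actual client):

```python
# Sketch only: flush tokens to the client as they arrive instead of
# waiting for the full completion, cutting perceived latency.
import asyncio
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def token_stream(prompt: str) -> AsyncIterator[str]:
    # Placeholder generator; a real version would iterate over a
    # provider's streaming API response.
    for token in ["Analyzing", " the", " contract", "..."]:
        await asyncio.sleep(0.05)  # simulate per-token latency
        yield token


@app.get("/analyze")
async def analyze(prompt: str) -> StreamingResponse:
    return StreamingResponse(token_stream(prompt), media_type="text/plain")
```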

Why Join Us
  • Work on real production ML systems, not demos
  • Own core LLM infrastructure end-to-end
  • Direct exposure to long-context, document-scale AI
  • Fully remote, fast-paced startup
  • CTC: ₹9,00,000 – ₹12,00,000 (based on experience & impact)
