Gen AI Inferencing Software Engineer

htd resources • United States
AI Summary

We are seeking a highly skilled Gen AI Inferencing Software Engineer to design, build, and operate reusable toolkits supporting Gen AI RAG capabilities. This role focuses on developing scalable inferencing frameworks, AI platforms, and automation systems that power enterprise-grade AI/ML solutions. The ideal candidate will have strong experience in Python-based large-scale systems and hands-on expertise in Gen AI lifecycle, RAG pipelines, and inference frameworks.

Technical Skills Required
Python | Gen AI | RAG | AI-ML lifecycle management | vLLM | Triton Inference Server | Containerization | CI/CD automation | Git | Jenkins | SonarQube | pytest | Artifactory | Ansible | API-based applications | FastAPI | JWT | API Gateway
Benefits & Perks
Relocation: not specified
Employment status: Full-time

Job Description


📍 Location: Addison, TX / Charlotte, NC (Onsite)

💼 Employment: W2 Only

🛂 Visa: USC / H4 EAD only (please do not apply otherwise)

📌 Relocation: Same state only


We are looking for a highly skilled Software Engineer – Gen AI Inferencing to design, build, and operate reusable toolkits supporting GenAI RAG capabilities. This role focuses on developing scalable inferencing frameworks, AI platforms, and automation systems that power enterprise-grade AI/ML solutions.

If you have strong experience in Python-based large-scale systems and hands-on expertise in GenAI lifecycle, RAG pipelines, and inference frameworks, we want to hear from you!


Key Responsibilities:

  • Design, develop, and maintain reusable GenAI inferencing toolkits and RAG frameworks
  • Build scalable AI/ML solutions meeting functional, non-functional, and compliance standards
  • Deploy and optimize models using vLLM or Triton Inference Server in containerized environments
  • Automate CI/CD pipelines and release workflows
  • Develop automated testing frameworks (integration, regression, performance)
  • Build proofs of concept (POCs) and run risk-mitigation spikes
  • Collaborate with product teams, data scientists, and stakeholders
  • Mentor engineers and promote DevOps and automation best practices


Required Qualifications:

  • 5+ years of OOP development experience (Python / Scala / Java)
  • Strong hands-on experience with GenAI / AI-ML lifecycle management
  • Experience building RAG pipelines (chunking, embeddings, retrieval, reranking, summarization)
  • Model deployment experience with vLLM / Triton Inference Server
  • Experience with containerization and CI/CD automation
  • Experience building API-based applications (FastAPI, JWT, API Gateway)
  • Hands-on DevOps experience (Git, Jenkins, SonarQube, pytest, Artifactory, Ansible)
  • Experience working in large collaborative multi-repo environments


Desired Qualifications:

  • Experience building GenAI inferencing platforms using open-source toolsets
  • Experience implementing AI gateways, observability, and policy stores
  • Strong research mindset with ability to prototype innovative solutions
  • Experience driving quality, automation, and experimentation culture


Key Skills:

Application Development | GenAI | RAG | Python | MLOps | DevOps | Architecture | Automation | CI/CD | Containerization | MongoDB | Redis | React/Angular | API Development | Test Engineering | Collaboration

