AI Performance Optimization Engineer

Jobgether • United States
Remote
AI Summary

Jobgether is seeking an AI Performance Optimization Engineer to improve the performance, efficiency, and scalability of AI training and inference systems. The ideal candidate has deep expertise in AI systems, performance engineering, and large-scale distributed computing environments. This role requires strong programming skills in Python and C++, hands-on experience optimizing deep learning workloads on modern GPU architectures, and a deep understanding of distributed training, inference systems, and model parallelism techniques.

Key Highlights
Improve AI system performance and efficiency
Optimize distributed training and inference
Develop advanced model optimization techniques
Key Responsibilities
Improve the performance, efficiency, and scalability of AI training and inference systems
Profile and optimize end-to-end AI pipelines
Identify bottlenecks across compute, memory, networking, and data pipelines
Technical Skills Required
Python, C++, GPU architectures, Distributed training, Inference systems, Model parallelism techniques, Quantization, Sparsity, Pruning, Compression, Triton, XLA, TorchInductor, TVM
Benefits & Perks
Competitive full-time compensation
Fully remote work model
Long-term, stable engineering engagement
Opportunity to work on cutting-edge AI infrastructure challenges

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Performance Optimization Engineer in the United States.

This role focuses on pushing the limits of performance for large-scale AI systems, with an emphasis on maximizing throughput, minimizing latency, and reducing operational costs across training and inference workloads. You will work across the full stack of AI infrastructure, from GPU-level kernel optimization to distributed system tuning and model serving architecture. The environment is highly technical, data-driven, and collaborative, involving close partnership with ML engineers, platform teams, and product stakeholders. You will help translate complex, ambiguous performance challenges into measurable engineering improvements. The role is ideal for a hands-on expert who thrives in deep systems work and production-grade optimization. You will also contribute to shaping standards, benchmarks, and best practices across AI infrastructure teams.

Accountabilities

You will be responsible for improving the performance, efficiency, and scalability of AI training and inference systems across distributed environments.

  • Profile and optimize end-to-end AI pipelines to improve throughput, latency, and cost efficiency.
  • Identify bottlenecks across compute, memory, networking, and data pipelines, and implement targeted optimizations.
  • Develop and tune advanced model optimization techniques such as quantization, sparsity, pruning, and compression.
  • Optimize distributed training and inference using parallelism and sharding strategies (tensor parallelism, pipeline parallelism, FSDP, ZeRO).
  • Improve LLM serving performance through techniques such as KV caching, batching, and speculative decoding.
  • Drive kernel and compiler-level optimizations using tools like Triton, XLA, TorchInductor, or TVM.
  • Build benchmarking frameworks, performance monitoring systems, and regression testing suites.
  • Collaborate with cross-functional engineering teams to integrate performance best practices into production systems.
  • Evaluate hardware and software technologies and guide adoption decisions based on performance trade-offs.
  • Document optimization strategies and contribute to internal knowledge sharing and technical leadership.

Requirements

The ideal candidate has deep expertise in AI systems, performance engineering, and large-scale distributed computing environments.

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in ML systems, performance engineering, or high-performance computing.
  • Strong programming skills in Python and C++, with production-level engineering experience.
  • Hands-on experience optimizing deep learning workloads on modern GPU architectures.
  • Deep understanding of distributed training, inference systems, and model parallelism techniques.
  • Experience with profiling tools across CPU, GPU, and distributed systems.
  • Strong knowledge of memory hierarchies, communication overheads, and system bottlenecks.
  • Familiarity with model compression and optimization techniques and their trade-offs.
  • Strong analytical skills with a disciplined, measurement-driven engineering approach.
  • Excellent communication skills and ability to collaborate across technical and non-technical teams.

Benefits

  • Competitive full-time compensation aligned with experience and expertise
  • Fully remote work model across the United States
  • Long-term, stable engineering engagement on high-impact AI systems
  • Opportunity to work on cutting-edge large-scale AI infrastructure challenges
  • Collaborative, engineering-driven environment with strong technical ownership
  • Exposure to advanced GPU systems, LLM optimization, and distributed AI frameworks
  • Career growth opportunities in high-performance AI systems engineering

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.


Visa Sponsorship Relocation Remote
Job Type Full-time
Experience Level Mid-Senior level