Principal Machine Learning Infrastructure Engineer

Acceler8 Talent • United States • Relocation offered
This job is no longer active and is no longer accepting applications.

Job Description

Introduction:

Join a pioneering team at the forefront of AI and ML technology, where human-computer collaboration is not just a concept but a reality. Our team is dedicated to revolutionizing user experiences by innovating at every level, from user interfaces down to the most efficient models. This is more than a job; it's a journey into the future of technology.


About the Company:

Our company thrives on the belief that a small, focused group of talented individuals can generate significant breakthroughs in the AI field. We're a multi-disciplinary team driven to solve complex, real-world AI challenges. Backed by industry giants and venture capital powerhouses, we're well-positioned to reshape the landscape of AI and ML technologies.


About the Role:

As a Principal ML Infrastructure Engineer, you'll work closely with researchers and product engineers to create magical product experiences powered by large language models. You'll be at the helm of designing and implementing scalable ML systems, working across high-performance computing clusters. Your expertise will transform the infrastructure for training and serving, pushing the boundaries of AI technology.


What We Offer You:

  • A role where your contributions have a direct impact on groundbreaking AI advancements.
  • A collaborative, innovative environment that fosters growth and learning.
  • Competitive compensation and benefits, including relocation assistance for those moving to San Francisco.
  • Access to cutting-edge technology and resources.


Key Responsibilities:

  • Collaborate on the development of large language models using state-of-the-art frameworks.
  • Drive performance tuning for training and inference workloads.
  • Develop and optimize training and serving infrastructure, including writing custom kernels.
  • Implement parallelism methods for efficient, large-scale training of AI models.


Relevant Keywords: Large Language Models, Machine Learning, High-Performance Computing, Distributed Systems, Scalability, AI Accelerators, Quantization, Kernel Languages, Cloud Services, Containerization, Network Fundamentals.
