Senior Software Engineer (Model Parallelism)

DeepRec.ai • European Union
Remote
This job is no longer active and is no longer accepting applications.

Job Description

Backend Software Engineer / Distributed Systems


Distributed ML Training, Model Parallelism


Location: Fully Remote

Type: Full-time


Join an innovative Series A deep tech company backed by top investors, with over $50 million in funding. Our client is a team of 20 industry experts looking to grow to 35. They are leveraging blockchain to provide globally accessible computing resources for AI platforms and are seeking world-class engineers to accelerate AI progress. This is a fully remote role offering a high level of autonomy.


Responsibilities:


  • ML Orchestration System Design: Develop systems for orchestrating ML execution across decentralized and heterogeneous infrastructure.
  • Performance Optimization: Profile and optimize training algorithms continually.
  • Implement Novel Research: Build new mechanisms and algorithms to solve unprecedented problems.
  • Engineering Support: Collaborate on broader ML issues, such as reproducible training.
  • Technical Writing and Engagement: Contribute to technical reports and papers, and engage with the community.


Minimum Requirements:


  • Distributed Foundation Model Training: Experience designing or working with training systems on large clusters.
  • Parallelism and Optimization Techniques: Experience with various parallelization strategies (including pipeline, data, and tensor parallelism), different training methods (such as local and global optimization), and a range of optimization techniques (like quantization and gradient compression).
  • Networking Proficiency: Understanding of, and troubleshooting experience with, IP, TCP, UDP, and HTTP, as well as communication backends like NCCL, Gloo, and MPI.
  • Open Source Contributions: Experience with large open-source codebases as a maintainer or trusted contributor.
  • Rust Enthusiasm: Willingness to learn Rust to work across the codebase.
  • Computer Science Background: Solid understanding of computational complexity and broad knowledge of algorithms and data structures.
  • Self-motivation and Communication: Highly self-motivated with excellent verbal and written communication skills.
  • Applied Research Comfort: Comfortable working in a high-autonomy, unpredictable applied research environment.


Bonus Skills:


  • Rust Expertise: Strong experience with systems programming in Rust, including an understanding of lifetimes and the purpose of Pin.
  • Research Experience: Published research in distributed systems or ML domains.
  • Blockchain Knowledge: Understanding of blockchain fundamentals.



Be part of a team dedicated to democratizing AI, where you can leverage your expertise in distributed ML training, networking, and open-source contributions to make a significant impact. Embrace autonomy, continuous learning, and the drive to push innovative solutions in a highly collaborative and flexible environment.


Apply now to join this cutting-edge team and contribute to the future of AI!
