Machine Learning Software Engineer

DeepRec.ai, European Union
Remote
This job is no longer active. This position is no longer accepting applications.

Job Description


Distributed ML Training


Location: Fully Remote

Type: Full-time


Join an innovative Series A deep-tech company at the forefront of AI and blockchain technology! Backed by top investors and more than $50 million in funding, our client is a team of 20 industry experts that plans to grow to 35. The company uses blockchain to provide globally accessible computing resources for AI platforms and is seeking world-class engineers to accelerate AI progress. This is a fully remote role offering a high level of autonomy.


Responsibilities:


  • ML Orchestration System Design: Develop systems for orchestrating ML execution across decentralized and heterogeneous infrastructure.
  • Performance Optimization: Profile and optimize training algorithms continually.
  • Implement Novel Research: Build new mechanisms and algorithms to solve unprecedented problems.
  • Engineering Support: Collaborate on broader ML issues, such as reproducible training.
  • Technical Writing and Engagement: Contribute to technical reports and papers, and engage with the community.


Minimum Requirements:


  • Distributed Foundation Model Training: Experience designing or working with training systems on large clusters.
  • Networking Proficiency: Understanding of, and troubleshooting experience with, IP, TCP, UDP, HTTP, and communication backends such as NCCL, Gloo, and MPI.
  • Open Source Contributions: Experience with large open-source codebases as a maintainer or trusted contributor.
  • Rust Enthusiasm: Willingness to learn Rust to work across the codebase.
  • Computer Science Background: Solid understanding of computational complexity and broad knowledge of algorithms and data structures.
  • Self-motivation and Communication: Highly self-motivated with excellent verbal and written communication skills.
  • Applied Research Comfort: Comfortable working in a high-autonomy, unpredictable applied research environment.


Bonus Skills:


  • Rust Expertise: Strong experience with systems programming in Rust, including an understanding of lifetimes and the purpose of Pin.
  • Research Experience: Published research in distributed systems or ML domains.
  • Blockchain Knowledge: Understanding of blockchain fundamentals.



Be part of a team dedicated to democratizing AI, where you can apply your expertise in distributed ML training, networking, and open-source development to make a significant impact. Expect autonomy, continuous learning, and the freedom to drive innovative solutions in a highly collaborative and flexible environment.


Apply now to join this cutting-edge team and contribute to the future of AI!
