We are looking for a highly skilled Senior AI Infrastructure Engineer to build automation and lifecycle systems for a global GPU-cluster fleet. As a key member of the team, you will design and develop backend services and automation tooling in Python and work with cutting-edge NVIDIA hardware. The role calls for a strong ownership mindset and clear communication skills.
Job Description
AI Infrastructure Engineer - GPU Clusters - Up to 250K Base - Remote
This position is open to candidates working remotely in the United States or Canada.
Our client is a cloud technology company driving the next generation of AI infrastructure. They empower organizations to build and scale AI and ML solutions without the need for large in-house teams or heavy upfront infrastructure costs. Their global team of engineers works at the forefront of GPU cloud computing, supporting businesses across industries to solve complex, real-world problems.
The company operates with a flat structure, minimal bureaucracy, and a strong focus on ownership, speed, and technical excellence. Engineers work closely with customers and internal teams to design scalable solutions and influence product direction, creating direct impact on how modern AI platforms are built and operated.
The Role
They are looking for someone to build the automation and lifecycle systems that power a global, large-scale GPU-cluster fleet. This is a hands-on engineering role at the intersection of software and physical infrastructure.
You will work with cutting-edge NVIDIA hardware that most engineers never get close to, and you will help design systems that are often redesigned within weeks, because that is the pace. If you thrive in environments where speed, autonomy, and real engineering ownership matter, this role is for you.
Responsibilities
- Design and develop backend services and automation tooling in Python
- Build and maintain provisioning, testing, and lifecycle management systems for physical hardware, including software that runs directly on bare-metal environments
- Integrate with Linux systems using shell scripting and low-level tooling, and implement CI/CD pipelines for infrastructure-focused software
- Work across networking layers (IPv4/IPv6, DHCP, DNS, network boot) and interface with hardware management controllers and their protocols
- Design NoSQL data stores for system state and orchestration
- Support ARM64 architectures and contribute clear documentation and operational excellence across large machine fleets
What You'll Bring
- 10+ years of professional experience
- Strong Python engineering experience with a solid Linux and shell scripting background
- Hands-on familiarity with bare-metal servers, networking fundamentals, and hardware management interfaces and APIs
- Experience with CI/CD pipelines and NoSQL databases
- The ability to debug complex issues spanning software, hardware, and networks
- A strong ownership mindset and clear communication skills in a distributed team
Nice to Have
- Experience at large infrastructure scale, with ARM platforms in production, or in hardware testing and factory provisioning
- A background in infrastructure automation, internal platform tooling, or open-source systems software
Interview Process
- Preliminary interview
- Technical coding interview
- Final technical deep dive
The Offer
- Base salary up to 250K USD plus bonus and RSUs
- Remote role within the US/Canada
- No take-home assignment at any stage of the process