AI Infrastructure/Platform Engineer

LanceSoft, Inc. • United States

Remote

AI Summary

Join LanceSoft, Inc. as an AI Infrastructure/Platform Engineer to build and operate large-scale GPU compute infrastructure. Key responsibilities include designing and delivering platform features, partnering with development teams, and applying expertise in storage and networking. The ideal candidate has experience in Platform, Infrastructure, or DevOps Engineering and hands-on experience with Kubernetes and container orchestration at scale.

Key Highlights

Build and extend platform capabilities

Design and operate scalable orchestration systems

Partner with development teams to extend the GPU developer platform

Key Responsibilities

Build and extend platform capabilities to enable different classes of workloads

Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments

Develop platform features such as pre-flight health checks, job status monitoring and post-mortem analysis

Technical Skills Required

Kubernetes

Container orchestration

Platform Engineering

DevOps Engineering

GPU compute infrastructure

Storage and networking

Benefits & Perks

$100.00-$107.00/hr

6-month contract with possible extension

Hybrid work arrangement (onsite in San Jose, CA or 100% remote)

Nice to Have

Hands-on experience in storage or network engineering within Kubernetes environments

Experience with Infrastructure as Code tools like Terraform

Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads

Job Description

Pay Rate: $100.00/hr to $107.00/hr

Duration: 6 months, with possible extension

Location: San Jose, CA, hybrid onsite (100% remote is also acceptable for a strong candidate)

THE ROLE:

We are seeking an AI Infrastructure / Platform Engineer to join our team building and operating large-scale GPU compute infrastructure that powers AI and ML workloads. The ideal candidate is passionate about software engineering and has the leadership skills to deliver multiple projects independently. They communicate effectively and work well with peers across our larger organization.

THE PERSON:

Experience in Platform, Infrastructure, or DevOps Engineering.

Deep hands-on experience with Kubernetes and container orchestration at scale.

Proven ability to design and deliver platform features that serve internal customers or developer teams.

Experience building developer-facing platforms or internal developer portals (e.g., custom workflow tooling).

KEY RESPONSIBILITIES:

Build and extend platform capabilities to enable different classes of workloads (e.g., large-scale AI training and inference).

Design and operate scalable orchestration systems using Kubernetes across both on-prem and multi-cloud environments.

Develop platform features such as pre-flight health checks, job status monitoring and post-mortem analysis.

Partner with development teams to extend the GPU developer platform with features, APIs, templates, and self-service workflows that streamline job orchestration and environment management.

Apply expertise in storage and networking to design and integrate CSI drivers, persistent volumes, and network policies that enable high-performance GPU workloads.

Provide production support for large-scale GPU clusters.

PREFERRED EXPERIENCE:

Hands-on experience in storage or network engineering within Kubernetes environments (e.g., CSI drivers, dynamic provisioning, CNI plugins, or network policy).

Experience with Infrastructure as Code tools like Terraform.

Background in HPC, Slurm, or GPU-based compute systems for ML/AI workloads.

Practical experience with monitoring and observability tools (Prometheus, Grafana, Loki, etc.).

Understanding of machine learning frameworks (PyTorch, vLLM, SGLang, etc.).

High-performance networking and InfiniBand (IB)/RDMA tuning.

ACADEMIC CREDENTIALS:

Bachelor's or master's degree in computer science, computer engineering, electrical engineering, or an equivalent field.

Job Overview

Posted Date: May 17, 2026

Employment Type: Contract

Experience Level: Mid-Senior level

Location: United States

Annual Salary: 100,000 USD

Category: DevOps

Company: LanceSoft, Inc.
