Design, develop, and optimize AI infrastructure for on-premise environments. Collaborate with AI/ML teams to deploy and scale models. Strong knowledge of Linux, Windows, and GPU-based systems required.
AI Infrastructure Engineer (On-Premise)
Employment Type: Full-time
Location: Remote
Experience: 1–5 Years
About the Role
AppXcess Technologies is seeking a skilled AI Infrastructure Engineer with a strong focus on on-premise environments. This role is responsible for building, managing, and optimizing the infrastructure that supports AI workloads, including GPU-based systems and enterprise-grade server environments.
You will work across both AI systems and core infrastructure layers, ensuring high performance, reliability, and scalability of mission-critical platforms.
Key Responsibilities
AI Infrastructure & Model Operations
- Deploy and operate AI/LLM workloads on GPU-based systems (NVIDIA environments)
- Run and optimize inference servers such as vLLM, NVIDIA Triton Inference Server, or similar frameworks
- Monitor GPU utilization, memory, and system performance for efficient AI execution
System Design & Backend Engineering
- Design and develop production-grade APIs (FastAPI / gRPC / REST) for AI services
- Architect asynchronous systems using queues, workers, and distributed pipelines
- Build scalable backend systems for high-concurrency AI workloads
Performance, Scalability & Reliability
- Plan infrastructure capacity around GPU usage, latency, and throughput targets
- Implement batching, rate limiting, and workload optimization strategies
- Ensure system resilience with fault tolerance and graceful degradation
On-Premise Infrastructure Management
- Manage and maintain on-premise infrastructure including servers, storage, and networking systems
- Configure and administer physical and virtual servers across Linux and Windows environments
- Implement and support virtualization platforms such as VMware or Hyper-V
- Manage networking components including switches, routers, firewalls, and load balancers
- Monitor infrastructure performance, availability, and system health
- Perform system upgrades, patching, backups, and disaster recovery processes
- Troubleshoot hardware, network, and system-related issues effectively
- Ensure infrastructure security, access controls, and compliance with best practices
- Maintain clear documentation of configurations, processes, and infrastructure architecture
- Support capacity planning and infrastructure scaling requirements
Observability & Operations
- Build monitoring systems for latency, throughput, GPU usage, and system health
- Diagnose performance bottlenecks and infrastructure issues
- Ensure stable operations under high-load and production conditions
Collaboration & Productionization
- Work closely with AI/ML teams to deploy and scale models into production
- Abstract away infrastructure complexity from application and product teams
- Translate experimental AI systems into reliable production deployments
Requirements
Core Requirements
- 1–5 years of experience in infrastructure engineering, system administration, or backend systems
- Strong knowledge of Linux and Windows server environments
- Hands-on experience working with GPU-based systems or compute-intensive workloads
- Strong programming skills in Python (Golang is a plus)
- Understanding of distributed systems and asynchronous processing
- Experience with Docker or containerized environments
- Ability to analyze and optimize system performance (latency, throughput, cost)
Infrastructure & Networking
- Experience with virtualization technologies such as VMware or Hyper-V
- Solid understanding of networking concepts (TCP/IP, DNS, VLANs, VPNs, firewalls)
- Experience with storage systems, backup strategies, and disaster recovery
- Familiarity with enterprise hardware such as rack servers and storage systems
General Skills
- Strong troubleshooting and analytical thinking
- Ability to work independently in a remote setup
- Good communication and collaboration skills
Strong Practical Experience (Preferred)
- Deployed AI/ML models in production environments
- Debugged GPU-related issues such as memory constraints or latency bottlenecks
- Designed systems for high concurrency and compute-heavy workloads
- Redesigned synchronous systems into asynchronous architectures
Additional Considerations
- Experience working in data center environments
- Basic scripting knowledge (Shell, Python, or PowerShell)
- Familiarity with monitoring and infrastructure management tools
What We're Looking For
- Engineers with a deep understanding of infrastructure, not just familiarity with surface-level tools
- Strong problem-solvers who can optimize systems under real-world constraints
- Individuals comfortable working across AI systems and core infrastructure layers
- Hands-on professionals with real production experience
Who May Not Be a Fit
- Candidates with only API-level AI exposure and no infrastructure experience
- Engineers focused only on frontend or prompt engineering
- Profiles without experience handling compute-heavy workloads or infrastructure systems
What We Offer
- Work on enterprise-grade infrastructure and mission-critical systems
- Exposure to real-world AI and infrastructure environments
- Fully remote work setup with flexible collaboration
- Growth opportunities in AI infrastructure and system engineering
- Health insurance coverage
- Opportunities for international travel based on project needs
- Supportive and high-performance work culture