Lead the deployment of large language models including DeepSeek, Kimi, Qwen and LLaMA into sovereign, air-gapped environments. Architect and optimise NVIDIA GPU clusters with quantisation techniques for high-performance inference. Requires proven experience with vLLM, TGI, Ollama and Kubernetes in highly regulated settings.
Job Description
AI / LLM Deployment Engineer
Location: Remote (GST time zone preferred) with occasional travel to Abu Dhabi if required
Travel: Occasional international travel
Compensation: Exceptional package reflecting seniority, technical expertise and impact
What's in it for you?
This isn't another AI application role. You'll lead the deployment of large language models including DeepSeek, Kimi and Qwen into sovereign, air-gapped environments where GPU performance, inference optimisation and security are business-critical. If you're passionate about high-performance AI infrastructure, this is an opportunity to solve problems that very few engineers get to tackle.
Package / Benefits
- Exceptional package reflecting seniority and specialist expertise
- Fully remote initially with flexibility for future relocation if desired
- Visa sponsorship available where applicable
- Work with cutting-edge open-weight LLMs and enterprise GPU infrastructure
- Influence deployment architecture from the ground up
Why this business
Join a globally focused technology business developing sovereign AI and intelligence platforms for highly regulated environments across multiple international markets. Working at the forefront of secure AI deployment, the organisation is investing heavily in advanced infrastructure and offers the opportunity to solve technically demanding challenges alongside a highly experienced engineering team.
What you'll be doing
- Architect and deploy LLMs including DeepSeek, Kimi, Qwen and LLaMA into secure, air-gapped production environments
- Configure and optimise NVIDIA H100/H200 GPU clusters, NVLink and InfiniBand infrastructure for high-performance inference
- Apply GPTQ, AWQ and GGUF quantisation techniques to maximise deployment efficiency without compromising model output quality
- Deploy and optimise inference runtimes including vLLM, TGI and Ollama within Kubernetes environments, meeting target throughput and latency SLAs
What you'll bring
- Proven commercial experience deploying production LLMs using vLLM, TGI, Ollama or equivalent inference platforms
- Expert knowledge of Kubernetes, NVIDIA GPU infrastructure, GPU memory optimisation and high-performance computing
- Hands-on experience with model quantisation techniques including GPTQ, AWQ or GGUF
- Experience delivering on-premise or air-gapped AI deployments. Experience within government, defence, cyber security or other highly regulated environments would be advantageous.
Who this suits
You're an infrastructure engineer who thrives on solving complex deployment challenges rather than building AI applications. You understand what it takes to run large language models reliably at scale, enjoy optimising GPU performance and want to work on technically demanding projects where security, performance and engineering excellence are non-negotiable.
Apply now for a confidential conversation with Walker Lovell.
Walker Lovell