AI Platform Engineer - Infrastructure & LLM Deployment

vxneo labs • India

Remote

AI Summary

Own and develop the core AI infrastructure, including model routing, open-source LLM deployment, and EU server management. Ship features across backend, mobile, and dashboard, reporting directly to the founder. Focus on rapid iteration and impactful code delivery.

Key Highlights

Owns the infrastructure layer: model routing engine, open-source model deployment, EU server infrastructure, Pi fleet management.

Ships features across FastAPI backend, Flutter mobile apps, and coordinator dashboard.

Direct reporting to the founder with decisions made in hours, not sprints.

Key Responsibilities

Own the infrastructure layer: model routing engine, open-source model deployment stack, EU server infrastructure, and Pi fleet management.

Ship features across the FastAPI backend, Flutter mobile apps, and coordinator dashboard.

Implement and maintain the neo_model_router.py 3-tier routing engine.

Set up and optimize Ollama on Pi 5 fleet, including model quantisation and performance tuning.

Integrate Mistral API as a primary Tier 2 provider.

Evaluate and onboard new open-source models.

Build the vLLM self-hosting stack for future GPU data center deployment.

Implement safety classifier logic for routing critical care queries.

Migrate and manage EU workloads on OVHcloud Strasbourg.

Orchestrate Docker Compose across Hetzner and OVHcloud.

Ensure GDPR-compliant data pipeline, with zero personal data leaving EU servers.

Monitor, alert, and ensure uptime for 24/7 care AI.

Manage SSL, TLS, reverse proxy, and secrets management.

Remotely manage Pi 5 devices at client homes.

Develop automated deployment scripts for new client Pi setup.

Monitor and auto-recover systemd services.

Optimize Whisper STT for Austrian German dialect accuracy.

Extend FastAPI backend from 19 to 30+ endpoints.

Implement the /ask/v2 endpoint backed by the model router.

Develop the health analytics pipeline for pain trend analysis and reporting.

Build and maintain multi-tenant architecture for data isolation.

Develop Flutter app features for clients and coordinators.

Enhance the chatvx.com coordinator dashboard.

Improve Telegram bot functionality for coordinator alerts.

Technical Skills Required

Python · FastAPI · Flutter · Docker Compose · systemd · Nginx · Ollama · vLLM · Mistral · Qwen · DeepSeek · Llama · Phi-3 · Whisper STT · Piper TTS · Porcupine · OpenWakeWord · PoseNet · Neo4j · Qdrant · Redis · Hetzner VPS · OVHcloud · Raspberry Pi Connect · GitHub Actions · Gmail API · Google Calendar · Telegram Bot · Next.js

Benefits & Perks

Compensation based on experience

Equity discussion after 6-month review

Fully remote from India

Async-first work

Flexible hours

Hardware provided for testing

Nice to Have

Experience with vLLM or GPU inference server setup (CUDA, memory management)

Quantisation experience (GGUF, Q4_K_M, AWQ formats)

Flutter / Dart experience

Neo4j experience (graph queries, Cypher, schema design)

Austrian German or European language context

Experience with GDPR technical implementation (DPA compliance, data minimisation)

Knowledge of European care tech, disability tech, or clinical AI

Job Description

In this role you will own the infrastructure layer: the model routing engine, the open-source model deployment stack, the EU server infrastructure, and the Pi fleet management. You will also ship features across the FastAPI backend, Flutter mobile apps, and coordinator dashboard. You report directly to the founder. No middle management. No ticket queues. Decisions in hours, not sprints. If you want to write code that matters and ships the same week, this is for you.

What Neo Does Today (Your Starting Point)

| Feature | Technology | Details |
| --- | --- | --- |
| Voice wake word | Porcupine | custom "Hey Neo" model |
| Speech-to-text | Whisper STT | local on Pi, zero cloud |
| Text-to-speech | Piper TTS | natural human voice |
| AI model routing | Claude · Mistral · DeepSeek V4 · Qwen 3 · Llama 3.3 | |
| On-device inference | Ollama | Mistral 7B / Phi-3 Mini running on Pi |
| Graph memory | Neo4j | 9-dimensional, persistent across reboots |
| Vector search | Qdrant | semantic memory retrieval |
| Real-time state | Redis | alert queue and emotion state |
| Fall detection | Sony AITRIOS IMX500 + PoseNet | on-chip, zero CPU |
| Health check-ins | APScheduler | daily pain and mood tracking |
| Voice email | Gmail API + Whisper | reads and dictates replies |
| API layer | FastAPI (Python 3.13) | 19 endpoints |
| Mobile apps | Flutter | Android (Play Store) + iOS (App Store) |
| Infrastructure | Hetzner VPS · OVHcloud France · Docker Compose · systemd | |
| Deployment | 6 auto-restarting systemd services | fully autonomous |

What You Will Own and Build

Model Router & Open-Source AI Stack (highest priority)

- Implement and maintain neo_model_router.py — 3-tier routing engine

(Tier 1: Ollama on-device → Tier 2: Mistral/Qwen/DeepSeek EU cloud → Tier 3: Claude safety net)

- Set up and optimise Ollama on Pi 5 fleet — model quantisation (Q4_K_M), performance tuning

- Integrate Mistral API (EU-hosted, GDPR-native) as primary Tier 2 provider

- Evaluate and onboard new open-source models: Qwen 3, DeepSeek V4, Phi-3, Gemma 2

- Build the vLLM self-hosting stack for future GPU data center deployment

- Safety classifier logic — routing critical care queries (pain 8+, falls, crisis) to appropriate models
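
For candidates who want a concrete picture, the tier order and safety rule above could be sketched roughly as follows. This is an illustration only, not the actual neo_model_router.py: `Query`, `is_critical`, `choose_tier`, the `on_device_ok` flag, and the keyword list are hypothetical names introduced here. Only the three tiers and the "pain 8+, falls, crisis" criteria come from this posting, and sending critical queries straight to Tier 3 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = 1   # Tier 1: Ollama on the Pi
    EU_CLOUD = 2    # Tier 2: Mistral/Qwen/DeepSeek in the EU cloud
    SAFETY_NET = 3  # Tier 3: Claude safety net


@dataclass
class Query:
    text: str
    pain_level: int = 0        # hypothetical 0-10 self-reported score
    on_device_ok: bool = True  # hypothetical: can the Pi model answer this?


# Hypothetical keyword list; a real classifier would need far more care
# than substring matching.
CRISIS_KEYWORDS = ("fall", "fell", "crisis", "emergency")


def is_critical(query: Query) -> bool:
    """Flag pain 8+ or fall/crisis wording (criteria taken from the posting)."""
    text = query.text.lower()
    return query.pain_level >= 8 or any(k in text for k in CRISIS_KEYWORDS)


def choose_tier(query: Query) -> Tier:
    """Check for critical queries first, then prefer the cheapest tier."""
    if is_critical(query):
        return Tier.SAFETY_NET  # assumption: critical queries skip Tiers 1-2
    if query.on_device_ok:
        return Tier.ON_DEVICE
    return Tier.EU_CLOUD
```

Running the safety check before the cost-based choice keeps a cheap on-device answer from ever pre-empting a critical care query.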

EU Infrastructure & Data Residency

- Migrate and manage EU workloads on OVHcloud Strasbourg (France-resident data)

- Docker Compose orchestration across Hetzner (Germany) and OVHcloud (France)

- GDPR-compliant data pipeline — ensure zero personal data leaves EU servers

- Monitoring, alerting, and uptime for 24/7 care AI (clients depend on this at night)

- SSL, TLS, reverse proxy (Nginx/Caddy), secrets management

Pi 5 Fleet Engineering

- Remote management of Pi 5 devices at client homes (Raspberry Pi Connect)

- Automated deployment scripts — new client Pi setup in under 10 minutes

- Systemd service monitoring, auto-recovery, over-the-air config updates

- Whisper STT optimisation for Austrian German dialect accuracy

FastAPI Backend

- Extend from 19 to 30+ endpoints as new features ship

- /ask/v2 — the new model-router-backed endpoint (already designed, needs wiring)

- Health analytics pipeline — pain trend analysis, coordinator alerts, weekly reports

- Multi-tenant architecture — clean data isolation across Prime Program clients

Mobile & Dashboard

- Flutter app features (Android + iOS) — client-facing and coordinator-facing

- chatvx.com coordinator dashboard (Next.js) — real-time client health overview

- Telegram bot improvements — coordinator alert formatting, rich notifications

Data Center Roadmap (6–18 months)

- Phase 1: Optimise on-device Ollama across Pi fleet

- Phase 2: Stand up shared GPU inference server (RTX 4090 or A10G) — self-host Qwen/Mistral

- Phase 3: Full vLLM cluster — all Tier 2 models self-hosted, zero external API dependency

- This is the engineering track that removes €500+/month in external API costs at scale

Your Tech Stack

Languages

Python 3.13 · Dart (Flutter) · JavaScript/TypeScript (Next.js) · Bash

AI & Models (open-source focus)

Ollama · vLLM · Mistral 7B / Small / Medium · Qwen 3 72B · DeepSeek V4 · Llama 3.3 70B · Phi-3 Mini · Whisper STT · Piper TTS · Porcupine · OpenWakeWord · PoseNet

Model APIs

Mistral API · Together AI · Cohere · Anthropic Claude (safety fallback only)

Databases

Neo4j (graph) · Qdrant (vector) · Redis (state)

Infrastructure

Raspberry Pi 5 (8GB) · Hetzner VPS · OVHcloud (France) · Docker Compose · systemd · Nginx · Raspberry Pi Connect · GitHub Actions

APIs & Integrations

Gmail API · Google Calendar · Telegram Bot · Firebase · Brevo SMTP · Ghost CMS · Sony AITRIOS IMX500

Compliance Environment

GDPR Art.28 · Austrian DSG · IRAP SR&ED (Canada) · EU AI Act awareness

You Are Exactly Right For This If

- You have *3+ years of Python backend* — FastAPI, async, real production systems

- You have deployed *open-source LLMs* in any form — Ollama, vLLM, llama.cpp, HuggingFace

- You are comfortable with *Linux, SSH, Docker, systemd* — you fix things without being told where

- You have worked on *edge or IoT systems* — or are deeply curious about them

- You think about *cost efficiency* — you understand why routing matters at scale

- You are genuinely excited about *privacy-first, sovereign AI* — not just cloud wrappers

- You *ship things* — GitHub activity, side projects, something real you built and deployed

- You can work *across IST / CEST (Vienna) / EST (Ontario)* time zones independently

- You communicate proactively — *no chasing for updates*, no daily standups needed

- You care about the mission — real disabled clients in Vienna depend on this system

Strong Bonus If You Have

- Experience with *vLLM* or GPU inference server setup (CUDA, memory management)

- Quantisation experience — GGUF, Q4_K_M, AWQ formats

- Flutter / Dart — even basic experience with the mobile apps

- Neo4j — graph queries, Cypher, schema design

- Austrian German or European language context — helps with client empathy

- Experience with GDPR technical implementation — DPA compliance, data minimisation

- Knowledge of European care tech, disability tech, or clinical AI

Salary & Compensation

- Compensation: based on experience

- Equity: Discussion after 6-month successful review — early-stage, real upside

- Work: Fully remote from India · async-first · flexible hours

- Hardware: We ship you what you need for testing (Pi 5 dev kit discussed on onboarding)

- Growth: You are employee #1 in India — as the team grows, you grow with it

What We Are Not

- We are not a body shop. You own your work.

- We are not a corporate AI team. No Jira ceremonies, no sprint reviews.

- We are not a startup with a demo and no users. We have live clients and a signed contract with a care organisation.

- We do not use AI to replace care workers. We use AI to give disabled people more autonomy.

Our Values

Sovereignty — Client data stays on-device. Privacy is not a feature, it is the architecture.

Open Source First — We build with open models so we never depend on one vendor's pricing.

Our data center roadmap exists precisely to eliminate that dependency entirely.

Impact Over Optics — Every line of code we ship touches a real person's daily life.

Amrit in Vienna hears Neo's voice every morning. That is who we build for.

Craft — We write code that is readable, documented, and built to last. Fast is good.

Fast and clean is what we ship.

How to Apply

Send an email to contact@vxneolabs.com with the subject line:

AI Platform Engineer — [Your Name] — India

Include:

1. *Something you built and deployed* — one paragraph, no templates. What was it,

what stack, what did you learn, what broke and how did you fix it.

2. *Your GitHub profile* — or any code you can share publicly

3. *Your experience with open-source LLMs* — even a paragraph. Ollama, vLLM,

Hugging Face, llama.cpp — anything real.

4. *Your resume or LinkedIn*

5. *Expected monthly compensation in INR*

6. *Your availability to start*

Applications without a GitHub link or a specific "something I built" paragraph will not be reviewed. We are not looking for the best resume. We are looking for the best builder.

Direct applications only. No recruiters. No agencies.

*VXNeo Labs Inc. is an equal opportunity employer. We especially welcome applications

from engineers with lived experience of disability or caregiving — you will understand

our mission better than anyone.*

Job Overview

Posted Date May 24, 2026

Employment Type Full-time

Experience Level Mid-Senior level

Location India

Category Programming

Company vxneo labs
