AI Platform Engineer - Infrastructure & LLM Deployment

VXNeo Labs • India
Remote
AI Summary

Own and develop the core AI infrastructure, including model routing, open-source LLM deployment, and EU server management. Ship features across backend, mobile, and dashboard, reporting directly to the founder. Focus on rapid iteration and impactful code delivery.

Key Highlights
Owns the infrastructure layer: model routing engine, open-source model deployment, EU server infrastructure, Pi fleet management.
Ships features across FastAPI backend, Flutter mobile apps, and coordinator dashboard.
Direct reporting to the founder with decisions made in hours, not sprints.
Key Responsibilities
Own the infrastructure layer: model routing engine, open-source model deployment stack, EU server infrastructure, and Pi fleet management.
Ship features across the FastAPI backend, Flutter mobile apps, and coordinator dashboard.
Implement and maintain the neo_model_router.py 3-tier routing engine.
Set up and optimize Ollama on Pi 5 fleet, including model quantisation and performance tuning.
Integrate Mistral API as a primary Tier 2 provider.
Evaluate and onboard new open-source models.
Build the vLLM self-hosting stack for future GPU data center deployment.
Implement safety classifier logic for routing critical care queries.
Migrate and manage EU workloads on OVHcloud Strasbourg.
Orchestrate Docker Compose across Hetzner and OVHcloud.
Ensure GDPR-compliant data pipeline, with zero personal data leaving EU servers.
Monitor, alert, and ensure uptime for 24/7 care AI.
Manage SSL, TLS, reverse proxy, and secrets management.
Remotely manage Pi 5 devices at client homes.
Develop automated deployment scripts for new client Pi setup.
Monitor and auto-recover systemd services.
Optimize Whisper STT for Austrian German dialect accuracy.
Extend FastAPI backend from 19 to 30+ endpoints.
Implement the /ask/v2 endpoint backed by the model router.
Develop the health analytics pipeline for pain trend analysis and reporting.
Build and maintain multi-tenant architecture for data isolation.
Develop Flutter app features for clients and coordinators.
Enhance the chatvx.com coordinator dashboard.
Improve Telegram bot functionality for coordinator alerts.
Technical Skills Required
Python · FastAPI · Flutter · Docker Compose · systemd · Nginx · Ollama · vLLM · Mistral · Qwen · DeepSeek · Llama · Phi-3 · Whisper STT · Piper TTS · Porcupine · OpenWakeWord · PoseNet · Neo4j · Qdrant · Redis · Hetzner VPS · OVHcloud · Raspberry Pi Connect · GitHub Actions · Gmail API · Google Calendar · Telegram Bot · Next.js
Benefits & Perks
Compensation based on experience
Equity discussion after 6-month review
Fully remote from India
Async-first work
Flexible hours
Hardware provided for testing
Nice to Have
Experience with vLLM or GPU inference server setup (CUDA, memory management)
Quantisation experience (GGUF, Q4_K_M, AWQ formats)
Flutter / Dart experience
Neo4j experience (graph queries, Cypher, schema design)
Austrian German or European language context
Experience with GDPR technical implementation (DPA compliance, data minimisation)
Knowledge of European care tech, disability tech, or clinical AI

Job Description


In this role, you will own the infrastructure layer: the model routing engine, the open-source model deployment stack, the EU server infrastructure, and the Pi fleet management. You will also ship features across the FastAPI backend, Flutter mobile apps, and coordinator dashboard. You report directly to the founder. No middle management. No ticket queues. Decisions in hours, not sprints.

If you want to write code that matters and ships the same week, this is for you.


What Neo Does Today (Your Starting Point)


| Feature | Technology |
| Voice wake word | Porcupine - custom "Hey Neo" model |
| Speech-to-text | Whisper STT - local on Pi, zero cloud |
| Text-to-speech | Piper TTS - natural human voice |
| AI model routing | Claude · Mistral · DeepSeek V4 · Qwen 3 · Llama 3.3 |
| On-device inference | Ollama - Mistral 7B / Phi-3 Mini running on Pi |
| Graph memory | Neo4j - 9-dimensional, persistent across reboots |
| Vector search | Qdrant - semantic memory retrieval |
| Real-time state | Redis - alert queue and emotion state |
| Fall detection | Sony AITRIOS IMX500 + PoseNet - on-chip, zero CPU |
| Health check-ins | APScheduler - daily pain and mood tracking |
| Voice email | Gmail API + Whisper - reads and dictates replies |
| API layer | FastAPI (Python 3.13) - 19 endpoints |
| Mobile apps | Flutter - Android (Play Store) + iOS (App Store) |
| Infrastructure | Hetzner VPS · OVHcloud France · Docker Compose · systemd |
| Deployment | 6 auto-restarting systemd services - fully autonomous |


What You Will Own and Build

Model Router & Open-Source AI Stack (highest priority)


- Implement and maintain neo_model_router.py, the 3-tier routing engine (Tier 1: Ollama on-device → Tier 2: Mistral/Qwen/DeepSeek EU cloud → Tier 3: Claude safety net)

- Set up and optimise Ollama on Pi 5 fleet: model quantisation (Q4_K_M), performance tuning

- Integrate Mistral API (EU-hosted, GDPR-native) as primary Tier 2 provider

- Evaluate and onboard new open-source models: Qwen 3, DeepSeek V4, Phi-3, Gemma 2

- Build the vLLM self-hosting stack for future GPU data center deployment

- Safety classifier logic: routing critical care queries (pain 8+, falls, crisis) to appropriate models
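
For candidates who want a feel for the shape of this work, the tiered routing and safety-classifier bullets above could be sketched roughly as follows. This is an illustration only, not the actual neo_model_router.py. The names Query, is_critical, route and CRISIS_PATTERN are hypothetical, the English keyword list is a placeholder, and two behaviours are assumptions: that critical queries skip straight to Tier 3, and that a failing tier falls through to the next one.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder keyword list (assumption): a real classifier would be far richer
# and would need to handle Austrian German input.
CRISIS_PATTERN = re.compile(r"\b(fall|fell|fallen|crisis|emergency)\b", re.IGNORECASE)

# A provider is any callable that takes the query text and returns an answer,
# e.g. a thin wrapper around an on-device or cloud model client.
Provider = Callable[[str], str]


@dataclass
class Query:
    text: str
    pain_level: Optional[int] = None  # self-reported 0-10 score, if known


def is_critical(query: Query) -> bool:
    """Flag queries with pain 8+ or fall/crisis wording."""
    if query.pain_level is not None and query.pain_level >= 8:
        return True
    return bool(CRISIS_PATTERN.search(query.text))


def route(query: Query, tier1: Provider, tier2: Provider, tier3: Provider) -> tuple[str, str]:
    """Return (tier_name, answer), trying tiers in order.

    Assumption: critical queries go directly to Tier 3; all other queries
    start at Tier 1 and escalate only when a tier raises an exception.
    """
    if is_critical(query):
        tiers = [("tier3", tier3)]
    else:
        tiers = [("tier1", tier1), ("tier2", tier2), ("tier3", tier3)]

    last_error: Optional[Exception] = None
    for name, provider in tiers:
        try:
            return name, provider(query.text)
        except Exception as exc:  # timeouts, network errors, model failures
            last_error = exc
    raise RuntimeError("all tiers failed") from last_error
```

In practice the providers would wrap the Ollama, Mistral and Claude clients. Keeping them as plain callables means the routing logic can be tested without any model running.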


EU Infrastructure & Data Residency


- Migrate and manage EU workloads on OVHcloud Strasbourg (France-resident data)

- Docker Compose orchestration across Hetzner (Germany) and OVHcloud (France)

- GDPR-compliant data pipeline: ensure zero personal data leaves EU servers

- Monitoring, alerting, and uptime for 24/7 care AI (clients depend on this at night)

- SSL, TLS, reverse proxy (Nginx/Caddy), secrets management


Pi 5 Fleet Engineering


- Remote management of Pi 5 devices at client homes (Raspberry Pi Connect)

- Automated deployment scripts: new client Pi setup in under 10 minutes

- Systemd service monitoring, auto-recovery, over-the-air config updates

- Whisper STT optimisation for Austrian German dialect accuracy


FastAPI Backend


- Extend from 19 to 30+ endpoints as new features ship

- /ask/v2: the new model-router-backed endpoint (already designed, needs wiring)

- Health analytics pipeline: pain trend analysis, coordinator alerts, weekly reports

- Multi-tenant architecture: clean data isolation across Prime Program clients


Mobile & Dashboard


- Flutter app features (Android + iOS): client-facing and coordinator-facing

- chatvx.com coordinator dashboard (Next.js): real-time client health overview

- Telegram bot improvements: coordinator alert formatting, rich notifications


Data Center Roadmap (6–18 months)


- Phase 1: Optimise on-device Ollama across Pi fleet

- Phase 2: Stand up shared GPU inference server (RTX 4090 or A10G) to self-host Qwen/Mistral

- Phase 3: Full vLLM cluster, with all Tier 2 models self-hosted and zero external API dependency

- This is the engineering track that removes €500+/month in external API costs at scale


Your Tech Stack


Languages

Python 3.13 · Dart (Flutter) · JavaScript/TypeScript (Next.js) · Bash


AI & Models (open-source focus)

Ollama · vLLM · Mistral 7B / Small / Medium · Qwen 3 72B · DeepSeek V4 · Llama 3.3 70B · Phi-3 Mini · Whisper STT · Piper TTS · Porcupine · OpenWakeWord · PoseNet


Model APIs

Mistral API · Together AI · Cohere · Anthropic Claude (safety fallback only)


Databases

Neo4j (graph) · Qdrant (vector) · Redis (state)


Infrastructure

Raspberry Pi 5 (8GB) · Hetzner VPS · OVHcloud (France) · Docker Compose · systemd · Nginx · Raspberry Pi Connect · GitHub Actions


APIs & Integrations

Gmail API · Google Calendar · Telegram Bot · Firebase · Brevo SMTP · Ghost CMS · Sony AITRIOS IMX500


Compliance Environment

GDPR Art. 28 · Austrian DSG · IRAP SR&ED (Canada) · EU AI Act awareness


You Are Exactly Right For This If


- You have *3+ years of Python backend*: FastAPI, async, real production systems

- You have deployed *open-source LLMs* in any form: Ollama, vLLM, llama.cpp, Hugging Face

- You are comfortable with *Linux, SSH, Docker, systemd*; you fix things without being told where

- You have worked on *edge or IoT systems*, or are deeply curious about them

- You think about *cost efficiency*; you understand why routing matters at scale

- You are genuinely excited about *privacy-first, sovereign AI*, not just cloud wrappers

- You *ship things*: GitHub activity, side projects, something real you built and deployed

- You can work *across IST / CEST (Vienna) / EST (Ontario)* time zones independently

- You communicate proactively: *no chasing for updates*, no daily standups needed

- You care about the mission: real disabled clients in Vienna depend on this system


Strong Bonus If You Have


- Experience with *vLLM* or GPU inference server setup (CUDA, memory management)

- Quantisation experience: GGUF, Q4_K_M, AWQ formats

- Flutter / Dart: even basic experience with the mobile apps

- Neo4j: graph queries, Cypher, schema design

- Austrian German or European language context (helps with client empathy)

- Experience with GDPR technical implementation: DPA compliance, data minimisation

- Knowledge of European care tech, disability tech, or clinical AI


Salary & Compensation


- Compensation: based on experience

- Equity: Discussion after a successful 6-month review (early-stage, real upside)

- Work: Fully remote from India · async-first · flexible hours

- Hardware: We ship you what you need for testing (Pi 5 dev kit discussed on onboarding)

- Growth: You are employee #1 in India; as the team grows, you grow with it



What We Are Not


- We are not a body shop. You own your work.

- We are not a corporate AI team. No Jira ceremonies, no sprint reviews.

- We are not a startup with a demo and no users. We have live clients and a signed contract with a care organisation.

- We do not use AI to replace care workers. We use AI to give disabled people more autonomy.


Our Values


Sovereignty: Client data stays on-device. Privacy is not a feature, it is the architecture.


Open Source First: We build with open models so we never depend on one vendor's pricing. Our data center roadmap exists precisely to eliminate that dependency entirely.


Impact Over Optics: Every line of code we ship touches a real person's daily life. Amrit in Vienna hears Neo's voice every morning. That is who we build for.


Craft: We write code that is readable, documented, and built to last. Fast is good. Fast and clean is what we ship.


How to Apply


Send an email to contact@vxneolabs.com with the subject line:


AI Platform Engineer - [Your Name] - India


Include:


1. *Something you built and deployed*: one paragraph, no templates. What was it, what stack, what did you learn, what broke and how did you fix it.

2. *Your GitHub profile*, or any code you can share publicly

3. *Your experience with open-source LLMs*: even a paragraph. Ollama, vLLM, Hugging Face, llama.cpp, anything real.

4. *Your resume or LinkedIn*

5. *Expected monthly compensation in INR*

6. *Your availability to start*


Applications without a GitHub link or a specific "something I built" paragraph will not be reviewed. We are not looking for the best resume. We are looking for the best builder.


Direct applications only. No recruiters. No agencies.


*VXNeo Labs Inc. is an equal opportunity employer. We especially welcome applications from engineers with lived experience of disability or caregiving; you will understand our mission better than anyone.*

