Senior Applied Research Engineer - Model Compression

Ora Computing · Austria
Remote · Visa Sponsorship · Relocation
AI Summary

Develop new compression methods for large language models, improve and extend the structural pruning algorithm, and work on the model retraining pipeline. Requires a PhD in computer science, machine learning, or equivalent; published work on quantization, pruning, or LLM training; and the ability to write production-grade Python code.

Key Highlights
Develop new compression methods for large language models
Improve and extend structural pruning algorithm
Work on model retraining pipeline
Combine pruning with quantization
Compress customer models for cloud and edge deployment
Technical Skills Required
Python, Quantization, Pruning, Large Language Models, GPU, Benchmarks, MoE architectures, Multimodal models, Kernel optimization
Benefits & Perks
€70-120k base + equity
Austrian minimum disclosed per Kollektivvertrag: €43,456/year
Sponsor visas and support relocation
Hybrid or fully remote work
English as working language
Nice to Have
Open-source contributions to ML infrastructure
Experience with MoE architectures or multimodal models
Background in kernel optimization

Job Description


Ora Computing · Vienna · Full-time


We compress large language models (LLMs). Our information-theoretic structural pruning and quantization algorithm shrinks model footprints by over 80% without retraining, in hours rather than weeks. 


The role


This is an applied research role. You'll develop new compression methods and ship them, not write papers about them. The cycle is short: read the literature, prototype, benchmark on real models, integrate into our pipeline, iterate with customers running compressed models in production.

You'll own significant technical scope from day one. Expect to work across the stack: pruning algorithms, quantization, evaluation infrastructure, and the production code that customers actually use.


What you'll work on


  • Improving and extending our structural pruning algorithm to new architectures (MoE, multimodal, vision-language)
  • Combining pruning with quantization (NVFP4/FP8/INT4, sub-4 bit mixed precision) in our compression pipeline
  • Expanding and improving our model retraining pipeline (SFT, GKD, DPO, GRPO) 
  • Compressing customer models (Llama, Qwen, Gemma, and proprietary fine-tunes) for cloud and edge deployment
  • Hardware-aware optimization for different accelerator targets (A100/H100/B300 and edge hardware)


What we're looking for


  • PhD in computer science, machine learning, or equivalent
  • Published work on quantization, pruning, or LLM training
  • Production-grade Python code (not just Jupyter notebooks). You write code others can read and run
  • Experience taking a method from paper to a working system on real models
  • Comfort working with LLMs and GPUs, and with evaluating models on benchmarks
  • You ship. You finish things.


Bonus


  • Open-source contributions to ML infrastructure (vLLM, llama.cpp, transformers, TensorRT-LLM, bitsandbytes, GPTQ/AWQ implementations)
  • Experience with MoE architectures or multimodal models (e.g., Qwen Omni)
  • Background in kernel optimization


Practical


  • Vienna-based. Hybrid or fully remote 
  • Working language is English
  • We sponsor visas and support relocation
  • Compensation: €70-120k base + equity. Austrian minimum disclosed per Kollektivvertrag: €43,456/year
  • We don't require you to write publications, but we support presenting work at venues when it fits the company and the project


How to apply


Send your CV, a list of representative papers, and any other relevant information to info@oracomputing.com. Tell us in two paragraphs what you'd want to work on at Ora and why. We respond within a week.

