Senior Applied Research Engineer - Model Compression

Ora Computing • Austria

Remote, Visa Sponsorship, Relocation

AI Summary

Develop new compression methods for large language models, improve and extend the company's structural pruning algorithm, and work on the model retraining pipeline. Requires a PhD in computer science, machine learning, or equivalent; published work on quantization, pruning, or LLM training; and the ability to write production-grade Python code.

Key Highlights

Develop new compression methods for large language models

Improve and extend the structural pruning algorithm

Work on the model retraining pipeline

Combine pruning with quantization

Compress customer models for cloud and edge deployment

Technical Skills Required

Python, Quantization, Pruning, Large Language Models, GPU, Benchmarks, MoE architectures, Multimodal models, Kernel optimization

Benefits & Perks

€70-120k base + equity

Austrian minimum disclosed per Kollektivvertrag: €43,456/year

Sponsor visas and support relocation

Hybrid or fully remote work

English as working language

Nice to Have

Open-source contributions to ML infrastructure

Experience with MoE architectures or multimodal models

Background in kernel optimization

Job Description

Ora Computing · Vienna · Full-time

We compress large language models (LLMs). Our information-theoretic structural pruning and quantization algorithm shrinks model footprints by over 80% without retraining, in hours rather than weeks.

The role

This is an applied research role. You'll develop new compression methods and ship them, not write papers about them. The cycle is short: read the literature, prototype, benchmark on real models, integrate into our pipeline, iterate with customers running compressed models in production.

You'll own significant technical scope from day one. Expect to work across the stack: pruning algorithms, quantization, evaluation infrastructure, and the production code that customers actually use.

What you'll work on

Improving and extending our structural pruning algorithm to new architectures (MoE, multimodal, vision-language)
Combining pruning with quantization (NVFP4/FP8/INT4, sub-4-bit mixed precision) in our compression pipeline
Expanding and improving our model retraining pipeline (SFT, GKD, DPO, GRPO)
Compressing customer models (Llama, Qwen, Gemma, and proprietary fine-tunes) for cloud and edge deployment
Hardware-aware optimization for different accelerator targets (A100/H100/B300 and edge hardware)

What we're looking for

PhD in computer science, machine learning, or equivalent
Published work on quantization, pruning, or LLM training
Production-grade Python code (not just Jupyter notebooks). You write code others can read and run
Experience taking a method from paper to a working system on real models
Comfortable working with LLMs and GPUs and running evaluation benchmarks
You ship. You finish things.

Bonus

Open-source contributions to ML infrastructure (vLLM, llama.cpp, transformers, TensorRT-LLM, bitsandbytes, GPTQ/AWQ implementations)
Experience with MoE architectures or multimodal models (Qwen Omni)
Background in kernel optimization

Practical

Vienna-based. Hybrid or fully remote
Working language is English
We sponsor visas and support relocation
Compensation: €70-120k base + equity. Austrian minimum disclosed per Kollektivvertrag: €43,456/year
We don't require you to publish, but we support presenting work at venues when it fits the company and the project

How to apply

Send your CV, a list of representative papers, and any other relevant information to info@oracomputing.com. Tell us in two paragraphs what you'd want to work on at Ora and why. We respond within a week.

Job Overview

Posted Date: Jun 09, 2026

Employment Type: Full-time

Experience Level: Mid-Senior level

Location: Austria

Annual Salary: €70-120k base + equity

Category: Programming

Company: Ora Computing
