Senior Speech-to-Text Machine Learning Engineer (Whisper Focus)

Jobs via Dice, United States
Remote
This job is no longer active. This position is no longer accepting applications.
AI Summary

Develop and fine-tune Whisper speech-to-text models for multilingual datasets. Collaborate with engineering teams to integrate models into production workflows. Apply broader ML techniques to enhance speech recognition and translation pipelines.

Key Highlights
Fine-tune and optimize Whisper for specific use cases and languages
Analyze model performance and identify areas for improvement
Collaborate with engineering teams to integrate Whisper into production workflows
Apply broader ML techniques to enhance speech recognition and translation pipelines
Technical Skills Required
Python, PyTorch, Hugging Face, CUDA, Whisper, OpenAI, ASR, LM fusion, tokenization/normalization, data curation, SoX, FFmpeg, NeMo, Triton, Torch, TensorRT, LoRA, PEFT
Benefits & Perks
100% remote work
Contractor position (6+ months)
Opportunity to work with Whisper, OpenAI's speech-to-text model

Job Description


Dice is the leading career destination for tech experts at every stage of their careers. Our client, Rocket, is seeking the following. Apply via Dice today!

Job Title: Speech to Text ML Engineer (Whisper Focus)

Location: 100% Remote

Duration: 6+ Months

Notes:

AI/ML Engineer (Contractor)

Mission: To work on speech-to-text finetuning and inference in multiple languages.

Responsibilities

Train and fine-tune speech-to-text (ASR) models such as Whisper.

Implement training/eval loops (PyTorch + Transformers).

Add data augmentations (noise/reverb/μ-law), VAD, diarization (e.g., Whisper).

Package inference (ONNX/TensorRT if needed); expose REST/gRPC endpoints with batch and streaming modes.

Optimize RTF (real-time factor) and memory usage.

Design training curricula, LR schedules, frozen vs full finetune, LoRA, data mixing.

Build robust test suites: clean/noisy/telephony, accents, domain terminology.

Analyze error types (sub/del/ins), OOV handling, bias; propose targeted fixes.

Experience in ASR, LM fusion, tokenization/normalization, data curation, PyTorch, Hugging Face, CUDA; experience finetuning seq2seq/CTC/RNNT; SoX/FFmpeg; metrics (WER/CER, latency), NeMo, Triton, Torch/TensorRT, LoRA/PEFT.
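
For candidates unfamiliar with the metrics named above, the following is a minimal Python sketch of how WER and its substitution/deletion/insertion breakdown, along with RTF, are commonly computed. It is an illustration only, not the team's tooling: the names `ErrorCounts`, `count_errors`, and `real_time_factor` are hypothetical, transcripts are assumed to be already normalized and whitespace-tokenized, and when several alignments share the minimum edit count the breakdown reflects just one of them.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Word-level error breakdown for one reference/hypothesis pair (hypothetical helper)."""
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, where N is the number of reference words.
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.ref_words if self.ref_words else 0.0


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align two transcripts by word-level edit distance and count error types."""
    ref, hyp = reference.split(), hypothesis.split()
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j].
    cost = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # hypothesis is empty: every word is a deletion
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # reference is empty: every word is an insertion
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match: no new edit
                continue
            t, s, d, n = cost[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare on total edits first, so this keeps a minimum-edit path.
            cost[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, ins = cost[-1][-1]
    return ErrorCounts(subs, dels, ins, len(ref))


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For example, `count_errors("turn on the kitchen lights", "turn the kitchen light now")` yields one substitution, one deletion, and one insertion against five reference words, for a WER of 0.6, and `real_time_factor(2.5, 10.0)` gives 0.25.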

Overview:

LanguageLine Solutions is seeking a highly specialized Machine Learning Engineer with hands-on experience working with Whisper, OpenAI's speech-to-text model. This role will focus on fine-tuning and training Whisper to improve transcription accuracy and performance across multilingual datasets, supporting LanguageLine's mission to deliver world-class translation and interpretation services.

Key Responsibilities:

  • Fine-tune and optimize Whisper for specific use cases and languages
  • Analyze model performance and identify areas for improvement
  • Collaborate with engineering teams to integrate Whisper into production workflows
  • Apply broader ML techniques to enhance speech recognition and translation pipelines
  • Recommend tools, frameworks, and best practices for scalable deployment

Qualifications:

  • Proven experience with Whisper or similar speech-to-text models
  • Strong background in machine learning, deep learning, and NLP
  • Familiarity with audio processing and multilingual datasets
  • Ability to work independently and communicate findings clearly
  • Experience with Python, PyTorch, and ML frameworks preferred
