AI Researcher / ML Engineer – ASR, Speech Specialist

LILT

Senior AI Researcher / Machine Learning Engineer specializing in Automatic Speech Recognition (ASR) at LILT, leading work on AI-driven communication technology.

Posted 6/5/2026 • Full-time • Washington D.C. • Massachusetts, Washington • 🇺🇸 United States • Mid-Level, Senior

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

ASR systems, speech-to-text translation, deep learning, algorithm optimization, data structures, multilingual text/audio tokenization, automated framework evaluations, Word Error Rate (WER), Character Error Rate (CER), audio processing models

Soft Skills

collaboration, communication, problem-solving, technical roadmap translation, teamwork

Tools & Technologies

PyTorch, Whisper, NVIDIA NeMo, Hugging Face Transformers, Kaldi, ESPnet, SpeechBrain, ExecuTorch, LiteRT, ONNX Runtime Mobile

Certifications & Qualifications

Master’s degree, Ph.D. degree

Industry Keywords

speech processing, multilingual benchmarks, dynamic vocabulary insertion, contextual biasing, language model personalization, real-time streaming inference, high-efficiency asynchronous processing, audio augmentation techniques, text normalization, inverse text normalization

Tech Stack

Tools & technologies

Python, PyTorch, TensorFlow

About the role

Key responsibilities & impact

Architect, train, fine-tune, and evaluate state-of-the-art speech representations and ASR models (e.g., End-to-End Conformer, Whisper, RNN-T, and hybrid CTC/Attention architectures) across multiple global languages
Design and deploy highly scalable algorithms for dynamic vocabulary insertion, contextual biasing, and language model (LM) personalization to precisely capture customer-specific terminology, acronyms, and product names
Implement automated evaluation frameworks to benchmark model performance, rigorously tracking Word Error Rate (WER), Character Error Rate (CER), embedding-based metrics, latency budgets (real-time factor, RTF), and compute-efficiency profiles under varying acoustic environments
Develop pioneering multilingual benchmarks for end-to-end conversational AI agents, including speech-to-text and text-to-speech components
Partner with core engineering teams to build, optimize, and maintain high-throughput pipelines for both ultra-low-latency real-time streaming inference and high-efficiency asynchronous (batch) multi-channel speech analysis
Translate product requirements into technical AI roadmaps, working hand-in-hand with Product Managers to ship speech-to-text, simultaneous translation, and semantic speech analytics features
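
For orientation, the evaluation metrics named above are conventionally defined as follows. This is a minimal sketch, not LILT's evaluation tooling: it assumes whitespace tokenization and no text normalization, and the function names are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing from hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time / audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so WER is 2/6. In practice, both strings would usually pass through the same text normalization before scoring.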

Requirements

What you’ll need

Master’s or Ph.D. degree in Computer Science, Electrical Engineering, Computational Linguistics, Data Science, or a related quantitative field with an emphasis on speech processing or deep learning (or an equivalent proven industry track record)
Minimum of 3–5 years of dedicated professional experience developing ASR systems, speech-to-text translation pipelines, or advanced audio processing models
Advanced proficiency with PyTorch or equivalent frameworks, along with extensive experience using speech models and toolkits such as Whisper, NVIDIA NeMo, Hugging Face Transformers, Kaldi, ESPnet, or SpeechBrain
Hands-on experience converting and running PyTorch models on at least one mobile inference runtime: ExecuTorch, LiteRT (formerly TensorFlow Lite), or ONNX Runtime Mobile
Strong software engineering skills in Python, with a clear understanding of data structures, algorithm optimization, and handling of complex multilingual text/audio tokenization schemas
Proven experience working with large-scale audio datasets, audio augmentation techniques (e.g., SpecAugment, noise injection), and text normalization/inverse text normalization (ITN) pipelines

Benefits

Comp & perks

Health insurance
Flexible work arrangements
Professional development opportunities