Deepgram

Model Evaluation QA Lead

Full-time

Location Type: Remote

Location: United States

Salary

💰 $180,000 - $230,000 per year

About the role

  • Model Evaluation Automation: Design, build, and maintain automated model evaluation pipelines that run against every candidate model before release. Implement objective and subjective quality metrics (WER, SER, MOS, latency/throughput) across STT, TTS, and STS product lines.
  • Release Gate Integration: Embed model quality checkpoints into CI/CD and release pipelines. Define pass/fail criteria, build dashboards for model comparison, and own the go/no-go signal for model promotions to production.
  • Agent & Model Evaluation Frameworks: Stand up and operate evaluation tooling (Coval, Braintrust, Blue Jay, custom harnesses) for end-to-end voice agent testing—covering accuracy, latency, turn-taking, conversational quality, and custom metrics across real-world scenarios.
  • Active Learning & Data Ingestion Testing: Partner with the Active Learning team to validate data ingestion infrastructure, annotation pipelines, and retraining automation. Ensure data quality standards are met at every stage of the flywheel.
  • Industry Benchmark Automation: Automate execution and reporting of industry-standard benchmarks (e.g., LibriSpeech, CommonVoice, internal production-traffic evals). Maintain reproducible benchmark environments and publish results for internal consumption.
  • Language & Domain Validation: Build and maintain test suites for multi-language and domain-specific model validation. Design coverage matrices that ensure new languages and acoustic domains are systematically evaluated before GA.
  • Retraining Automation Support: Validate the end-to-end retraining pipeline across all data sources—from data selection and preprocessing through training, evaluation, and promotion—ensuring automation reliability and correctness.
  • Manual Test Feedback Loop: Design and operate human-in-the-loop evaluation workflows for subjective quality assessment. Build the tooling and processes that translate human feedback into actionable quality signals for the ML team.
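To make the objective metrics above concrete, here is a minimal, self-contained sketch of word error rate (WER), the word-level edit distance divided by reference length. This is illustrative only, not Deepgram's actual evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between the ref words seen so far and hyp[:j]
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            prev_diag, d[j] = d[j], min(
                d[j] + 1,          # deletion
                d[j - 1] + 1,      # insertion
                prev_diag + cost,  # substitution (or match)
            )
    return d[len(hyp)] / max(len(ref), 1)
```

A production harness would also normalize text (casing, punctuation, number formats) before scoring, since normalization choices can shift WER substantially.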

Requirements

  • 4–7 years of experience in QA engineering, ML evaluation, or a related technical role, with a focus on model and data quality for predictive and generative systems.
  • Hands-on experience building automated test and evaluation pipelines for ML models and the software features that surround them.
  • Strong programming skills in Python; experience with ML evaluation libraries, data processing frameworks (Pandas, NumPy), and scripting for pipeline automation.
  • Familiarity with speech/audio ML concepts: WER, SER, MOS, acoustic models, language models, or similar evaluation metrics.
  • Experience with CI/CD integration for ML workflows (e.g., GitHub Actions, Jenkins, Argo, MLflow, or equivalent).
  • Ability to design and maintain reproducible benchmark environments across multiple model versions and configurations.
  • Strong communication skills—you can translate model quality metrics into actionable insights for engineering, research, and product stakeholders.
  • Detail-oriented and systematic, with a bias toward automation over manual process.

Benefits

  • Medical, dental, vision benefits
  • Annual wellness stipend
  • Mental health support
  • Life, short-term disability (STD), and long-term disability (LTD) insurance plans
  • Unlimited PTO
  • Generous paid parental leave
  • Flexible schedule
  • 12 Paid US company holidays
  • Quarterly personal productivity stipend
  • One-time stipend for home office upgrades
  • 401(k) plan with company match
  • Tax Savings Programs
  • Learning / Education stipend
  • Participation in talks and conferences
  • Employee Resource Groups
  • AI enablement workshops / sessions

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

QA engineering, ML evaluation, automated test pipelines, Python, Pandas, NumPy, CI/CD integration, model quality metrics, benchmark environments, retraining automation

Soft Skills

strong communication, detail-oriented, systematic, bias toward automation