Salary
💰 $140,000 - $170,000 per year
Tech Stack
Ansible, AWS, Cloud, Docker, ElasticSearch, Elixir, FFmpeg, Kubernetes, Node.js, Numpy, Pandas, Python, React, Terraform, TypeScript
About the role
- Lead efforts to improve transcription quality by evaluating, testing, and fine-tuning ASR models (both commercial APIs and open-source).
- Build pipelines that handle speaker identification, diarization, multi-language support, and noise-robust transcription in difficult audio environments.
- Develop and maintain services that integrate multiple ASR providers, ensuring resilience and flexibility across transcription workflows.
- Collaborate with platform engineers to ensure seamless ingestion and persistence of transcription outputs in data pipelines.
- Use data wrangling and exploratory analysis to deeply understand transcription accuracy and error patterns.
- Explore and apply audio engineering techniques (denoising, voice isolation, codecs, signal processing) to improve speech clarity.
- Deploy and maintain transcription-related services with basic DevOps practices, ensuring scalability and reliability.
- Participate in all stages of the development lifecycle: ideation, design, prototyping, implementation, deployment, and iteration.
Requirements
- This is a remote, WFH role.
- Strong software engineering background, with training in Computer Science, Software Engineering, or a related discipline.
- 5+ years of professional development experience, with significant focus on speech processing, NLP, or transcription systems.
- Proficiency in Python and comfort with system-level programming when needed.
- Experience with ASR frameworks (e.g., Whisper, Kaldi, Vosk, NVIDIA NeMo, or similar).
- Familiarity with audio engineering tools (e.g., FFmpeg, SoX) and denoising/voice enhancement techniques.
- Knowledge of speaker diarization, speaker recognition, and multi-language ASR challenges.
- Experience with data analysis and wrangling (e.g., Pandas, NumPy, Jupyter) to evaluate model performance.
- Understanding of cloud deployment and DevOps basics (e.g., Docker, Kubernetes, serverless workloads).
- Comfort working in a fast-paced environment with dynamic objectives and quick iteration cycles.
- Demonstrated ability to work independently, make tradeoffs, and deliver results with minimal supervision.
- Bonus Points: Hands-on experience fine-tuning ASR models on domain-specific datasets.
- Bonus Points: Familiarity with real-time streaming pipelines for audio ingestion and transcription.
- Bonus Points: Exposure to search and retrieval systems (e.g., Elasticsearch) for indexing transcribed text.
- Bonus Points: Prior experience in audio forensics or noisy-channel speech analysis.
- Bonus Points: Experience applying heuristics to improve transcription results.