
Technical Staff Member – ML Research Engineer, Multi-Modal – Audio
Liquid AI
Full-time
Location Type: Hybrid
Location: San Francisco • California • 🇺🇸 United States
Job Level
Lead
Tech Stack
Python, PyTorch
About the role
- Invent and prototype new model architectures optimized for inference speed, including on edge devices
- Build and maintain evaluation suites for multimodal performance across a range of public and internal tasks
- Collaborate with the data and infrastructure teams to build scalable pipelines for ingesting and preprocessing large audio datasets
- Work with the infrastructure team to optimize model training across large-scale GPU clusters
- Contribute to publications, internal research documents, and thought leadership within the team and the broader ML community
- Collaborate with the applied research and business teams on client-specific use cases
Requirements
- You have experience with machine learning at scale
- You have worked with audio models and understand the effects of architecture choices on runtime, latency, and quality
- You’re proficient in PyTorch and familiar with distributed training frameworks such as DeepSpeed, FSDP, or Megatron-LM
- You’ve worked with multimodal data (e.g. audio, text, image, video)
- You’ve contributed to research papers, open-source projects, or production-grade multimodal model systems
- You understand how data quality, augmentations, and preprocessing pipelines can significantly impact model performance, and you’ve built tooling to support that
- You enjoy working in interdisciplinary teams across research, systems, and infrastructure, and can translate ideas into high-impact implementations
- You’ve designed and trained multimodal language models, or specialized audio models (e.g. ASR, TTS, voice conversion, vocoders, diarization)
- You care deeply about empirical performance and know how to design, run, and debug large-scale training experiments on distributed GPU clusters
- You’ve developed audio encoders or decoders, or integrated them into language pretraining pipelines with autoregressive or generative objectives
- You have experience working with large-scale audio datasets, understand the unique challenges they pose, and can manage them effectively
- You have strong programming skills in Python, with an emphasis on writing clean, maintainable, and scalable code
Benefits
- A front-row seat in building some of the most capable Speech Language Models
- Access to world-class infrastructure, a fast-moving research team, and deep collaboration across ML, systems, and product
- The opportunity to shape multimodal foundation model research with both scientific rigor and real-world impact
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
machine learning, PyTorch, distributed training, DeepSpeed, FSDP, Megatron-LM, multimodal data, audio models, Python, model training
Soft skills
collaboration, interdisciplinary teamwork, communication, problem-solving, thought leadership