Liquid AI

Technical Staff Member – ML Research Engineer, Multi-Modal – Audio

Full-time

Location Type: Hybrid

Location: San Francisco • California • 🇺🇸 United States

Job Level

Lead

Tech Stack

Python, PyTorch

About the role

  • Invent and prototype new model architectures that optimize inference speed, including on edge devices
  • Build and maintain evaluation suites for multimodal performance across a range of public and internal tasks
  • Collaborate with the data and infrastructure teams to build scalable pipelines for ingesting and preprocessing large audio datasets
  • Work with the infrastructure team to optimize model training across large-scale GPU clusters
  • Contribute to publications, internal research documents, and thought leadership within the team and the broader ML community
  • Collaborate with the applied research and business teams on client-specific use cases

Requirements

  • You have experience with machine learning at scale
  • You have worked with audio models and understand the effects of architecture choices on runtime, latency, and quality
  • You’re proficient in PyTorch, and familiar with distributed training frameworks like DeepSpeed, FSDP, or Megatron-LM
  • You’ve worked with multimodal data (e.g. audio, text, image, video)
  • You’ve contributed to research papers, open-source projects, or production-grade multimodal model systems
  • You understand how data quality, augmentations, and preprocessing pipelines can significantly impact model performance—and you’ve built tooling to support that
  • You enjoy working in interdisciplinary teams across research, systems, and infrastructure, and can translate ideas into high-impact implementations
  • You’ve designed and trained multimodal language models, or specialized audio models (e.g. ASR, TTS, voice conversion, vocoders, diarization)
  • You care deeply about empirical performance, and know how to design, run, and debug large-scale training experiments on distributed GPU clusters
  • You’ve developed audio encoders or decoders, or integrated them into language pretraining pipelines with autoregressive or generative objectives
  • You have experience working with large-scale audio datasets, understand the unique challenges they pose, and can manage massive datasets effectively
  • You have strong programming skills in Python, with an emphasis on writing clean, maintainable, and scalable code
Benefits
  • A front-row seat in building some of the most capable Speech Language Models
  • Access to world-class infrastructure, a fast-moving research team, and deep collaboration across ML, systems, and product
  • The opportunity to shape multimodal foundation model research with both scientific rigor and real-world impact

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
machine learning, PyTorch, distributed training, DeepSpeed, FSDP, Megatron-LM, multimodal data, audio models, Python, model training
Soft skills
collaboration, interdisciplinary teamwork, communication, problem-solving, thought leadership