Train, fine-tune, and optimize speech and audio foundation models, including large-scale and domain-adapted models.
Contribute to the architecture of Artisight’s AI platform, ensuring seamless integration of audio intelligence with other AI modalities.
Collaborate closely with AI scientists, engineers, and product teams to translate research ideas into production-ready solutions.
Deploy audio AI technologies into production healthcare environments and ensure low-latency performance where required.
Stay up to date with advances in speech processing, generative audio, and multimodal AI, and incorporate those insights into the applied research pipeline.
Share research outcomes through internal discussions, technical reports, and potentially external publications.
Requirements
M.S. or Ph.D. in computer science, electrical engineering, applied AI, machine learning, or a related discipline.
Demonstrated expertise in speech and audio AI research, evidenced by open-source contributions or peer-reviewed publications (e.g., ICASSP, INTERSPEECH, NeurIPS, ICML).
Hands-on experience with one or more of the following: automatic speech recognition (ASR); text-to-speech (TTS) and voice synthesis; audio-to-audio or speech-to-speech generation; audio classification and event detection; voice activity detection (VAD).
Proficiency in deep learning techniques such as transformers, diffusion models, self-supervised learning, and sequence-to-sequence architectures.
Strong coding and experimentation skills with frameworks such as PyTorch or TensorFlow.
Experience with large-scale training and deployment tools (NVIDIA Triton, ONNX, or similar).
A collaborative mindset and ability to communicate research findings clearly to both technical and non-technical audiences.
Nice to Haves
Experience with multimodal learning (audio + vision + text).
Familiarity with federated learning and privacy-preserving AI approaches.
Experience deploying real-time or low-latency audio AI models.
Contributions to open-source audio AI projects (e.g., ESPnet, Kaldi, Hugging Face Transformers, Fairseq, Whisper).