AI Research Scientist – Multimodal Post-Training

Sword Health

AI Research Scientist at Sword Health, focused on multimodal AI research and its applications in healthcare, building AI solutions that improve patient understanding and care.

Posted 6/18/2026 · Full-time · 🇺🇸 United States · Mid-Level, Senior · 💰 €71,000 - €110,000 per year

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

multimodal model training, fine-tuning, alignment, post-training methods, SFT, RLHF, dataset curation, architecture design, cross-modal training strategies, model evaluation

Soft Skills

collaboration, communication, research, problem-solving, experiment design, interpretation of results

Tools & Technologies

Python, PyTorch, JAX

Certifications & Qualifications

PhD in Computer Science, PhD in Machine Learning, PhD in Natural Language Processing, PhD in Computer Vision

Industry Keywords

vision-language models, speech-language models, clinical domains, AI agents, real-time patient state estimation, clinical memory, safety validation, peer-reviewed research, AI conferences, AI journals

Tech Stack

Tools & technologies

Python, PyTorch

About the role

Key responsibilities & impact

Design and execute research on multimodal model training, with a primary focus on vision-language models and, increasingly, speech-language models, including fine-tuning, alignment, and post-training methods (SFT, RLHF) tailored to clinical domains;
Develop and improve models that enable our AI agents to perceive and understand patients through video, language, and speech, building towards unified multimodal patient understanding;
Contribute to the full model development cycle: multimodal dataset curation and annotation, architecture design, cross-modal training strategies, evaluation, and iteration;
Collaborate across AI Engineering, Product, and Clinical teams to translate multimodal research breakthroughs into production systems that deliver patient care;
Work towards ambitious long-term research goals, such as real-time multimodal patient state estimation, clinical memory, and safety validation, while identifying and delivering immediate milestones;
Advance the field by publishing in top-tier AI venues and clinical journals, contributing to Sword's growing body of peer-reviewed research.

Requirements

What you’ll need

A PhD in Computer Science, Machine Learning, Natural Language Processing, Computer Vision, or a closely related AI field;
Hands-on experience pre-training or fine-tuning large language models or large multimodal models (e.g., vision-language models, speech-language models), including SFT, RLHF, or related post-training techniques;
Experience training or fine-tuning models that operate across multiple modalities (e.g., video + language, image + text, speech + text);
A strong publication track record in peer-reviewed AI conferences or journals;
Proficiency in Python and deep experience with modern ML frameworks (e.g., PyTorch, JAX);
Demonstrated ability to design rigorous experiments and interpret their results.

Benefits

Comp & perks

Health, dental and vision insurance
Meal allowance
Equity shares
Remote work allowance
Flexible working hours
Work from home
Discretionary vacation
Snacks and beverages