The Post-Training team focuses on adapting foundation models to real-world performance and alignment requirements.
Researchers develop and evaluate techniques such as supervised fine-tuning, preference optimization (DPO, RLHF, RLAIF), and continual adaptation to align models with Distyl’s enterprise systems. They also investigate new methods for aligning large models with human and system-level objectives, exploring trade-offs between generalization and specialization, data efficiency and robustness, and capability and controllability.
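To make the preference-optimization work named above concrete, here is a minimal sketch of the DPO objective in PyTorch. It is illustrative only, not Distyl’s implementation: the function name and tensor arguments are assumptions, and in practice the log-probabilities would come from batched forward passes over a trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Illustrative Direct Preference Optimization (DPO) loss.

    Each argument holds summed per-token log-probabilities of the
    chosen / rejected completions under the trainable policy or the
    frozen reference model; beta scales the implicit KL penalty.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Increase the margin by which the policy prefers chosen over
    # rejected completions, relative to the reference model.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```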
Requirements
Deep Understanding of Post-Training Techniques: Familiarity with supervised fine-tuning, preference optimization (RLHF/DPO), LoRA/PEFT (see the sketch after this list), and instruction-tuning pipelines.
Experience Adapting Frontier Models: You’ve tuned or adapted LLMs/SLMs to specialized domains or behaviors through data curation, reward modeling, or continual pretraining.
Experience Building with Models, Not Just Building Models: Beyond training and fine-tuning, you’ve built intelligent systems that use models as components of larger workflows.
Proven Track Record of Research Results: Whether you’ve published in top journals, posted impressive work on Twitter, or shared it somewhere else, we want to see what you’ve done.
Uses AI Every Day: You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
Strong Programming and Data Analysis Skills: You need to be able to build prototypes of your ideas, then run the experiments that prove their effectiveness to a Fortune 500 Head of AI.
Bias Towards Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take five years to realize.
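The LoRA/PEFT familiarity mentioned in the list above has a compact core. Below is a hedged PyTorch sketch of wrapping a frozen linear layer with a trainable low-rank update; the class and parameter names are hypothetical, and production work would typically use a library such as Hugging Face peft rather than this hand-rolled version.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = base(x) + (alpha / r) * x A^T B^T, with A (r x in) and B (out x r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because only the two small adapter matrices receive gradients, this style of fine-tuning touches a tiny fraction of the model’s parameters, which is what makes per-domain adaptation practical.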
Benefits
Competitive salary and benefits package, including equity options
Medical/dental/vision covered at 100% for you and your dependents
401(k) plan
Commuter benefits
Lunch provided in office