
Staff Research Engineer, Pre-training Science
Reddit, Inc.
Employment Type: Full-time
Location Type: Remote
Location: United States
Salary
💰 $230,000 - $322,000 per year
About the role
- Architect and validate rigorous Continual Pre-Training (CPT) frameworks, focusing on domain adaptation techniques that effectively transfer Reddit’s knowledge into licensed frontier models.
- Design the "Science of Multimodality": Lead research into fusing vision and language encoders to process Reddit’s rich media (images, video) alongside conversational text threads.
- Formulate data curriculum strategies: scientifically determine the optimal ratio of "Reddit data" to "general data", maximizing community understanding while preserving safety and reasoning capabilities (see the mixture sketch after this list).
- Conduct deep-dive research into scaling laws for graph-based data: investigate how Reddit’s tree-structured conversations affect model convergence compared to flat text (see the curve-fitting sketch after this list).
- Design and scale continuous evaluation pipelines (the "Reddit Gym") that monitor model reasoning and safety capabilities in real time, enabling dynamic adjustments to training recipes.
- Drive high-stakes architectural decisions regarding compute allocation, distributed training strategies (3D parallelism), and checkpointing mechanisms on AWS Trainium/Nova clusters.
- Serve as a force multiplier for the engineering team by setting coding standards, conducting high-level design reviews, and mentoring senior engineers on distributed systems and ML fundamentals.
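
The multimodality bullet above centers on aligning image/video encoders with a language decoder. Below is a minimal PyTorch sketch of one common alignment pattern, a learned MLP projector that maps frozen CLIP/SigLIP-style patch features into the decoder’s embedding space; the class name, dimensions, and two-layer design are illustrative assumptions, not Reddit’s actual architecture.

```python
import torch
import torch.nn as nn

class VisionToTextProjector(nn.Module):
    """Two-layer MLP mapping frozen image-encoder patch features
    (e.g. CLIP/SigLIP-style) into a language decoder's embedding space,
    so projected patches can be prepended to the text token sequence."""

    def __init__(self, vision_dim=768, text_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # -> (batch, num_patches, text_dim)

# Illustrative shapes only: 16 image patches aligned to a 4096-dim decoder.
projector = VisionToTextProjector()
print(projector(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 4096])
```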
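The curriculum bullet describes tuning the ratio of Reddit data to general data. Here is a minimal sketch of fixed-ratio stream mixing, assuming hypothetical infinite batch iterators `reddit_stream` and `general_stream` and an illustrative 30/70 split; real recipes often use temperature-weighted or staged schedules, none of which is specified in this posting.

```python
import random
from itertools import islice

def mix_streams(reddit_stream, general_stream, reddit_ratio=0.3, seed=0):
    """Yield batches from two infinite iterators at a fixed sampling ratio.

    reddit_ratio is the probability that the next batch comes from the
    Reddit stream; the remainder comes from the general corpus.
    """
    rng = random.Random(seed)
    while True:
        source = reddit_stream if rng.random() < reddit_ratio else general_stream
        yield next(source)

# Toy infinite iterators standing in for real data loaders.
reddit = iter(lambda: "reddit_batch", None)
general = iter(lambda: "general_batch", None)
print(list(islice(mix_streams(reddit, general, reddit_ratio=0.3), 10)))
```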
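For the scaling-laws bullet, one standard tool is fitting a saturating power law L(N) = a·N^(-b) + c to (tokens, loss) measurements and comparing the fitted exponents across data types. The sketch below fits synthetic points standing in for a flat-text run and a tree-structured run; it assumes SciPy is available, and every number is invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Saturating power law often used for loss-vs-tokens fits."""
    return a * n ** (-b) + c

# Synthetic (tokens, loss) points standing in for two training runs.
tokens = np.array([1e9, 3e9, 1e10, 3e10, 1e11])
runs = {
    "flat text": power_law(tokens, 4.0, 0.08, 1.9),
    "tree-structured": power_law(tokens, 4.0, 0.10, 1.8),
}

for name, loss in runs.items():
    (a, b, c), _ = curve_fit(power_law, tokens, loss,
                             p0=(1.0, 0.1, 1.0), maxfev=20000)
    print(f"{name}: exponent b = {b:.3f}, irreducible loss c = {c:.3f}")
```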
Requirements
- 7+ years of experience in Machine Learning engineering or research, with a specific focus on LLM Pre-training, Domain Adaptation, or Transfer Learning.
- Expert-level proficiency in Python and deep learning frameworks (PyTorch or JAX), with a track record of debugging complex training instabilities at scale.
- Deep theoretical understanding of Transformer architectures and pre-training dynamics, specifically catastrophic forgetting and knowledge injection (see the anchoring sketch after this list).
- Experience with multimodal models (VLMs): understanding how to align image/video encoders (e.g., CLIP, SigLIP) with language decoders.
- Experience implementing continuous integration/evaluation systems for ML models, measuring generalization and reasoning performance.
- Demonstrated ability to communicate complex technical concepts (like loss spikes or convergence issues) to leadership and coordinate efforts across Infrastructure and Data teams.
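
The catastrophic-forgetting requirement above has many standard mitigations. One of the simplest is an L2 penalty anchoring the continually pre-trained weights to the base checkpoint (a stripped-down relative of EWC); the sketch below is purely illustrative and says nothing about Reddit’s actual training recipe.

```python
import torch
import torch.nn as nn

def anchor_penalty(model, base_params, weight=1e-4):
    """L2 penalty pulling CPT weights back toward the base checkpoint,
    a simplified relative of EWC used to curb catastrophic forgetting."""
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + (param - base_params[name]).pow(2).sum()
    return weight * penalty

# Toy model standing in for the licensed base checkpoint.
model = nn.Linear(8, 8)
base_params = {n: p.detach().clone() for n, p in model.named_parameters()}

# In a CPT step this term would be added to the language-modeling loss.
loss = model(torch.randn(4, 8)).pow(2).mean() + anchor_penalty(model, base_params)
loss.backward()
print(f"combined loss = {loss.item():.4f}")
```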
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k with Employer Match
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Machine Learning engineering, LLM Pre-training, Domain Adaptation, Transfer Learning, Python, PyTorch, JAX, Transformer architectures, Multimodal models, Continuous integration
Soft Skills
communication, mentoring, collaboration, leadership