Lead Machine Learning Engineer, LLM Infrastructure
Salesforce. Design, build, and maintain infrastructure for LLM post-training, evaluation, and deployment.
Posted 4/28/2026 • full-time • San Francisco • California • 🇺🇸 United States • Senior • 💰 $172,500 - $260,100 per year
Tech Stack
Tools & technologies
- AWS
- Cloud
- Docker
- Google Cloud Platform
- Kubernetes
- Python
About the role
Key responsibilities & impact
- Design, build, and maintain infrastructure for LLM post-training, evaluation, and deployment.
- Own scalable pipelines for training orchestration, rollout generation, reward and feedback processing, checkpointing, and experiment management.
- Build reliable systems for feedback-driven model improvement, including human or AI feedback loops, large-scale offline evaluation, and regression detection.
- Partner closely with research scientists to turn new post-training methods into reusable engineering workflows.
- Collaborate with agent engineers and platform teams to integrate training and evaluation systems with production model and agent stacks.
- Optimize distributed training and inference workloads for reliability, throughput, cost efficiency, and observability.
- Drive best practices for reproducibility, versioning, monitoring, deployment, and operational excellence across ML systems.
Requirements
What you’ll need
- 5+ years of experience in software engineering, ML systems, or distributed infrastructure.
- Strong proficiency in Python and experience building production systems or large-scale ML pipelines.
- Hands-on experience building infrastructure for model training, post-training, evaluation, or serving.
- Experience designing reliable, scalable systems for distributed and GPU-based workloads.
- Strong debugging skills across systems, pipelines, and model-facing failures.
- Experience building infrastructure for LLM post-training, including RLHF, preference optimization, reward modeling, or related feedback-driven training workflows.
- Experience working cross-functionally with research scientists and engineers.
- Familiarity with cloud platforms (AWS, GCP) and containerized environments (Docker, Kubernetes).
Benefits
Comp & perks
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options
- Professional development opportunities
- Employee stock purchase program
- Mental health support
- Life and disability insurance
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Python
- distributed infrastructure
- ML systems
- large-scale ML pipelines
- reliable systems design
- debugging
- LLM post-training
- RLHF
- preference optimization
- reward modeling
Soft Skills
- collaboration
- cross-functional teamwork
- problem-solving