
Software Engineer – Machine Learning Infrastructure
Salesforce
full-time
Location Type: Hybrid
Location: Seattle • Texas • Washington • United States
Salary
💰 $164,000 - $313,700 per year
About the role
- Design, build, and operate systems to train, serve, and deploy machine learning models at scale, with a focus on reliability, performance, and operational simplicity
- Evolve GPU-backed inference infrastructure to support high-throughput, latency-sensitive workloads, including large-scale model serving
- Architect and optimize distributed training and data processing systems using platforms such as Ray, Airflow, Spark, or similar technologies
- Build and maintain Kubernetes-based platforms and orchestration layers using tools such as KubeRay, vLLM, and internally developed services
- Architect solutions that bridge legacy systems with modern technologies while maintaining the stability of the monolithic application
- Develop robust monitoring, observability, and alerting for production ML workloads to ensure operational excellence
- Partner closely with AI Platform, ML modeling, security, and product engineering teams to design infrastructure that supports evolving AI use cases
- Provide technical leadership through design reviews, mentorship, and by setting engineering standards and long-term architectural direction for ML infrastructure
- Author technical design and architecture documentation, and contribute thought leadership through engineering blog posts
Requirements
- Significant professional experience in software engineering with a strong focus on infrastructure, backend systems, platform engineering, or MLOps
- Deep experience building and operating distributed systems, including expert-level knowledge of Kubernetes and container-based platforms
- Hands-on experience with modern ML infrastructure and serving stacks such as Ray or KubeRay, vLLM, or similar training and inference orchestration frameworks
- Experience working with GPU infrastructure, including performance optimization and operational management at scale
- Strong experience with data infrastructure and orchestration technologies such as Airflow, Spark, or similar systems
- Experience building and operating cloud-native systems on public cloud platforms such as AWS, GCP, or Azure, including infrastructure as code
- Excellent written communication
- A related technical degree is required
Benefits
- time off programs
- medical, dental, vision
- mental health support
- paid parental leave
- life and disability insurance
- 401(k)
- employee stock purchase program
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- machine learning
- distributed systems
- infrastructure
- backend systems
- platform engineering
- MLOps
- performance optimization
- data processing
- infrastructure as code
- technical design
Soft Skills
- technical leadership
- mentorship
- written communication
Certifications
- related technical degree