Reddit, Inc.

Staff Machine Learning Engineer, AI Serving

Staff Machine Learning Engineer developing Reddit's large-scale ML inference platform: leading the design and maintenance of a GPU-based model serving system while collaborating across teams.

Posted 5/4/2026 • Full-time • Remote • 🇺🇸 United States • Lead • 💰 $253,300–$354,600 per year

Tech Stack

Tools & technologies
AWS, Cloud, Go, Kubernetes, Python, PyTorch, Terraform

About the role

Key responsibilities & impact
  • Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs, supporting millions of queries per second (QPS).
  • Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale.
  • Rapidly prototype and build a high-performance feature hydration and processing system as part of the inference stack, including routing, caching, and batching.
  • Lead development of a unified GPU model export framework that converts trained models into optimized GPU inference models.
  • Strong understanding of real-time ML observability for tracking feature and model performance.
  • Experience serving LLMs online at scale.
  • Experience building an end-to-end (E2E) inference performance benchmarking framework.
  • Deep understanding of multi-cluster compute environments and the network topologies specific to ML inference use cases.

Requirements

What you’ll need
  • 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
  • Experience operating orchestration systems such as Kubernetes at scale
  • Deep experience with cloud technologies that support an ML platform, such as AWS, Google Cloud Storage, and infrastructure-as-code tools like Terraform
  • Proficiency in programming languages commonly used in ML, such as Go and Python
  • Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the genAI product development lifecycle.
  • Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus
  • Strong proficiency in Python and deep experience with modern AI/ML frameworks (Triton, Dynamo, vLLM, PyTorch)

Benefits

Comp & perks
  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k with Employer Match
  • Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
ML Engineering, AI Platform Engineering, Cloud AI Deployment, Kubernetes, AWS, Google Cloud Storage, Terraform, Python, Go, Triton
Soft Skills
communication skills, articulate technical concepts, focus on scalability, focus on reliability, focus on performance, advocate for platform users, intuition for product development lifecycle