DataRobot

Principal Software Engineer

The Principal Software Engineer at DataRobot is responsible for optimizing AI infrastructure and leading technical teams. The role involves designing and developing scalable solutions for large language model serving systems in collaboration with cross-functional teams.

Posted 5/19/2026 • Full-time • Boston • California, Massachusetts, Washington • 🇺🇸 United States • Lead

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Go, Google Cloud Platform, Kubernetes, Python, Terraform

About the role

Key responsibilities & impact
  • Help design, develop, and optimize the inference engine that powers DataRobot's agentic infrastructure API, ensuring large language model (LLM) serving systems are fast, scalable, and efficient.
  • Contribute to the design and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLM inference.
  • Collaborate with partners such as NVIDIA to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine.
  • Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators.
  • Build and maintain instrumentation, profiling, and tracing tooling to uncover bottlenecks and guide optimizations.
  • Develop and enhance scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads.
  • Integrate with federated, distributed inference infrastructure: orchestrate across nodes, balance load, and handle communication overhead.
  • Collaborate cross-functionally with platform engineers, cloud infrastructure, and security/compliance teams.
  • Document and share learnings, contributing to internal best practices and open-source efforts when possible.

Requirements

What you’ll need
  • 10+ years of engineering experience, with at least 5 in infrastructure, platform, or backend systems roles.
  • Deep expertise in Kubernetes internals and operations, including networking, scheduling, scaling, and controller patterns.
  • Proven ability to design and build systems from scratch, making pragmatic tradeoffs along the way.
  • Strong proficiency in modern programming languages such as Python or Go.
  • Experience building production-quality, reliable, and observable systems that are used across engineering organizations.
  • A growth-oriented mindset—driven to teach, learn, and improve systems as well as people.
  • Experience operating across multiple cloud providers (AWS, GCP, Azure) and/or hybrid environments.
  • Strong experience with Helm, container orchestration patterns, and CI/CD automation.
  • Comfortable working with IaC (Terraform, Pulumi) and GitOps workflows.
  • Ability to influence without authority and align diverse stakeholders around technical decisions.

Benefits

Comp & perks
  • Medical, Dental & Vision Insurance
  • Flexible Time Off Program
  • Paid Holidays
  • Paid Parental Leave
  • Global Employee Assistance Program (EAP) and more!

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
inference engine development, large language models (LLM), Kubernetes, Python, Go, CI/CD automation, Terraform, Pulumi, routing mechanisms, memory management
Soft Skills
collaboration, growth-oriented mindset, influence without authority, documentation, cross-functional teamwork