Principal Software Engineer
DataRobot
Principal Software Engineer at DataRobot, responsible for optimizing AI infrastructure and leading technical teams. Designs and develops scalable solutions for large language model serving systems, working collaboratively with cross-functional teams.
Posted 5/19/2026 • Full-time • Lead
Boston • California, Massachusetts, Washington • 🇺🇸 United States
Tech Stack
Tools & technologies: AWS, Azure, Cloud, Go, Google Cloud Platform, Kubernetes, Python, Terraform
About the role
Key responsibilities & impact
- Help design, develop, and optimize the inference engine that powers DataRobot's agentic infrastructure API, ensuring large language model (LLM) serving systems are fast, scalable, and efficient.
- Contribute to the design and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLM inference.
- Collaborate with partners such as NVIDIA to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine.
- Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
- Build and maintain instrumentation, profiling, and tracing tooling to uncover bottlenecks and guide optimizations.
- Develop and enhance scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads.
- Integrate with federated, distributed inference infrastructure: orchestrate across nodes, balance load, and handle communication overhead.
- Collaborate cross-functionally with platform engineering, cloud infrastructure, and security/compliance teams.
- Document and share learnings, contributing to internal best practices and open-source efforts when possible.
Requirements
What you’ll need
- 10+ years of engineering experience, with at least 5 in infrastructure, platform, or backend systems roles.
- Deep expertise in Kubernetes internals and operations, including networking, scheduling, scaling, and controller patterns.
- Proven ability to design and build systems from scratch, making pragmatic tradeoffs along the way.
- Strong proficiency in modern programming languages such as Python or Go.
- Experience building production-quality, reliable, and observable systems that are used across engineering organizations.
- A growth-oriented mindset: driven to teach, learn, and improve systems as well as people.
- Experience operating across multiple cloud providers (AWS, GCP, Azure) and/or hybrid environments.
- Strong experience with Helm, container orchestration patterns, and CI/CD automation.
- Comfortable working with IaC (Terraform, Pulumi) and GitOps workflows.
- Ability to influence without authority and align diverse stakeholders around technical decisions.
Benefits
Comp & perks
- Medical, Dental & Vision Insurance
- Flexible Time Off Program
- Paid Holidays
- Paid Parental Leave
- Global Employee Assistance Program (EAP) and more!