Red Hat

Senior Principal Software Engineer, AI Inference

Full-time

Location Type: Hybrid

Location: Boston, Massachusetts; North Carolina; United States

Salary

$189,600 - $312,730 per year

About the role

  • Build and release vLLM wheels across multiple hardware backends and CPU architectures, managing complex native dependency chains including PyTorch, Triton, and other accelerator-specific libraries
  • Design and maintain CI/CD pipelines spanning multiple platforms including GitHub Actions, GitLab CI, and Buildkite for build, test, and release workflows
  • Manage and scale multi-cloud GPU infrastructure using Terraform and Ansible, including both bare-metal and Kubernetes-based compute runners
  • Own the model validation pipeline, orchestrating accuracy evaluation, performance benchmarking, tool-calling validation, and smoke testing across dozens of LLMs on both bare metal and OpenShift
  • Develop and maintain the Python tooling and automation that powers the build, packaging, validation, and release processes
  • Drive adoption of agentic AI and intelligent automation to streamline engineering workflows, accelerate debugging, and reduce toil across the team

Requirements

  • 8+ years of software engineering experience with significant depth in build systems, release engineering, or infrastructure
  • Strong Python development skills with experience building well-tested, maintainable tooling and automation
  • Hands-on experience building and packaging Python projects with native compiled extensions, including familiarity with C++ and CUDA build toolchains, wheel packaging, and multi-architecture builds
  • Deep familiarity with container ecosystems, including Dockerfiles and Containerfiles, image registries, and container build pipelines
  • Understanding of LLM evaluation methodology, including accuracy benchmarks such as MMLU, GSM8K, and HellaSwag, as well as inference performance metrics like throughput and latency
  • Experience with CI/CD platforms such as GitHub Actions, GitLab CI, Tekton, or Buildkite
  • Solid understanding of release engineering practices including reproducible builds, artifact management, dependency pinning, and security scanning
  • Experience with infrastructure-as-code tools such as Terraform and Ansible, and managing cloud resources at scale
  • Working knowledge of Kubernetes and/or OpenShift for deploying and testing workloads
  • Enthusiasm for applying LLM-based agents and AI-assisted tools to automate engineering workflows, with a track record of identifying repetitive processes and replacing them with intelligent automation
  • Excellent communication skills, capable of interacting effectively with both technical and non-technical team members
  • A Bachelor's or Master's degree in computer science, computer engineering, or a related field. A Ph.D. in an ML-related domain is a significant advantage.

Benefits

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, C++, CUDA, CI/CD, build systems, release engineering, infrastructure-as-code, container ecosystems, LLM evaluation methodology, artifact management
Soft Skills
communication, team collaboration, problem-solving, automation, debugging, process improvement, technical writing, interpersonal skills, leadership, adaptability
Certifications
Bachelor's degree in computer science, Master's degree in computer engineering, Ph.D. in ML-related domain