Red Hat

Principal Software Engineer, AI Model Serving

Full-time

Location: 🇺🇸 United States • North Carolina


Salary

💰 $148,540 - $245,050 per year

Job Level

Lead

Tech Stack

AWS • Azure • Cloud • Go • Kubernetes • Linux • OpenShift • Open Source • Python • PyTorch • TensorFlow

About the role

  • Lead the team strategy and implementation for Kubernetes-native components in Model Serving, including Custom Resources, Controllers, and Operators.
  • Be an influencer and leader in MLOps-related open source communities to help build an active MLOps open source ecosystem for Open Data Hub and OpenShift AI.
  • Act as an MLOps SME within Red Hat by supporting customer-facing discussions, presenting at technical conferences, and evangelizing OpenShift AI within the internal community of practices
  • Architect and design new features for open-source MLOps communities such as Kubeflow and KServe.
  • Provide technical vision and leadership on critical and high-impact projects
  • Mentor, influence, and coach a team of distributed engineers
  • Ensure non-functional requirements including security, resiliency, and maintainability are met
  • Write unit and integration tests and work with quality engineers to ensure product quality
  • Use CI/CD best practices to deliver solutions as productization efforts into RHOAI
  • Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members
  • Collaborate with product management, other engineering, and cross-functional teams to analyze and clarify business requirements
  • Communicate effectively to stakeholders and team members to ensure proper visibility of development efforts
  • Give thoughtful and prompt code reviews
  • Represent RHOAI in external engagements including industry events, customer meetings, and open-source communities
  • Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.
  • Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.

Requirements

  • Proven expertise with Kubernetes API development and testing (CRs, Operators, Controllers), including reconciliation logic.
  • Strong background in model serving frameworks (e.g., KServe, vLLM) and distributed inference strategies for LLMs (tensor, pipeline, and data parallelism).
  • Deep understanding of GPU optimization, autoscaling (KEDA/Knative), and low-latency networking (e.g., NVLink, P2P GPU).
  • Experience architecting resilient, secure, and observable systems for model serving, including metrics and tracing.
  • Advanced skills in Go and Python; ability to design APIs for high-performance inference and streaming.
  • Excellent system troubleshooting skills in cloud environments and the ability to innovate in fast-paced environments.
  • Strong communication and leadership skills to mentor teams and represent projects in open-source communities.
  • Autonomous work ethic and passion for staying at the forefront of AI and open source.
  • The following will be considered a plus: existing contributions to one or more MLOps open source projects such as Kubeflow, KServe, RayServe, or vLLM.
  • Familiarity with optimization techniques for LLMs (quantization, TensorRT, Hugging Face Accelerate).
  • Knowledge of end-to-end MLOps workflows, including model registry, explainability, and drift detection.
  • Bachelor's degree in statistics, mathematics, computer science, operations research, or a related quantitative field, or equivalent expertise; Master's or PhD is a big plus.
  • Understanding of how Open Source and Free Software communities work
  • Experience with development for public cloud services (AWS, GCE, Azure)
  • Experience in engineering, consulting or another field related to model serving and monitoring, model registry, explainable AI, deep neural networks, in a customer environment or supporting a data science team
  • Highly experienced in OpenShift
  • Familiarity with popular Python machine learning libraries such as PyTorch, TensorFlow, and Hugging Face.