Staff Software Engineer – AI Systems, Runtimes

Cloudera

Staff Software Engineer leading the architecture and delivery of a cloud-native AI platform at Cloudera, focusing on Kubernetes and AI capabilities across hybrid clouds and data centers.

Posted 6/2/2026 • Full-time • San Jose • California • 🇺🇸 United States • Lead

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Go, Node.js, Python, Rust, C++, vLLM, Triton, Kubernetes, Docker, LangChain

Soft Skills

leadership, team agility, technical alignment, communication

Tools & Technologies

KServe, KubeRay, Knative, AI Gateways, vector databases, Foundation Models, MIG, fractional GPUs, ONNX, TorchServe

Certifications & Qualifications

Bachelor’s degree, Master’s degree, PhD

Industry Keywords

AI, ML, LLM, quantization techniques, Retrieval-Augmented Generation, inference servers, containerized infrastructure, model optimization, enterprise data sources, prompt management

Tech Stack

Tools & technologies

Docker, Go, JavaScript, Kubernetes, Node.js, Python, Rust

About the role

Key responsibilities & impact

Design and implement elegant, scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use.
Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless-style scaling for AI workloads.
Build internal tooling, SDKs, and "AI Gateways" that enhance team agility and simplify the integration of Foundation Models (Llama, GPT) into product features.
Architect robust Retrieval-Augmented Generation (RAG) pipelines and prompt management services that integrate seamlessly with vector databases and enterprise data sources.
Partner with UI engineers, UX designers, and Product Management to ensure the AI platform is not just powerful, but highly usable for internal developers.
Ensure AI workloads are secure, multi-tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes.

Requirements

What you’ll need

Bachelor’s degree with 6+ years of software engineering experience (or equivalent Master’s/PhD tenure), including at least 2 years focused on AI/ML systems.
Expert proficiency in Python (for the AI ecosystem) and strong competence in a systems language such as Go, Rust, or C++ (for high-performance serving layers).
Deep understanding of LLM deployment challenges and runtimes (e.g., vLLM, ONNX, TorchServe, Triton).
Familiarity with quantization techniques (AWQ, GPTQ) to optimize model size/speed.
Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on containerized infrastructure (Docker/Kubernetes).
Ability to navigate the rapidly changing AI landscape, filtering hype from practical engineering solutions, and driving technical alignment across teams.

Benefits

Comp & perks

Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Paid Volunteer Time
Employee Resource Groups