Senior Kubernetes Platform Engineer – AI/ML Infrastructure
Cisco
Senior Kubernetes Platform Engineer designing and operating large-scale Kubernetes infrastructure for AI/ML workloads. Leading technical direction and ensuring performance, reliability, and scalability within complex systems.
Posted 5/15/2026 • full-time • RTP • North Carolina, Texas • 🇺🇸 United States • Senior • 💰 $137,000 - $200,500 per year
Tech Stack
Tools & technologies
- Distributed Systems
- Go
- Kubernetes
- OpenShift
About the role
Key responsibilities & impact
- Architect, build, and operate large-scale on-prem Kubernetes platforms (OpenShift/Anthos), including control plane and etcd lifecycle management
- Define and evolve scalable, multi-tenant platform architecture supporting AI/ML and GPU-based workloads
- Enable and optimize ML workloads including training, inference, and LLM deployment pipelines on Kubernetes
- Build platform extensions using Kubernetes controllers, operators, CRDs, and Golang-based services
- Implement Infrastructure as Code and automation to improve scalability, consistency, and operational efficiency
- Drive AIOps capabilities using telemetry, automation, anomaly detection, and self-healing systems for platform reliability
- Improve observability (metrics, logs, traces) and optimize resource utilization, scheduling, and cluster performance
- Partner with ML engineers and data scientists to operationalize ML workflows and ensure platform readiness for AI workloads
- Participate in on-call rotations, owning incident response, reliability, and continuous operational improvement
- Mentor engineers and contribute to defining platform engineering standards and best practices
Requirements
What you’ll need
- 8+ years of software engineering experience
- 4+ years of hands-on Kubernetes production experience with control plane ownership
- Strong experience operating on-prem or self-managed Kubernetes environments
- Deep expertise in etcd management (backup, restore, recovery, upgrades)
- Strong proficiency in Go with experience building Kubernetes controllers, operators, CRDs, and webhooks
- Deep understanding of Kubernetes internals (API server, scheduler, controller loops, reconciliation)
- Experience supporting AI/ML or GPU-based workloads on Kubernetes platforms
- Proven experience operating and debugging large-scale distributed systems
- Experience participating in on-call rotations and production incident management
Benefits
Comp & perks
- Medical, dental and vision insurance
- 401(k) plan with Cisco matching contribution
- Paid parental leave
- Short- and long-term disability coverage
- Basic life insurance
- 10 paid holidays per full calendar year
- 1 floating holiday for non-exempt employees
- 1 paid day off for employee’s birthday
- Paid year-end holiday shutdown
- 4 paid days off for personal wellness
- 16 days of paid vacation time for non-exempt employees
- Flexible vacation time off program for exempt employees
- 80 hours of sick time off provided on hire date and each January 1st thereafter
- Optional 10 paid days per full calendar year to volunteer
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Kubernetes
- OpenShift
- Anthos
- etcd management
- Golang
- Infrastructure as Code
- AIOps
- ML workloads
- Kubernetes controllers
- distributed systems
Soft Skills
- mentoring
- incident response
- operational improvement
- collaboration