AI Infrastructure Operations Engineer
The Health Management Academy
AI Infrastructure Operations Engineer ensuring the operational reliability of Companion, a healthcare AI platform. Collaborates closely with technology leadership to maintain Azure infrastructure and operational security.
Posted 6/6/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior • 💰 $120,000 - $140,000 per year
Tech Stack
Tools & technologies: Azure, Cloud, Kubernetes
About the role
Key responsibilities & impact
- Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
- Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
- Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
- Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.
- Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
- Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
- Support release operations and help ensure deployments remain stable, observable, and recoverable.
Requirements
What you’ll need
- Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
- Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
- Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
- SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
- Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
- Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.
Benefits
Comp & perks
- health/dental/vision benefits
- annual cash incentive program
- 401k with match
- flexible PTO
- PHM for PHM — our services for you and your dependents
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, troubleshooting, monitoring, observability, incident response, cloud-native infrastructure, Azure, AKS, structured logging, metrics platforms
Soft Skills
communication, collaboration, operational prioritization, post-incident analysis, problem-solving, adaptability, technical explanation, teamwork, resilience, documentation