The Health Management Academy

AI Infrastructure Operations Engineer

The AI Infrastructure Operations Engineer ensures the operational reliability of Companion, a healthcare AI platform, and works closely with technology leadership to maintain Azure infrastructure and operational security.

Posted 6/6/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level/Senior • 💰 $120,000–$140,000 per year

Tech Stack

Tools & technologies
Azure • Cloud • Kubernetes

About the role

Key responsibilities & impact
  • Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
  • Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
  • Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
  • Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.
  • Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
  • Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
  • Support release operations and help ensure deployments remain stable, observable, and recoverable.

Requirements

What you’ll need
  • Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
  • Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
  • Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
  • SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
  • Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
  • Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.

Benefits

Comp & perks
  • Health, dental, and vision benefits
  • Annual cash incentive program
  • 401(k) with match
  • Flexible PTO
  • PHM for PHM: our services for you and your dependents

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes • troubleshooting • monitoring • observability • incident response • cloud-native infrastructure • Azure • AKS • structured logging • metrics platforms
Soft Skills
communication • collaboration • operational prioritization • post-incident analysis • problem-solving • adaptability • technical explanation • teamwork • resilience • documentation