Salary
💰 $61,310 - $104,390 per year
About the role
- Own the reliability, performance, capacity, and lifecycle of VMware vSphere (vCenter/ESXi) and Microsoft Hyper‑V/Failover Clustering platforms.
- Design and implement automation‑first workflows for provisioning, configuration, patching, upgrades, and compliance.
- Build and evolve observability (metrics, logs, traces, SLOs/error budgets) to drive data‑informed operations.
- Improve resilience and security (HA/DR patterns, backup/restore validation, hardening, least privilege, segmentation).
- Partner with networking, storage, and security teams to deliver performance and availability at scale across datacenters and edge sites.
- Lead incident response and post‑incident learning; reduce toil through runbooks and self‑healing.
- Evaluate and adopt AI‑assisted operations and emerging platform capabilities.
- Document architecture, standards, and runbooks; mentor engineers and champion best practices.
Requirements
- Extensive experience operating vSphere (vCenter/ESXi) and/or Hyper-V/Failover Clustering in production at scale
- Proficiency with automation/tooling (e.g., PowerShell/PowerCLI, Python, desired-state or IaC patterns)
- Hands-on experience with observability platforms and SRE concepts (SLOs, error budgets, capacity modeling)
- Strong understanding of networking, storage, and security fundamentals in virtualized environments
- Experience with resilience engineering (HA, DR, failover testing) and lifecycle management (patching/upgrades)
- Bias for collaboration, ownership, and continuous improvement
- Healthcare/regulated environment experience is a plus; a passion for caregiver and patient impact is essential
Benefits
- Health care benefits (medical, dental, vision)
- Retirement 401(k) Savings Plan with employer matching
- Life insurance
- Disability insurance
- Paid parental leave
- Vacations
- Holidays
- Time off benefits for health issues
- Voluntary benefits
- Well-being resources
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
VMware vSphere, vCenter, ESXi, Microsoft Hyper-V, Failover Clustering, automation, PowerShell, PowerCLI, Python, resilience engineering
Soft skills
collaboration, ownership, continuous improvement, mentoring, incident response, post-incident learning