Salary
💰 $155,000 - $165,000 per year
Tech Stack
Ansible, Cloud, Go, Grafana, Kubernetes, Linux, MySQL, Prometheus, Puppet, Python, VMware
About the role
- Lead maintenance and operations for production and development environments, including patching, deployments, and server management
- Ensure high reliability and performance of services, proactively resolving issues before they affect customers
- Participate in 24x7 on-call rotations and drive post-incident reviews and blameless post-mortems
- Architect and implement complex solutions spanning OS, virtualization, network, storage, and cloud layers
- Coordinate on-site deployments in colocation facilities (server/storage installation, decommissioning, and troubleshooting)
- Lead automation initiatives for infrastructure provisioning and operational tasks
- Design and maintain tooling for observability using OSS and commercial platforms (Grafana, Prometheus, ELK)
- Partner with product, security, and engineering teams to deliver infrastructure that meets compliance, performance, and scale requirements
- Continuously improve SRE/DevOps practices, driving documentation quality, operational maturity, and agility
Requirements
- 8+ years in DevOps, SRE, or infrastructure engineering roles
- Proven experience in hybrid infrastructure with strong colocation and on-prem expertise (not exclusively public cloud)
- Proficiency in configuration-as-code tooling (Ansible, Puppet) and scripting (Python, Bash, Go)
- Deeply comfortable in shell environments (Bash, Zsh)
- Expert-level Linux systems knowledge (RHEL-based distributions preferred)
- Experience with Proxmox, KVM, or VMware in high-availability environments
- Advanced troubleshooting of SANs, load balancers, and virtualization platforms
- Proactive infrastructure monitoring using commercial or OSS alerting systems
- Experience driving observability-first infrastructure and designing metrics and alerting
- Experience participating in 24x7 on-call rotations and post-incident reviews
- Ability to lead technical strategy while maintaining hands-on contributions
- Preferred: Experience with F5 BIG-IP LTMs, NetApp SANs, Grafana, Prometheus, ELK
- Preferred: Working knowledge of MySQL and Kubernetes (or motivation to learn them)
- Preferred: Prior exposure to SaaS-based WAF/DDoS platforms (Cloudflare, Akamai, Silverline)
- Preferred: Experience in agile teams (Scrum, Kanban); hands-on GitLab experience is a plus
- Must have legal right to be employed in the United States (per Equal Opportunity statement)