Salary
💰 $141,670 - $185,482 per year
Tech Stack
Cloud, Kubernetes, Linux, Python, TCP/IP, Terraform, Unix, VMware
About the role
- Be the first site reliability engineer at IonQ dedicated to the cloud team: create, support, and manage infrastructure, instrumentation, and tooling for product and engineering teams
- Provide reliable services to customers and act as a force multiplier for engineers by eliminating toil and scaling systems sustainably
- Increase performance, decrease latency, and ensure high uptime for IonQ's quantum computing platform
- Maintain monitoring and alerting systems deployed on Kubernetes (self-managed on-prem and in the cloud) and on Linux workstations
- Operate and debug Unix/Linux internals and network issues as needed
- Automate manual processes and implement tooling to improve operational efficiency
- Mentor junior engineers and drive best practices across the organization
- Participate in incident management, resolution, and post-incident analysis
- Work onsite or hybrid from College Park, MD or Bothell, WA, or fully remote within the US; travel is limited to 10% or less
Requirements
- BS degree in Computer Science, Computer Engineering, or equivalent practical experience
- 8+ years of professional experience or an equivalent combination of education and experience
- 5+ years of experience in site reliability engineering
- 3+ years of experience with Kubernetes
- Experience learning from incidents
- Experience with virtualized and containerized environments
- Experience operating and debugging Unix/Linux OS internals (e.g., filesystems, inodes, system calls) and/or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN)
- Strong proficiency in a scripting language of your choice (Shell, Python, etc.)
- Able to quickly identify processes in need of automation and automate them
- Able (and excited) to mentor junior engineers
- Excellent writer, capable of driving best practices throughout the org
- 10+ years of experience in software development (preferred)
- 5+ years of experience with VMware and Terraform (preferred)
- Comfort with Google Cloud (preferred)
- Experience with scaling databases and applications (preferred)
- Experience with deploying bare-metal Kubernetes (preferred)
- Experience with incident management and leading incident resolution (preferred)
- Experience with incident research and analysis of contributing factors (preferred)
- Employment contingent on verifying U.S. Person status for export control and government contracts, obtaining any necessary license, and/or confirming availability of a license exception