Salary
💰 $190,000 - $225,000 per year
Tech Stack
Ansible, AWS, Azure, Cloud, Go, Google Cloud Platform, Java, Kubernetes, Python, Terraform
About the role
- Design, implement, and manage scalable and resilient infrastructure in Google Cloud using Infrastructure as Code (IaC) principles.
- Ensure compliance with healthcare security standards and best practices; implement security controls, manage identity and access, and conduct regular security audits.
- Streamline CI/CD pipelines, automate deployment processes, and integrate best practices for continuous integration and delivery.
- Develop and maintain monitoring systems to ensure high availability and performance; implement alerting mechanisms to respond to incidents.
- Respond to production incidents, conduct root cause analysis, and implement corrective actions to prevent future occurrences.
- Collaborate closely with cross-functional teams including engineering, product, and operations to integrate SRE practices into development.
- Take on responsibilities spanning security, DevOps, production monitoring, Cloud FinOps, and infrastructure as code.
Requirements
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
- 5+ years of experience in Site Reliability Engineering, DevOps, or a related role.
- Strong knowledge of building and maintaining production software in a cloud environment (e.g., GCP, AWS, Azure).
- Deep understanding of Site Reliability best practices, including API, application, and infrastructure reliability.
- Proficiency in a modern programming language such as Go, Python, or Java.
- Experience with Kubernetes and container orchestration.
- Deep understanding of infrastructure as code (e.g., Terraform, Ansible).
- Strong knowledge of privacy and security best practices and infosec principles.
- Excellent problem-solving and troubleshooting skills.
- Ability to work independently and take ownership of complex projects.
- Strong communication skills and ability to collaborate with cross-functional teams.