Salary
💰 $161,000 - $180,000 per year
Tech Stack
Ansible, Cloud, Django, Docker, Flask, Go, Java, Kubernetes, Laravel, Linux, Python, Rust, Switching, Terraform
About the role
- Analyze system performance using APM and distributed telemetry data to identify sources of instability
- Improve scalability, reliability, and performance through software enhancements and patching
- Develop tools and automation to streamline the DevOps pipeline
- Design and manage infrastructure in both data center metal environments and in the public cloud
- Conduct predictive failure analysis and disaster planning
- Administer and configure databases and key-value stores with a focus on uptime and performance
- Analyze complex systems to identify operational surprises and minimize downtime
- Participate in incident response and produce postmortem reports
- Collaborate with other engineering teams
Requirements
- STEM degree and/or relevant experience as a Site Reliability Engineer, DevOps Engineer, or SWE
- Proficiency in Python or Golang
- Experience with other compiled or high-level languages: C, C#, C++, Java, Rust, etc.
- Experience running Web applications at scale
- Experience with Web application concepts and frameworks: ORM, MVC architecture, Django, Flask, Laravel, etc.
- Proficiency with Linux administration, Bash shell, and strong knowledge of Linux internals (e.g., filesystems, system calls)
- Strong networking knowledge (e.g., routing, switching, TCP stack) for both metal and cloud (VPC, Security Groups) environments
- Experience in database administration and configuration
- Experience with DevOps tools such as Terraform, Ansible, Docker, Kubernetes, ArgoCD, or Helm
- Willingness to participate in an on-call rotation and respond to monitoring alerts for core website functions as needed