Salary
💰 $180,000 - $210,000 per year
Tech Stack
Airflow, Apache, AWS, Azure, Cloud, Go, Google Cloud Platform, Kubernetes, Prometheus, Python, Splunk
About the role
- Own and build the systems we use to test, build, and deploy code in a high-scale PaaS environment
- Collaborate across the company on production system design, standards, and technology choices
- Deliver results while safely evolving production systems ("change the wheels on the bus while it’s moving")
- Participate in incident management and determine sensible operational practices
- Create and maintain comprehensive internal documentation for systems and processes
- Help build out the Platform/Reliability practice and report to the VP of Reliability
- Be directly involved in decision-making, estimate work, and keep commitments
Requirements
- Strong experience in Non-Abstract Systems design and implementation
- Strong proficiency in Python and Golang
- In-depth experience with Kubernetes (CKA or equivalent)
- Experience with observability principles and technologies, including SLI/SLO definition and tracking
- Strong communication skills, both written and verbal, with experience working with a globally distributed team
- Passion for reliability and operational excellence; low tolerance for toil
- Ability to estimate scope of work accurately and coordinate with stakeholders
- Experience with software development best practices: code review, testing, CI/CD, version control, automation, debugging
- Proactive ownership and accountability
- Periodic on-call participation for the services you own
- Experience working on SaaS/PaaS products across multiple cloud providers (bonus)
- Experience with CircleCI, Chronosphere (Prometheus), Splunk, Bazel, Istio, Playwright, Karpenter, GitHub Actions (bonus)
- Experience with AWS, GCP and Azure (bonus)
- Experience with Apache Airflow (bonus)
- Authorized to work in the United States (a required field on the application)