Expertise in Linux engineering and administration across thousands of bare-metal servers and virtual machines
Responsible for troubleshooting server hardware issues
Responsible for all operational aspects of our platform - Availability, Latency, Throughput, Monitoring, Issue Response (analysis, remediation, deployment) and Capacity Planning with respect to Latency and Throughput
Work in a team of highly motivated engineers distributed across the globe
Use your passion for technology, automation, and tooling to ensure our platform operates 24x7
Obsess over learning, and champion the newest technologies and techniques with others, raising the technical IQ of the team
Have broad exposure to our entire architecture and become one of our experts on its overall process flow
Have an intrinsic drive to make things better
Have experience with modern monitoring and telemetry stacks (ELK, Prometheus, Grafana)
Gather and analyze metrics from both operating systems and applications to assist in performance tuning
Ability to lead incident analysis, champion incident response practices, help correlate incidents with systemic problems, and drive them to resolution
Requirements
Bachelor's degree in Computer Science and/or equivalent experience
A minimum of 7 years of experience in software engineering
A minimum of 7 years of experience in one or more of: C++, Java, Python, Go
Configuration management experience with one or more tools such as Puppet, Chef, Ansible
Solid understanding of application design, including operational trade-offs of various designs
Analytical skills coupled with a strong sense of urgency, ownership, and drive
Ability to work well in a team-focused environment with other SREs and engineers
Ability to communicate and present the conventions recommended by the reliability team to a broad audience
Benefits
Remote-friendly and flexible work culture
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe