
Platform SRE, Reliability Platform Engineer
Ashby Electrical Limited
full-time
Location Type: Hybrid
Location: Denver • Colorado • United States
Salary
💰 $130,000 - $170,000 per year
About the role
- As a Platform SRE, Reliability Platform Engineer, you will develop tools and services that support Todyl’s application hosting infrastructure, including but not limited to Kubernetes (K8s) and bare metal.
- Implement and enforce security policies, access control and system patching.
- Build automation that improves reliability and reduces manual intervention in Day 2 operations.
- Collaborate with product and engineering teams to deliver solutions that meet the needs of stakeholders and the business.
- Improve application monitoring and alerting to minimize time to detect and time to restore.
- Participate in a weekly on-call rotation with the team and be available during off-hours for emergency pages.
Requirements
- Experience managing production Linux systems at scale.
- MUST HAVE: Experience managing k8s and applications running on k8s.
- MUST HAVE: General competency in one or more scripting languages, such as Python, Perl, or Bash.
- Working knowledge of REST APIs.
- Familiarity with building custom Linux ISOs and AMIs.
- Familiarity with networking fundamentals.
- Ability to quickly learn new concepts, frameworks, and technologies.
- Comfortable building and maintaining production services.
- Production experience using CI/CD for code deployment.
- Experience with on-call rotations and incident response processes.
Benefits
- Offers equity.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Linux, Python, Perl, Bash, REST APIs, CI/CD, Linux ISOs, AMIs, networking fundamentals
Soft Skills
collaboration, problem-solving, adaptability, communication, reliability, incident response