Toast

Senior Site Reliability Engineer – Process Automation

full-time

Origin: 🇺🇸 United States • Massachusetts

Salary

💰 $134,000 - $214,000 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, ITSM, Python, Terraform

About the role

  • Provide automation for incident and change management processes to improve release consistency and enable faster incident response
  • Maintain and improve key organizational processes for incident and change management, including change control, rapid detection, response, root cause analysis, and continuous learning from issues
  • Drive and lead optimizations to existing processes, identify areas for improvement, and implement automated solutions to enhance efficiency and reliability of Toast systems
  • Utilize, configure, and support tools such as JIRA, FireHydrant, and Backstage for tracking events, incidents, and changes, and maintain the Service Catalog
  • Enable low-risk, compliant releases with rapid rollback capability to maintain platform reliability
  • Implement automation for risk mitigation strategies to minimize the impact of changes and releases on Toast customers
  • Collaborate closely with leadership, third-party vendors, and relevant stakeholders to drive work to completion

Requirements

  • 3-7 years of industry engineering experience with a focus on SRE
  • Bachelor’s degree in computer science, engineering, or a related field
  • Working knowledge of complex cloud environments (AWS, GCP, Azure, etc.)
  • Experience scripting automation (Python, Go, etc.)
  • Experience with infrastructure as code (Terraform, etc.)
  • Experience driving and leading projects
  • Experience participating in and leading incident response and blameless retrospectives/post-mortems
  • Strong written and verbal communication skills
  • Strong problem-solving skills and the ability to think strategically and analytically
  • Experience working with a diverse global team across multiple regions and time zones
  • Working knowledge of best-practice frameworks such as ITIL, ITSM, Agile/Scrum, and change management is a plus
  • Experience with incident and change processes and tools (JIRA, OpsGenie, FireHydrant, DX, etc.) is a plus