
Senior Software Engineer, Site Reliability
Benchmark
full-time
Location Type: Remote
Location: United States
About the role
- Contribute to the design, development, and delivery of features that enhance system reliability and scalability.
- Define, measure, and improve SLIs, SLOs, and error budgets in collaboration with engineering teams.
- Participate in building a culture of reliability through knowledge sharing, documentation, and process improvements.
- Implement and improve observability tooling and practices to monitor the health and performance of production systems.
- Participate in incident management, including on-call rotations, root cause analysis, and postmortem reviews.
- Lead smaller initiatives or components of larger projects, ensuring technical quality and operational readiness.
- Collaborate with software engineering, security, and product teams to ensure resilient and secure system design.
- Mentor junior engineers, sharing expertise in SRE principles and AWS best practices.
- Contribute to automation efforts that reduce toil and improve the efficiency of operational processes.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering with a focus on production operations.
- Strong knowledge of AWS cloud services and cloud-native architectures.
- Proficiency in scripting or programming languages (e.g., Python, Bash).
- Experience with observability tools (e.g., CloudWatch, Datadog, Prometheus, Grafana).
- Familiarity with infrastructure-as-code tools (e.g., Terraform, CloudFormation) and CI/CD pipelines.
- Strong problem-solving skills and ability to work cross-functionally.
- Some experience mentoring or coaching junior engineers.
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, Software Engineering, AWS, Python, Bash, observability tools, infrastructure-as-code, Terraform, CloudFormation
Soft Skills
problem-solving, mentoring, collaboration, knowledge sharing, process improvement