Diabetes Youth Families

Senior Manager – SRE

full-time

Location Type: Hybrid

Location: Massachusetts, United States

Salary

$178,700 - $268,025 per year

About the role

  • Lead the execution and continuous improvement of SRE practices across assigned platforms and services, reinforcing a culture of reliability, efficiency, and operational ownership
  • Manage and evolve automation strategies that reduce operational toil, improve system reliability, and increase engineering productivity
  • Design, implement, and operate observability, monitoring, and alerting solutions that provide actionable insight into system health, availability, and performance
  • Own and lead high‑severity incident response for supported services, ensuring effective triage, coordination, root cause analysis, and completion of corrective and preventative actions
  • Analyze reliability, performance, and capacity metrics to identify risks, drive proactive improvements, and support long‑term system resilience
  • Partner with software engineering, product, and infrastructure teams to embed SRE principles throughout the development lifecycle and influence architecture and design decisions
  • Build, coach, and develop SRE managers and engineers, fostering technical excellence, career growth, and strong on‑call and operational practices
  • Support capacity planning, scalability assessments, and demand forecasting for critical systems and services
  • Ensure SRE processes, standards, and best practices are well documented, understood, and consistently applied

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 12+ years of overall engineering experience, including 5+ years in Site Reliability Engineering, DevOps, or a similar role
  • 3+ years of experience leading engineering teams or managing senior technical contributors
  • Strong experience with observability and monitoring platforms such as Datadog, Prometheus, Dynatrace, Grafana, ELK, or similar
  • Proficiency in at least one programming language such as Python, Go, or Java
  • Hands‑on experience with cloud platforms (AWS, Azure, or GCP) and containerization and orchestration technologies (Docker, Kubernetes)
  • Solid working knowledge of AWS services such as VPC, EC2, ELB, ECS, EKS, Lambda, IAM, CloudWatch, S3, SQS, SNS, Route 53, and WAF
  • Experience with infrastructure‑as‑code tools such as Terraform, Ansible, or equivalents
  • Strong troubleshooting and problem‑solving skills in distributed systems environments
  • Working knowledge of security best practices and operational risk management
  • Experience with resilience testing, chaos engineering, or failure‑injection techniques

Benefits

  • Medical, dental, and vision insurance
  • 401(k) with company match
  • Paid time off (PTO)
  • Additional employee wellness programs