Define and drive the strategic direction for SRE practices and reliability engineering within the organization.
Architect and implement complex systems and solutions, addressing high-impact and cross-team challenges.
Lead major incident response efforts and postmortem analyses.
Partner with engineering, operations, and product teams to embed reliability and performance best practices.
Drive innovation in reliability engineering practices, introducing new tools, technologies, and methodologies.
Oversee long-term capacity planning and forecasting.
Provide guidance and mentorship to senior and junior SREs.
Contribute to and influence organizational policies and best practices related to system reliability.
Requirements
8+ years of experience as an SRE in AWS environments within medium to large-scale organizations.
8+ years of hands-on experience with observability tools, including Prometheus, New Relic, Grafana, or similar.
Exceptional proficiency in programming, with expertise in Python, Go, PowerShell, YAML, Node.js, and Bash.
Extensive experience managing database technologies, both SQL and NoSQL.
5+ years of experience designing and building infrastructure deployment pipelines using Git, GitHub Actions (GHA), Terraform, Helm, or similar tools.
Advanced expertise in designing and managing production environments in AWS, including services such as VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Autoscaling, and more.
Deep knowledge of Linux systems and a range of protocols and technologies, including HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, Load Balancing, SQL/NoSQL, Message Brokers, Nginx, Vault, ELK, and others.
Expert-level experience with Kubernetes and a variety of container and cloud-native technologies.
Proven ability to manage 24/7 on-call rotations, develop runbooks, establish support procedures, and proactively monitor systems across multiple geographic locations.
Ability to excel under pressure in complex, high-stakes environments.
Benefits
Medical, dental and vision insurance
Health Savings Account
Flexible Spending Accounts
Telehealth
401(k) and 401(k) match
Life and AD&D insurance
Short-Term and Long-Term Disability
Flexible Time Off (FTO) or Paid Time Off (PTO)
Employee Well-Being program
11 paid holidays plus 1 inclusive holiday per year
Volunteer Time Off
Employee Referral program
Education Reimbursement Program
Employee Recognition and Appreciation program
Additional perk and voluntary benefit programs