DevOps Engineer

• Build, lead, and mentor a team of SREs across multiple regions and time zones.
• Define the long-term vision and roadmap for SRE, aligning with organizational objectives.
• Partner with product and engineering to ensure reliability is embedded in design, development, and operations.
• Own the end-to-end reliability of critical customer-facing services.
• Establish and maintain SLOs, SLIs, and error budgets to measure and enforce service quality.
• Drive root cause analysis and problem management for major incidents, ensuring long-term fixes are prioritized.
• Champion adoption of ITIL/OSS processes (incident, change, problem, and capacity management).
• Expand automation in deployment, monitoring, testing, and incident response to reduce toil.
• Oversee observability platforms (e.g., Catchpoint, Grafana, Moogsoft/BigPanda, Prometheus, Datadog).
• Ensure robust configuration, capacity, and change management practices.
• Partner with Network Engineering, DevOps, NOC, and Product Engineering on scalable, resilient architecture.
• Support business continuity, disaster recovery, and compliance requirements.
• Engage with vendors and service providers to manage SLAs and performance outcomes.
• Hire, coach, and develop engineers and managers, creating strong career paths within SRE.
• Foster a culture of reliability, accountability, and continuous improvement.
• Lead succession planning and leadership pipeline development.
