DevOps Engineer

• Drive reliability and performance by defining SLOs/SLIs, improving observability, and addressing system bottlenecks across cloud environments
• Automate infrastructure and operations using Terraform, Kubernetes, and CI/CD tools to eliminate toil and enable scalable, fault-tolerant deployments
• Collaborate cross-functionally with product, infrastructure, and DevOps teams to reduce incidents and ensure architectural clarity
• Lead incident management by participating in on-call rotations, conducting postmortems, and implementing automated recovery
• Build and maintain monitoring systems using Prometheus, Grafana, AppDynamics, and Splunk for real-time alerting and root cause analysis
• Develop platform tooling and pipelines for container orchestration, third-party integrations, and cloud-native operations
• Maintain and improve live services by measuring and monitoring latency and overall system health
• Define and use KPIs to understand service performance and identify corrective actions
• Create, manage, and use dashboards for continuous monitoring and health checks of applications and infrastructure
• Design and implement solutions that address customer friction points, and improve the service lifecycle from inception through sustainment
• Assist in creating and maintaining automation to improve reliability and velocity during maintenance tasks
• Mentor engineers and champion SRE best practices, embedding a reliability-first culture and ensuring technical excellence

Vice President, Site Reliability Engineer
