DevOps Engineer

• Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
• Innovate relentlessly: Identify pain points, propose creative solutions, and drive initiatives that simplify, scale, and strengthen the platform.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
• Own observability: Enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that drive reliability outcomes.
• Drive automation: Develop and improve internal tooling, IaC frameworks, and pipelines (Terraform, GitLab CI/CD) to reduce manual intervention and enable self-healing systems.
• Scale systems sustainably through automation, and evolve them by pushing for changes that improve reliability and velocity.
• Participate in the on-call rotation.
• Practice sustainable incident response and blameless postmortems. Lead post-incident reviews (RCAs) and identify long-term fixes that improve stability, reliability, and developer experience.
• Implement monitoring, logging, alerting, and SLA reporting.
• Create and maintain technical documentation.
• Implement, maintain and mature SRE best practices.
• Lead incidents: Act as Incident Commander; coordinate cross-team response, manage communications, and ensure rapid service restoration.
• Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth.
• Collaborate with members of the Platform Engineering team to implement and support far-reaching strategic efforts, provide constructive feedback, and foster a collaborative environment.
• Work cross-functionally with internal teams and vendors to manage our global growth, with a strong focus on maintaining a high level of performance, availability, and reliability for our users.