DevOps Engineer

• Design, implement and support the operational and reliability aspects of large-scale Kubernetes clusters, with a focus on performance at scale, real-time monitoring, logging and alerting.
• Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation and refinement.
• Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and blameless postmortems.
• Be part of an on-call rotation to support production systems.
