Tech Stack
Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Prometheus, Terraform
About the role
- Architect, build, and maintain critical infrastructure that powers Saviynt products.
- Lead and scale Site Reliability Engineering deployment practices across enterprise infrastructure.
- Drive reliability, automation, and operational excellence while managing high-performing DevOps/SRE teams.
- Architect and implement deployment pipelines for multi-cloud environments (AWS, Azure, GCP).
- Establish SLIs, SLOs, and error budgets for deployment systems; lead incident response and post-mortem processes.
- Implement Infrastructure as Code using Terraform, CloudFormation, and Pulumi.
- Design CI/CD pipelines with GitLab, Jenkins, and GitHub Actions; ensure high deployment success rates and rapid rollback capabilities.
- Establish monitoring and observability with Datadog, Prometheus, and Grafana.
- Partner with Engineering, DevOps, and Product teams to define deployment standards and security/compliance frameworks.
- Drive adoption of containerization (Kubernetes, Docker) and serverless technologies; implement chaos engineering and disaster recovery practices.
Requirements
- Proven leadership experience in SRE, DevOps, or Production Engineering roles, with 3+ years managing operations teams in fast-paced, high-growth environments.
- Demonstrated track record of scaling deployment systems for high-traffic applications.
- Hands-on experience with enterprise-grade incident management and on-call rotations.
- Strong communication and collaboration skills with the ability to influence both technical and non-technical stakeholders.
- Cloud Platforms: AWS (required), Azure/GCP (preferred).
- Infrastructure as Code: operational experience with Terraform, CloudFormation, Ansible, and Pulumi.
- CI/CD: GitLab CI, Jenkins, GitHub Actions, ArgoCD.
- Containerization: Kubernetes, Docker, Helm.