Tech Stack
Azure, Cloud, Cyber Security, Distributed Systems, ElasticSearch, Firewalls, Go, Grafana, Kafka, Kubernetes, Microservices, Prometheus, Python, Terraform, Vault
About the role
- Lead the design, implementation, and continuous improvement of build and release pipelines
- Provide day-to-day guidance and direct support to a globally distributed team of three DevOps engineers
- Conduct 1:1s, deliver performance reviews, and promote professional development across the team
- Architect and execute cloud deployment strategies for optimized performance and cost efficiency
- Review code, infrastructure configurations, and documentation to uphold best practices
- Train and mentor developers and stakeholders in Azure DevOps and Git workflows
- Advocate for DevOps culture and best practices across engineering and IT teams
- Collaborate with Infrastructure engineers to diagnose and resolve deployment challenges
- Document and communicate procedures with clarity and consistency
- Troubleshoot deployment issues and provide support in staging and production environments
- Represent the team in cross-functional meetings with insightful recommendations
- Research and evaluate emerging technologies for possible adoption
Infrastructure & Cloud Operations
- Design and manage Azure infrastructure using IaC principles with Terraform and Terragrunt
- Maintain AKS clusters with Istio for secure, scalable service-to-service communication
- Manage Azure resources such as networking, Key Vault, Blob Storage, and CosmosDB
- Configure OpenSearch clusters for efficient logging and data indexing
- Optimize CosmosDB performance with scaling strategies and cost oversight
DevOps Practices & Automation
- Lead the development of robust CI/CD pipelines
- Establish and maintain GitOps workflows via ArgoCD
- Author and maintain automation scripts (primarily Bash) to streamline operations
- Continuously evaluate and enhance deployment and developer processes
Security & Compliance
- Implement Zero Trust security architecture using Azure PIM and RBAC
- Define and enforce least privilege access models and security best practices
- Configure Web Application Firewalls (WAFs) and manage network security in Azure
- Ensure alignment with compliance frameworks and standards
Monitoring & Reliability
- Deploy observability tooling using OpenTelemetry, Azure Monitor, and Application Insights
- Participate in the on-call rotation and incident response
- Lead post-incident analysis and drive implementation of long-term fixes
- Define and maintain reliability objectives, SLAs, and error budgets
Requirements
- 5+ years of DevOps/SRE experience in SaaS or cloud-native environments
- 2+ years in a formal engineering leadership role, including people management and performance reviews
- Proven expertise with Microsoft Azure infrastructure and services
- Strong hands-on experience with:
  - Terraform and Terragrunt
  - Kubernetes (AKS) and Helm
  - Istio Service Mesh and Istio Ingress Gateways
  - ArgoCD and GitOps workflows
  - CI/CD pipelines via Azure DevOps
  - Zero Trust architecture, including PIM and RBAC
  - Observability tools: OpenTelemetry, Prometheus, Grafana, Azure Monitor
  - Bash scripting; Python or Go for automation/tooling
- Comfortable supporting production systems as part of an on-call rotation
- Strong communication, leadership, and troubleshooting abilities
- Experience with WAFs and secure network configurations in Azure