Tech Stack
Azure, Cloud, Docker, Kubernetes, .NET, Python, Redis, SQL, Terraform, Vault
About the role
- Mesh is a global payment network for crypto connecting exchanges, wallets, and financial platforms
- Ensure reliability, performance, and availability of Azure infrastructure and implement SRE best practices
- Design and manage scalable, highly available Azure infrastructure; define and track SLIs, SLOs, and error budgets
- Build self-healing systems and implement chaos engineering practices
- Design observability solutions (logging, tracing, metrics, alerting) and participate in on-call rotations
- Own and implement IaC practices using Terraform and Azure DevOps; develop automation to eliminate toil
- Conduct performance analysis and capacity planning; implement auto-scaling and resource optimization strategies
- Implement security measures for Azure resources, integrate security into CI/CD, and maintain cloud security posture
- Lead incident response, conduct blameless post-mortems, and develop runbooks and automated remediation
- Collaborate cross-functionally across time zones and mentor team members on SRE and infrastructure management
Requirements
- 8+ years in SRE, DevOps, or cloud infrastructure roles, with significant Azure experience
- Knowledge of .NET Framework and .NET Core for application troubleshooting and performance optimization
- Strong automation skills with Terraform and Bash scripting; Python and Terragrunt are a plus
- Deep understanding of SRE principles: SLIs/SLOs, error budgets, elimination of toil, and blameless post-mortems
- Expertise in observability tools and practices: Azure Monitor, Application Insights, Datadog, distributed tracing, and structured logging
- Knowledge of containerization technologies: Docker, Kubernetes, and Azure Kubernetes Service (AKS)
- Proficiency in Azure services: App Service, Functions, Storage, API Management, Key Vault, Logic Apps
- Database experience with Azure SQL, Azure Cache for Redis, and Azure Cosmos DB
- Azure Governance expertise: Blueprints, policies, tagging, cost management, and savings plans
- CI/CD proficiency with GitHub Actions and GitOps practices
- Networking skills: Azure Front Door, WAF, Cloudflare, Azure Firewall, and VNet
- Experience with multi-region architectures and disaster recovery planning
- Strong communication skills for cross-timezone collaboration and incident communication
- Knowledge of Istio, Helm, KEDA, Dapr, and ArgoCD is a plus
- Preferred certifications: Microsoft Certified: Azure Solutions Architect Expert, Certified Kubernetes Administrator (CKA), or similar