Ensure high reliability and scalability of multi-region infrastructure and shared platforms.
Promote a DevOps enablement culture by helping teams work with CI/CD pipelines, observability systems, and secret management tools.
Design, maintain, and evolve automation for deployment, monitoring, and incident response.
Advance the technology stack through automation and innovation, using data-driven insights to improve performance, security, and cost-efficiency, and to eliminate repetitive manual tasks.
Proactively identify and mitigate system anomalies before they impact users or SLAs.
Maintain clear documentation and create tooling to improve reliability and operational transparency.
Collaborate cross-functionally with developers, data engineers, and platform teams to ensure smooth operations and fast incident recovery.
Requirements
At least 4 years of experience as a Site Reliability Engineer (SRE).
Advanced Kubernetes skills: Deep hands-on expertise managing production clusters for 2+ years.
A highly proactive, collaborative mindset and eagerness to help others succeed.
Proficiency with ArgoCD and GitHub Actions, and a strong understanding of automated CI/CD delivery pipelines.
Proven experience operating and troubleshooting an observability stack built on VictoriaMetrics (Prometheus-compatible), Loki, and OpenTelemetry.
Strong security mindset: Expertise in security hardening and least-privilege principles; hands-on experience with HashiCorp Vault (cluster management, Vault Secrets Operator, Vault Agent Injector).
Skilled in shell scripting and proficient in at least one of Python or Golang.
English proficiency at B2 level or higher.
Benefits
Well-being program
Mental health care program
Compensation for education, including foreign language and professional growth courses
Equipment & co-working reimbursement program
Overseas conferences, community immersion
Positive and friendly communication culture