Tech Stack
Ansible, Cloud, ElasticSearch, Grafana, Kubernetes, Open Source, Postgres, Prometheus, Python
About the role
- Architect and manage Kubernetes deployments for Open VSX in production environments
- Oversee PostgreSQL and ElasticSearch clusters, ensuring data integrity, performance, and scalability
- Implement and refine monitoring, alerting, and incident response systems to maintain high service reliability
- Collaborate with development teams to improve CI/CD pipelines and deployment workflows
- Partner with the Security team to implement and uphold organisational policies and secure-by-design practices
- Lead root cause analysis and postmortems for service disruptions, driving continuous improvement
- Provide technical leadership and mentorship to junior operations staff
- Engage with the community and users to resolve support issues and gather feedback
- Maintain documentation and contribute to operational playbooks
- Define and report on service KPIs, SLOs, and operational health indicators
- Provide strategic advice to leadership on platform operations and technology decisions
- Contribute to annual planning cycles by informing resource needs, tooling requirements, and infrastructure budgeting
Requirements
- 5+ years of experience in site reliability engineering, DevOps, or IT operations
- Deep expertise in Kubernetes, Helm, and container orchestration
- Strong experience with PostgreSQL and ElasticSearch in production environments
- Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack)
- Solid scripting and automation skills (e.g., Bash, Python, Ansible)
- Familiarity with GitHub Actions or similar CI/CD tools
- Excellent troubleshooting skills and a proactive mindset
- Ability to work independently in a remote, multicultural team
- Excellent communication skills
- Bonus: experience supporting open source infrastructure or registries