
Staff Cloud Operations Engineer
Extreme Networks
full-time
Location Type: Remote
Location: Ireland
About the role
- Architect & Scale Infrastructure: Design and implement multi-cluster, multi-region Kubernetes deployments using EKS, GKE, and AKS. Build infrastructure that scales across regions and cloud providers.
- Own Production Systems: Take end-to-end ownership of production infrastructure. Drive incident response, postmortems, and improvements to prevent recurrence.
- Infrastructure as Code at Scale: Build and maintain Terraform modules for complex infrastructure patterns. Manage thousands of configuration files across clusters, regions, and environments using GitOps principles.
- GitOps & Deployment Excellence: Design and optimize ArgoCD ApplicationSets and Helm chart architectures. Build deployment pipelines that enable safe, automated releases across hundreds of microservices.
- Performance & Reliability Engineering: Analyze system performance, identify bottlenecks, and implement optimizations. Improve SLOs through capacity planning, autoscaling, and architectural improvements.
- Observability & Monitoring: Build and enhance monitoring, alerting, and observability using Prometheus, Grafana, Loki, and custom tooling. Drive visibility into complex distributed systems.
- Security & Compliance: Implement security controls, compliance frameworks, and best practices across cloud infrastructure. Design secure multi-tenant architectures.
- Technical Leadership: Mentor engineers, establish best practices, and drive technical decisions. Collaborate with platform, SRE, and product teams to deliver reliable infrastructure.
Requirements
- 5+ years in cloud infrastructure engineering, with deep expertise in at least one major cloud provider (AWS preferred)
- Strong Kubernetes experience: cluster design, operators, controllers, and multi-cluster management
- Proficiency with Infrastructure as Code: Terraform, CloudFormation, or similar
- GitOps expertise: ArgoCD, Flux, or similar; experience with ApplicationSets and complex deployment patterns
- Deep Linux and networking knowledge
- Experience with distributed systems: Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ
- Monitoring and observability: Prometheus, Grafana, ELK stack, or similar
- Strong problem-solving skills and experience debugging complex distributed systems
- Experience with cloud security, compliance (SOC2, ISO27001), and secure-by-design practices
- Excellent communication skills for working across time zones and with distributed teams
- Self-directed, with a track record of owning problems end-to-end
Benefits
- Extreme Networks provides equal employment opportunities to all employees and applicants.
- The company prohibits discrimination and harassment of any type.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Terraform, CloudFormation, ArgoCD, Linux, Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ
Soft Skills
problem-solving, communication, self-directed, mentoring, collaboration
Certifications
SOC2, ISO27001