
Senior Site Reliability Engineer (SRE)
Talent 360 ME
full-time
Location Type: Remote
Location: Remote • 🇸🇦 Saudi Arabia
Job Level
Senior
Tech Stack
AWS, Cloud, Firewalls, Google Cloud Platform, Kubernetes, Terraform
About the role
- Provide scalable, reliable, durable, and secure global database services for our clients’ cloud infrastructure hosted on AWS or GCP
- Identify significant projects that improve reliability, cost savings, and/or revenue
- Identify needed changes to the product architecture using a data-driven approach
- Influence the product roadmap for improved resiliency and reliability
- Proactively work on efficiency and capacity planning
- Identify parts of the system that do not scale and drive long-term resolution
- Identify Service Level Indicators (SLIs)
- Lead initiatives, including problem definition, design, and planning
- Run blameless root cause analyses (RCAs) on incidents and outages
- Maintain awareness and actively influence stage group plans
Requirements
- 5+ years of related experience
- Experience performing application-specific production support, incident management, problem management, RCAs, and service restoration as needed
- Experience collaborating with engineering and development teams to evaluate and identify optimal cloud solutions
- Ability to plan for and achieve high availability and performance of the product service
- Development/coding experience and skills for writing custom automation solutions
- Strong understanding of web hosting infrastructure and high availability architecture
- Demonstrated knowledge of fundamental cloud security (e.g., Identity and Access Management, ACL, firewalls)
- Deep understanding of AWS cloud services and how to leverage them
- Strong experience with Infrastructure as Code (IaC) technologies such as Terraform
- Familiarity with Kubernetes-specific platform components
Benefits
- Professional development
- Flexible work arrangements
Applicant Tracking System Keywords
Hard skills
cloud infrastructure, AWS, GCP, Infrastructure as Code, Terraform, Kubernetes, application-specific production support, incident management, problem management, custom automation solutions
Soft skills
collaboration, influence, problem definition, design, planning, efficiency, capacity planning, data-driven approach, leadership, awareness