MRSOOL | مرسول

Senior Site Reliability Engineer

full-time

Origin: 🇪🇬 Egypt

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Prometheus, Puppet, Python, Ruby, Terraform

About the role

  • Develop and maintain monitoring and alerting systems to proactively identify and address issues.
  • Troubleshoot and escalate production incidents to minimize downtime and improve system reliability.
  • Continuously improve our infrastructure and processes to optimize scalability and efficiency.
  • Participate in and take ownership of on-call rotations as needed to ensure 24/7 support for our application.
  • Perform routine maintenance and upgrades as needed to keep our systems up to date.
  • Contribute to ongoing efforts to improve our security posture and compliance with industry standards.
  • Communicate complex technical concepts clearly and concisely to both technical and non-technical stakeholders so they can make well-informed decisions.
  • Mentor and coach junior engineers, fostering their professional growth and enabling them to deliver high-quality work.
  • Stay up to date with the latest advancements and trends in site reliability engineering, and share knowledge and insights with the team.
  • Identify opportunities for organizational improvement and propose changes to optimize team structure and execution.
  • Collaborate with development teams to design and implement automated deployment and testing pipelines.
  • Collaborate with development teams to design and implement scalable infrastructure.

Requirements

  • Bachelor’s degree in Computer Engineering, Computer Science, or related field.
  • 5+ years of experience in a similar role, preferably with experience in a high-traffic, high-availability environment.
  • Proficiency in at least one programming language (Python, Ruby, Java, Go, etc.).
  • Strong understanding of cloud infrastructure and related technologies (AWS, GCP, Azure, Kubernetes, Docker, etc.).
  • Excellent troubleshooting and problem-solving skills.
  • Experience with one or more automation and configuration management tools (Chef, Ansible, Puppet, Terraform, etc.).
  • Familiarity with monitoring and alerting tools (Prometheus, Grafana, Nagios, etc.).
  • Strong communication and interpersonal skills, enabling effective collaboration with cross-functional teams.
  • Ability to navigate ambiguity, set clear expectations, and thrive in a fast-paced, dynamic environment.
  • A strong grasp of computer science fundamentals, particularly as they apply to distributed systems and networking.

Cprime, Inc

Automation Engineer

Mid · Senior · full-time · 🇮🇳 India
Posted: 19 days ago · Source: jobs.lever.co
Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, GraphQL, ITSM, Jenkins, Kubernetes, Prometheus, +5 more

Cummins Inc.

Senior Platform Engineer

Senior · full-time · 🇮🇳 India
Posted: 18 days ago · Source: fa-espx-saasfaprod1.fa.ocs.oraclecloud.com
Ansible, AWS, Azure, Chef, Cloud, Docker, Go, Kubernetes, Linux, Oracle, Prometheus, Puppet, +4 more

Istari

Senior Solutions Infrastructure Engineer

Senior · full-time · $135k–$220k / year · 🇺🇸 United States
Posted: 12 hours ago · Source: jobs.lever.co
Ansible, AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, Postgres, Terraform

RELX

Systems Engineering Lead (Lead Ops Engineer)

Senior · full-time · Kentucky · 🇺🇸 United States
Posted: 29 days ago · Source: relx.wd3.myworkdayjobs.com
Ansible, AWS, Azure, Chef, Cloud, Docker, Go, Google Cloud Platform, Jenkins, Kubernetes, Linux, OpenShift, +5 more

NVIDIA

Senior Network Automation Architect

Senior · full-time · $168k–$322k / year · California, Washington · 🇺🇸 United States
Posted: 28 days ago · Source: nvidia.wd5.myworkdayjobs.com
Ansible, AWS, Azure, Cloud, Consul, Flux, Go, Google Cloud Platform, Grafana, gRPC, Kubernetes, OpenShift, +3 more