CloudFactory

Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Kathmandu • 🇳🇵 Nepal

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Microservices

About the role

  • Design, build, and maintain scalable, resilient infrastructure that enables developer productivity and platform reliability.
  • Establish and maintain Infrastructure as Code (IaC) standards, best practices, and reusable templates.
  • Deploy, support, monitor, and maintain new and existing services, platforms, and application stacks.
  • Troubleshoot production issues, perform rollbacks and service restorations, and create dashboards to ensure high availability.
  • Create, maintain, and enhance runbooks for on-call and incident resolution.
  • Define and manage availability targets and SLAs for platform products.
  • Ensure production readiness across performance, availability, security, and compliance before go-live.
  • Build and improve monitoring, alerting, logging, and debugging tools.
  • Manage environment capacity planning and performance optimization.
  • Partner with engineering teams to drive performance improvements using metrics (latency, CPU, etc.).

Requirements

  • Cloud Architecture: Strong expertise in AWS-based cloud infrastructure and microservices (serverless and containerized).
  • Infrastructure as Code (IaC): Proven experience provisioning and managing infrastructure via code.
  • CI/CD & DevSecOps: Solid understanding of CI/CD pipelines, web security, and DevSecOps practices.
  • Operational Excellence: Experience with monitoring, alerting, incident management, and 24/7 operational support.
  • Web Security: Knowledge of broader web security principles beyond standard DevSecOps practices.
  • Ability to collaborate effectively across global teams and time zones.
  • Strong problem-solving skills with the ability to simplify complex issues into actionable solutions.
  • High ownership mindset with the drive to meet deadlines and support team success.
  • Willingness to participate in 24/7 operational support processes.

Benefits

  • Great Mission and Culture
  • Meaningful Work
  • Market competitive salary
  • Quarterly variable compensation
  • Remote and Home working
  • Comprehensive medical cover
  • Group life insurance
  • Personal development and growth opportunities
  • Office snacks and lunch
  • Periodic team building and social events

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
AWS, Infrastructure as Code, CI/CD, DevSecOps, monitoring, alerting, incident management, performance optimization, troubleshooting, capacity planning
Soft skills
problem-solving, collaboration, ownership mindset, communication, time management, team support, adaptability, critical thinking, simplification of complex issues, global teamwork