
Site Reliability Engineer
CloudFactory
full-time
Location Type: Hybrid
Location: Kathmandu • 🇳🇵 Nepal
Job Level
Mid-Level, Senior
Tech Stack
AWS, Cloud, Microservices
About the role
- Design, build, and maintain scalable, resilient infrastructure that enables developer productivity and platform reliability.
- Establish and maintain Infrastructure as Code (IaC) standards, best practices, and reusable templates.
- Deploy, support, monitor, and maintain new and existing services, platforms, and application stacks.
- Troubleshoot production issues, perform rollbacks and service restorations, and create dashboards to ensure high availability.
- Create, maintain, and enhance runbooks for on-call and incident resolution.
- Define and manage availability targets and SLAs for platform products.
- Ensure production readiness across performance, availability, security, and compliance before go-live.
- Build and improve monitoring, alerting, logging, and debugging tools.
- Manage environment capacity planning and performance optimization.
- Partner with engineering teams to drive performance improvements using metrics (latency, CPU, etc.).
Requirements
- Cloud Architecture: Strong expertise in AWS-based cloud infrastructure and microservices (serverless and containerized).
- Infrastructure as Code (IaC): Proven experience provisioning and managing infrastructure via code.
- CI/CD & DevSecOps: Solid understanding of CI/CD pipelines, web security, and DevSecOps practices.
- Operational Excellence: Experience with monitoring, alerting, incident management, and 24/7 operational support.
- Web Security: Knowledge of broader web security principles beyond standard DevSecOps practices.
- Ability to collaborate effectively across global teams and time zones.
- Strong problem-solving skills with the ability to simplify complex issues into actionable solutions.
- High ownership mindset with the drive to meet deadlines and support team success.
- Willingness to participate in 24/7 operational support processes.
Benefits
- Great Mission and Culture
- Meaningful Work
- Market competitive salary
- Quarterly variable compensation
- Remote and Home working
- Comprehensive medical cover
- Group life insurance
- Personal development and growth opportunities
- Office snacks and lunch
- Periodic team building and social events
Applicant Tracking System Keywords
Hard skills
AWS, Infrastructure as Code, CI/CD, DevSecOps, monitoring, alerting, incident management, performance optimization, troubleshooting, capacity planning
Soft skills
problem-solving, collaboration, ownership mindset, communication, time management, team support, adaptability, critical thinking, simplification of complex issues, global teamwork