
Site Reliability Engineer, Staff Engineer
Nagarro
full-time
Location Type: Remote
Location: Philippines
About the role
- Provide L2/L3 support for AWS cloud infrastructure and production environments
- Implement and maintain automation for operational tasks, deployments, and monitoring
- Monitor system health, troubleshoot incidents, and ensure high availability of services
- Develop and enhance scripts/tools to reduce manual effort and improve efficiency
- Work closely with DevOps, Development, and Infrastructure teams for issue resolution
- Participate in on-call rotations and incident management during US shift hours
- Maintain and improve monitoring, alerting, and logging systems
- Ensure adherence to SRE best practices for reliability, scalability, and performance
- Document runbooks, SOPs, and knowledge base articles
Requirements
- Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch)
- Experience in automation and scripting using Python, Shell, or PowerShell
- Familiarity with Infrastructure as Code tools (Terraform or CloudFormation)
- Understanding of CI/CD pipelines and DevOps practices
- Experience with monitoring tools like CloudWatch, Grafana, Prometheus, or ELK
- Good understanding of Linux systems and networking concepts
- Exposure to containerization (Docker/Kubernetes)
- Ability to troubleshoot production issues under pressure
- Excellent verbal and written communication skills
- Willingness to work shifts aligned with US time zones
Benefits
- Employees can work remotely
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Python, Shell, PowerShell, Terraform, CloudFormation, CI/CD, Linux, Docker, Kubernetes
Soft Skills
troubleshooting, communication, collaboration, incident management, documentation