
Site Reliability Engineer – Shift Work
qode.world
full-time
Posted on:
Location Type: Remote
Location: Vietnam
About the role
- Participate in on-call rotations to support critical systems.
- Engineers are required to work a rotating 2-2-2 schedule: 2 morning shifts followed by 2 days off, 2 afternoon shifts followed by 2 days off, then 2 night shifts followed by 2 days off. Shift hours:
  - Morning: 09:00 AM - 06:00 PM
  - Afternoon: 05:00 PM - 02:00 AM
  - Night: 01:00 AM - 10:00 AM
- Resolve system incidents when they occur
- Deploy changes to staging and production environments
- Work with Platform Engineers to understand the changes
- Develop deployment pipelines for these changes
- Understand the changes and build observability (monitoring and alerting) to cover them
- Develop and run resiliency testing solutions
- Continuously improve the monitoring solution
- Create and update operation runbooks
- Automate operation runbooks
Requirements
- Strong experience with Amazon Web Services
- Strong experience with, and understanding of, Kubernetes
- Scripting skills with Python or Bash
- Experience with continuous deployment tools (Harness is good to have)
- Experience with infrastructure as code (IaC) tools such as Terraform
- Experience with observability solutions such as Prometheus and Grafana (SumoLogic is good to have)
- Good communication skills, with fluent English
- Good problem-solving skills
- Self-motivated and able to learn fast
Benefits
- Competitive salary
- Guaranteed 13th-month salary
- Performance bonus
- Professional English course for employees
- Premium health insurance
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Amazon Web Services, Kubernetes, Python, Bash, continuous deployment, infrastructure as code, Terraform, observability solutions, Prometheus, Grafana
Soft Skills
communication, problem solving, self-motivated, fast learner