Tech Stack
AWS, Cloud, Grafana, Linux, Prometheus, Python, Splunk
About the role
- Automate environment lifecycle
- Establish service level objectives (SLOs)
- Monitor environment health and performance
- Manage incident response
- Minimize toil
- Drive continuous improvement
- Balance reliability and speed
- Instil a reliability culture
- Plan capacity
- Advance test data management
Requirements
- 15+ years of experience
- Proficiency with monitoring and logging tools (e.g., Prometheus, Splunk, Grafana)
- Deep understanding of cloud platforms like AWS
- Strong scripting skills in languages such as Python or Bash
- Solid understanding of Linux systems, networking concepts, and database management
Benefits
- Flexible working arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
monitoring tools, logging tools, cloud platforms, AWS, scripting skills, Python, Bash, Linux systems, networking concepts, database management
Soft skills
incident response, continuous improvement, reliability culture, capacity planning, service level objectives, environment health monitoring, performance monitoring, toil minimization, balancing reliability and speed, test data management