Tech Stack
AWS, Cloud, Grafana, Prometheus, Splunk
About the role
- Design and implement observability solutions to provide end-to-end visibility into the health and performance of services running in AWS.
- Develop and maintain monitoring, logging, and alerting systems using AWS native services and third-party tools.
- Collaborate with development, operations, and security teams to define and implement observability best practices.
- Troubleshoot and resolve issues related to service performance, availability, and reliability.
- Create and maintain dashboards and reports to provide real-time insights into system health and performance.
- Conduct root cause analysis of incidents and implement improvements to prevent recurrence.
- Automate observability tasks to improve efficiency and reduce manual effort.
- Stay current with industry trends and emerging technologies related to observability and cloud infrastructure.
- Provide guidance and training to team members on observability tools and practices.
Requirements
- Experience with various root-cause analysis methodologies.
- Proficiency in scripting and automation tools.
- Solid understanding of cloud platforms, especially AWS.
- Experience with monitoring and logging tools like Prometheus, Grafana, ELK stack, or Splunk.
- Strong problem-solving skills and the ability to troubleshoot complex system issues.
Benefits
- Competitive and comprehensive benefits package
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
root-cause analysis, scripting, automation, cloud platforms, AWS, monitoring tools, logging tools, Prometheus, Grafana, ELK stack
Soft skills
problem-solving, troubleshooting, collaboration, guidance, training