Tech Stack
AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Python, Ruby, Splunk
About the role
- Leverage automation to increase reliability, availability, and performance of the infrastructure
- Ensure services and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective
- Coordinate Incident, Problem, Release and Change Management
- Manage, debug, and troubleshoot cloud infrastructure issues and the tools used to support these tasks
- Leverage observability and monitoring to make informed decisions
- Participate in an on-call rotation and be comfortable working in a 24x7 environment
Requirements
- 2-4 years of experience in Cloud Operations Support / SRE
- A resourceful critical thinker and problem-solver with a passion for applying technology to make work life better
- Experience with Cloud/SaaS architecture on the AWS or GCP platform is a must
- Experience with observability and monitoring tools (New Relic, Grafana, Dynatrace, etc.)
- Experience with log parsing tools (Splunk, ELK, etc.)
- Experience with at least one programming language such as Ruby, Python, or another object-oriented language, or equivalent scripting skills
- Experience solving problems in and analyzing global-scale distributed systems
- Excellent written and verbal communication skills
- Pioneering Technology
- Collaborative Culture
- Global Impact
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Cloud Operations Support, Site Reliability Engineering (SRE), AWS, GCP, Observability tools, Monitoring tools, Log Parsing tools, Ruby, Python, object-oriented programming
Soft skills
critical thinking, resourcefulness, problem-solving, communication