Tech Stack
AWS, Azure, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Kubernetes, Linux, Microservices, Open Source, Python
About the role
- Report to the Manager of SRE
- Design, implement, monitor, and maintain Sysdig's infrastructure at scale across multiple clouds and on-prem
- Collaborate with development teams to improve system reliability, performance, and scalability
- Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures
- Manage cloud infrastructure using Infrastructure as Code practices
- Implement security and data protection best practices and meet compliance requirements
Requirements
- 3+ years of hands-on experience handling production environments
- Proficiency with cloud platforms (AWS, GCP, IBM Cloud, or Azure)
- Experience with monitoring and observability tools
- Background in automating operational tasks and reducing toil
- Experience with containerization technologies (Docker, Kubernetes)
- Comfortable writing scripts in Bash, Python, Go, or similar languages, and working with Linux and command-line interfaces
- Problem-solving mindset focused on automation, prevention, and operational excellence
- Experience with distributed systems and microservices architecture