
Senior Site Reliability Engineer
Duck Creek Technologies
Full-time
Location Type: Remote
Location: Australia
About the role
- Take ownership of maintaining and improving the stability and reliability of our software systems, ensuring minimal downtime and maximum availability
- Monitor system health, diagnose issues, and implement effective solutions to prevent recurrence
- Collaborate with engineering teams to optimize the performance and efficiency of our infrastructure
- Identify performance bottlenecks, conduct capacity planning, and implement optimizations to enhance system scalability and responsiveness
- Lead incident response efforts during critical incidents, working closely with cross-functional teams to restore service and minimize impact on business operations
- Conduct post-incident reviews to identify root causes and implement preventive measures
- Develop and maintain automation scripts, tools, and workflows to streamline operational processes and reduce manual intervention
- Implement infrastructure as code practices to automate deployment, configuration management, and resource provisioning
- Champion the adoption of reliability engineering best practices within the organization
- Implement monitoring, alerting, and observability solutions to proactively detect and respond to system issues
- Create and maintain documentation, runbooks, and knowledge base articles to capture system configurations, procedures, and troubleshooting guidelines
- Facilitate knowledge sharing and cross-training to empower team members and improve overall system understanding
- Perform all other duties and activities as required
Requirements
- Bachelor's degree or higher, or equivalent additional years of experience
- 6+ years’ experience in a similar function
- Proficiency in cloud computing platforms such as AWS, Azure, or Google Cloud Platform
- Strong scripting and programming skills in languages such as Python, Bash, or PowerShell
- Experience with infrastructure as code tools such as Terraform, Ansible, or Puppet
- Deep understanding of containerization technologies such as Docker and Kubernetes
- Excellent problem-solving skills and a proactive approach to troubleshooting and resolution
- Strong communication and collaboration skills, with the ability to work effectively in a cross-functional environment
Benefits
- Flexible-First employer
- Opportunity to work from an office, from home, or on a hybrid schedule
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cloud computing, Python, Bash, PowerShell, Terraform, Ansible, Puppet, Docker, Kubernetes, infrastructure as code
Soft Skills
problem-solving, proactive troubleshooting, communication, collaboration, cross-functional teamwork
Certifications
Bachelor's degree