Duck Creek Technologies

Senior Site Reliability Engineer

full-time

Location Type: Remote

Location: Australia

About the role

  • Take ownership of maintaining and improving the stability and reliability of our software systems, ensuring minimal downtime and maximum availability
  • Monitor system health, diagnose issues, and implement effective solutions to prevent recurrence
  • Collaborate with engineering teams to optimize the performance and efficiency of our infrastructure
  • Identify performance bottlenecks, conduct capacity planning, and implement optimizations to enhance system scalability and responsiveness
  • Lead incident response efforts during critical incidents, working closely with cross-functional teams to restore service and minimize impact on business operations
  • Conduct post-incident reviews to identify root causes and implement preventive measures
  • Develop and maintain automation scripts, tools, and workflows to streamline operational processes and reduce manual intervention
  • Implement infrastructure as code practices to automate deployment, configuration management, and resource provisioning
  • Champion the adoption of reliability engineering best practices within the organization
  • Implement monitoring, alerting, and observability solutions to proactively detect and respond to system issues
  • Create and maintain documentation, runbooks, and knowledge base articles to capture system configurations, procedures, and troubleshooting guidelines
  • Facilitate knowledge sharing and cross-training to empower team members and improve overall system understanding
  • Perform all other duties and activities as required

Requirements

  • Bachelor's degree or higher, or equivalent additional years of experience
  • 6+ years’ experience in a similar function
  • Proficiency in cloud computing platforms such as AWS, Azure, or Google Cloud Platform
  • Strong scripting and programming skills in languages such as Python, Bash, or PowerShell
  • Experience with infrastructure as code tools such as Terraform, Ansible, or Puppet
  • Deep understanding of containerization technologies such as Docker and Kubernetes
  • Excellent problem-solving skills and a proactive approach to troubleshooting and resolution
  • Strong communication and collaboration skills, with the ability to work effectively in a cross-functional environment

Benefits

  • Flexible-First employer
  • Opportunity to work from an office, from home, or on a hybrid schedule