
Site Reliability Engineer
DTEX Systems
full-time
Location Type: Remote
Location: United Kingdom
About the role
- Design, write, and maintain software, primarily in Python, to automate the provisioning, deployment, and configuration management of our infrastructure.
- Contribute to the adoption and maturation of Terraform, establishing and maintaining best practices for state management, modularization, and version control.
- Use Ansible and/or SaltStack to ensure consistency, repeatability, and standardization across all environments.
- Develop robust CI/CD pipelines for both infrastructure and application deployments, replacing manual processes.
- Implement and mature monitoring, logging, and alerting systems to proactively improve system reliability.
- Participate in a “follow the sun” on-call rotation, focusing on sustainable incident response, blameless postmortems, and driving continuous improvement.
- Champion SRE principles, automation, and coding best practices within the team and across the organization.
Requirements
- 3+ years of hands-on experience managing production environments in AWS and/or GCP
- Strong proficiency in Python
- Demonstrated ability to write clean, maintainable, and testable code to solve infrastructure problems
- Experience with Terraform, including best practices for state management and modular design in complex environments
- Strong knowledge of Linux internals and high competency in Bash scripting and command-line operations
- Proficiency with Ansible and/or SaltStack as configuration management tools
- Expert-level understanding of Git and collaborative workflows, such as branching strategies and code review best practices
- Proven track record of transitioning legacy/manual operations environments to automated, IaC-driven approaches
- Experience with containerization using Docker or Kubernetes, and an understanding of how container orchestration is used in modern systems
- Experience building and managing CI/CD pipelines for infrastructure automation
- Familiarity with Zabbix, Prometheus, Grafana, and similar monitoring tools
- Experience operating and querying OpenSearch/Elasticsearch
- A strong desire to solve complex problems, the resilience to work through significant technical debt, and enthusiasm for driving cultural and technical change
- A desire to work in enterprise- and government-focused computing environments with robust security and reliability requirements
- MS/BS in Computer Science/Computer Engineering or related field of study (or equivalent experience)
Benefits
- Fully remote company
- Comprehensive health, vision, and dental coverage
- Flexible time off
- Company computer hardware of your choice
- Work from home setup reimbursement
- Health & wellness perks, including virtual events, happy hours, trivia, and fun
- Monthly Internet & Phone Reimbursement
- Opportunities to learn and grow
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, Terraform, Ansible, SaltStack, CI/CD, Linux, Bash scripting, Git, Docker, Kubernetes
Soft skills
problem solving, resilience, continuous improvement, collaboration, communication
Certifications
MS in Computer Science, BS in Computer Engineering