hatch I.T.

Site Reliability Engineer – SRE

Employment Type: Full-time

Location Type: Remote

Location: United States

About the role

  • Ensure high availability, scalability, and performance of production systems.
  • Implement and maintain SLIs, SLOs, and SLAs for critical services.
  • Conduct capacity planning and performance tuning.
  • Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
  • Develop automation to minimize manual operations and improve deployment workflows.
  • Build CI/CD pipelines to support rapid and reliable deployments.
  • Design and maintain monitoring, logging, and alerting systems (Datadog).
  • Participate in on-call rotations and lead incident response efforts.
  • Perform root-cause analysis and develop postmortems to prevent recurring issues.
  • Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
  • Optimize system architecture for reliability and fault tolerance.
  • Implement best practices for security, networking, and service resilience.
  • Work closely with development teams to design reliable microservices and distributed systems.
  • Advocate for SRE principles and drive operational excellence across engineering teams.
  • Mentor engineers on reliability practices, tooling, and automation strategies.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency with Linux systems and shell scripting.
  • Experience with cloud platforms (AWS, Azure).
  • Hands-on experience with Kubernetes/ECS and container technologies (Docker).
  • Proficiency in at least one programming language: Python or Java.
  • Experience with CI/CD pipelines and DevOps tooling.
  • Strong understanding of distributed systems, networking, and security fundamentals.
  • Strong analytical and problem-solving skills.
  • Excellent communication and cross-team collaboration skills.
  • Ability to thrive in fast-paced, high-stakes environments.
  • A mindset focused on continuous improvement and operational excellence.

Benefits

  • Medical, dental, and vision coverage, plus a 401(k) plan with a match, for benefit-eligible employees.
  • PTO (Personal Time Off) and sick time for full-time employees.
  • Good working conditions.
  • A magnificent work environment.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Linux systems, shell scripting, cloud platforms, Kubernetes, ECS, Docker, Python, Java, CI/CD pipelines, infrastructure as code
Soft Skills
analytical skills, problem-solving skills, communication, cross-team collaboration, ability to thrive in fast-paced environments, continuous improvement mindset, operational excellence, mentoring
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Engineering