ZipLiens

Senior Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Franklin, Tennessee, United States

Salary

💰 $113,000 - $153,000 per year

About the role

  • Maintain and improve the availability, performance, and reliability of production and non-production environments.
  • Proactively identify scalability and capacity risks and recommend mitigation strategies as platform demands grow.
  • Enhance system observability through monitoring, logging, and alerting, and help define reliability metrics as systems scale.
  • Lead incident investigations and drive root cause analysis, ensuring systemic improvements are implemented.
  • Shape and evolve reliability standards and practices while remaining directly engaged in hands-on system improvements.
  • Build, own, and continuously improve CI/CD pipelines to support reliable, repeatable deployments.
  • Drive automation of infrastructure provisioning, configuration, and operational workflows to reduce manual effort and operational risk.
  • Develop and implement tooling that improves system performance, observability, and deployment confidence.
  • Partner with software engineers to standardize and improve deployment practices, release processes, and operational readiness across services.
  • Establish and enforce best practices for access controls, secrets management, and system hardening.
  • Ensure backup, recovery, and disaster-readiness strategies are tested and reliable.
  • Partner with engineering leadership on security reviews and compliance-related initiatives.
  • Proactively identify and mitigate infrastructure and operational risks.

Requirements

  • 7+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or a related role.
  • Strong troubleshooting skills, with experience leading incident response and driving systemic remediation in production environments.
  • Strong experience scaling and operating cloud-based production systems (AWS, GCP, or Azure).
  • Experience designing and maintaining CI/CD pipelines and deployment automation.
  • Experience with monitoring, logging, and alerting systems for reliability and performance.
  • Strong understanding of cloud security fundamentals, including access controls, secrets management, and backup strategies.
  • Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash).
  • Working knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation) and containerization/orchestration technologies (Docker, Kubernetes).
  • Strong written and verbal communication skills and experience collaborating with cross-functional teams.
  • Ability to work on-site at least three days per week (approximately 60%) in our Franklin, TN office.

Benefits

  • Private Health Care Plan (Medical, Dental & Vision)
  • Company HSA contributions for HDHP participants
  • Flexible Spending Accounts (Health & Dependent Care)
  • Company-Paid Short-Term Disability Coverage
  • Voluntary Long-Term Disability, Life, AD&D, and Supplemental Coverage Options
  • 401(k) Plan with Company Match
  • Paid Time Off (Vacation, Sick Time & Select Holidays)
  • Paid Parental Leave

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, DevOps, Infrastructure Engineering, troubleshooting, cloud-based production systems, CI/CD pipelines, deployment automation, monitoring systems, logging systems, alerting systems

Soft Skills

strong written communication, strong verbal communication, collaboration, incident response leadership, systemic remediation, proactive identification of risks, cross-functional teamwork