
Senior Site Reliability Engineer
ZipLiens
full-time
Location Type: Hybrid
Location: Franklin • Tennessee • United States
Salary
💰 $113,000 - $153,000 per year
About the role
- Maintain and improve the availability, performance, and reliability of production and non-production environments.
- Proactively identify scalability and capacity risks and recommend mitigation strategies as platform demands grow.
- Enhance system observability through monitoring, logging, and alerting, and help define reliability metrics as systems scale.
- Lead incident investigations and drive root cause analysis, ensuring systemic improvements are implemented.
- Shape and evolve reliability standards and practices while remaining directly engaged in hands-on system improvements.
- Build, own, and continuously improve CI/CD pipelines to support reliable, repeatable deployments.
- Drive automation of infrastructure provisioning, configuration, and operational workflows to reduce manual effort and operational risk.
- Develop and implement tooling that improves system performance, observability, and deployment confidence.
- Partner with software engineers to standardize and improve deployment practices, release processes, and operational readiness across services.
- Establish and enforce best practices for access controls, secrets management, and system hardening.
- Ensure backup, recovery, and disaster-readiness strategies are tested and reliable.
- Partner with engineering leadership on security reviews and compliance-related initiatives.
- Proactively identify and mitigate infrastructure and operational risks.
Requirements
- 7+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or a related role.
- Strong troubleshooting skills, with experience leading incident response and driving systemic remediation in production environments.
- Strong experience scaling and operating cloud-based production systems (AWS, GCP, or Azure).
- Experience designing and maintaining CI/CD pipelines and deployment automation.
- Experience with monitoring, logging, and alerting systems for reliability and performance.
- Strong understanding of cloud security fundamentals, including access controls, secrets management, and backup strategies.
- Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash).
- Working knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation) and containerization/orchestration technologies (Docker, Kubernetes).
- Strong written and verbal communication skills and experience collaborating with cross-functional teams.
- Ability to work on-site at least three days per week (approximately 60%) in our Franklin, TN office.
Benefits
- Private Health Care Plan (Medical, Dental & Vision)
- Company HSA contributions for HDHP participants
- Flexible Spending Accounts (Health & Dependent Care)
- Company-Paid Short-Term Disability Coverage
- Voluntary Long-Term Disability, Life, AD&D, and Supplemental Coverage Options
- 401(k) Plan with Company Match
- Paid Time Off (Vacation, Sick Time & Select Holidays)
- Paid Parental Leave
Applicant Tracking System Keywords
Hard Skills & Tools
Site Reliability Engineering, DevOps, Infrastructure Engineering, troubleshooting, cloud-based production systems, CI/CD pipelines, deployment automation, monitoring systems, logging systems, alerting systems
Soft Skills
strong written communication, strong verbal communication, collaboration, incident response leadership, systemic remediation, proactive identification of risks, cross-functional teamwork