
Senior Site Reliability Engineer – Central Platforms
SS&C Technologies
Full-time
Location Type: Remote
Location: Colorado • Massachusetts • United States
Salary
💰 $175,000 - $185,000 per year
About the role
- Ensure reliability, scalability, and performance of services through SLIs/SLOs, capacity planning, and incident response
- Drive automation of infrastructure operations to minimize toil
- Develop and support monitoring, alerting, and observability systems that enable proactive issue detection
- Partner with internal engineering teams to define service-level objectives, improve deployment workflows, and integrate infrastructure with development needs
- Contribute to on-call rotations and incident management, helping ensure high availability of services
- Drive post-incident reviews and blameless retrospectives to improve reliability
- Stay current with emerging technologies and recommend improvements to existing systems and practices
Requirements
- 3+ years of experience as an SRE, DevOps Engineer, or Infrastructure Engineer
- Solid experience with Kubernetes administration and tooling (e.g., Helm, ArgoCD, Kustomize)
- Strong expertise in cloud platforms (e.g., AWS, GCP, or Azure)
- Experience managing databases in production environments (e.g., backups, replication, tuning)
- Proficiency in programming or scripting (e.g., Go, Python, Bash)
- Deep understanding of CI/CD pipelines and infrastructure automation
- Familiarity with monitoring/observability tools (e.g., Prometheus, Grafana)
- Strong communication skills and ability to collaborate with software engineering teams
Benefits
- Health insurance
- Dental insurance
- 401(k) plan
- Tuition reimbursement
- Professional development reimbursement
- Flexible work arrangements
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Kubernetes, AWS, GCP, Azure, Go, Python, Bash, CI/CD, infrastructure automation, database management
Soft skills
communication, collaboration, incident management, reliability improvement, proactive issue detection, capacity planning, blameless retrospectives, team partnership, automation drive, emerging technology awareness