
Site Reliability Engineer
SS&C Technologies
full-time
Location Type: Remote
Location: Remote • Arizona, Florida, North Carolina, Tennessee • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
AWS, Cloud, Kubernetes, OpenShift, OpenStack, Prometheus, Splunk, VMware
About the role
- Collaborate with Technology Infrastructure teams to build and operate reusable, cloud-native platforms that abstract complexity and accelerate delivery while incorporating reliability from design through operations.
- Work with business units and technical teams to improve application availability, observability, and reliability as our business applications are migrated to the Private Cloud.
- Enhance platform reliability through automatic problem detection, self-healing systems, and well-architected notification and escalation protocols.
- Use SLOs, SLIs, and KPIs to guide prioritization, measure impact, and drive continuous improvement.
- Eliminate toil using intelligent automation and agentic workflows.
- Conduct blameless retrospectives and share learnings across the organization.
- Foster a culture of ownership, positive thinking, and continuous learning while remaining grounded in practicality, experimentation, and engineering excellence.
- Integrate DevSecOps, zero-trust principles, and policy-as-code into every pipeline.
- Produce and promote Architecture Decision Records (ADRs) and Cloud Well-Architected Frameworks that our business units can adopt to improve our technology standardization.
- Maintain 24x5 active coverage with seamless regional handoffs and weekend escalation protocols.
Requirements
- 5+ years of professional experience in an SRE role.
- Bachelor’s degree or higher in Computer Science, Engineering, or a related field.
- Proven expertise in architecting, designing, and operating private cloud environments (e.g., VMware, OpenStack, OpenShift Virtualization) and Kubernetes clusters at scales ranging from small to global.
- Hands-on experience building, deploying, and operating infrastructure-as-code platforms, CI/CD pipelines, and observability platforms (e.g., Prometheus, Splunk).
- Strong understanding of modern systems reliability standards and practices, including establishing KPIs, monitoring and reporting on SLAs and SLOs, and filtering out noise to surface actionable insights.
- Familiarity with various financial services regulatory frameworks and their impact on infrastructure design and operations.
- Familiarity with structured naming conventions and asset management for global infrastructure.
- Experience with financial-grade network segmentation, micro-segmentation, and zero-trust architecture.
- Certifications such as TOGAF, AWS Certified Solutions Architect, VMware VCP, or Red Hat Certified Architect are a plus.
- Familiarity with ISO 27001, NIST 800-53, and other security frameworks is a plus.
Benefits
- Flexibility: Hybrid Work Model & a Business Casual Dress Code, including jeans
- 401k Matching Program, Professional Development Reimbursement
- Flexible Personal/Vacation Time Off, Sick Leave, Paid Holidays
- Medical, Dental, Vision, Employee Assistance Program, Parental Leave
- Discounts on fitness clubs, travel and more!
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
site reliability engineering, private cloud architecture, Kubernetes, infrastructure as code, CI/CD pipelines, observability platforms, automation, problem detection, self-healing systems, monitoring and reporting
Soft skills
collaboration, continuous improvement, ownership, positive thinking, continuous learning, practicality, experimentation, engineering excellence, blameless retrospectives, communication
Certifications
TOGAF, AWS Certified Solutions Architect, VMware VCP, Red Hat Certified Architect