
Site Reliability Engineering Manager – Software Engineering, Mainframe
FIS
full-time
Location Type: Hybrid
Location: Jacksonville • Florida • Wisconsin • United States
About the role
- Oversee a team of Site Reliability Engineers responsible for identifying automation opportunities and implementing tools and processes that streamline routine tasks, enable scalable infrastructure, and support seamless deployments.
- Lead efforts to improve the reliability and availability of critical applications, platforms, and server infrastructure through proactive monitoring, incident management, and resiliency improvements.
- Guide the team in developing and tracking new service level indicators (SLIs) that support service level objective (SLO) and service level agreement (SLA) compliance.
- Evaluate and interpret monitoring and alerting solutions that improve visibility into infrastructure, application performance, and user experience.
- Formulate and execute strategic initiatives to enhance efficiency, including capacity planning, disaster recovery, and business continuity measures.
- Recommend and implement improvements to disaster recovery plans, backup strategies, and failover mechanisms.
- Ensure ongoing compliance with industry regulations, standards, and best practices, particularly in data security and privacy.
- Maintain up-to-date knowledge of emerging technologies and trends in Site Reliability Engineering, SaaS platform server management, and fintech to drive continuous innovation within the team.
- Supervise maintenance, configuration, and reliability of all data center infrastructure, including servers, networks, and storage systems.
- Deliver a production server operations environment that meets all service level agreements, processing service level objectives, response time targets, and availability targets.
- Oversee data security protocols and maintain adherence to regulatory and industry standards.
- Lead incident management processes, ensuring rapid resolution and clear communication with stakeholders.
- Identify and drive improvements in reliability, performance, and efficiency through data and root cause analysis.
- Participate in an on-call rotation to support critical production incidents.
- Strategically manage capacity to support future growth, ensuring the data center adapts to increasing demands without compromising security or performance.
- Partner with cross-functional teams to align data center operations with overall organizational objectives.
- Partner with development, QA, DevOps, and product teams to influence design and drive application resiliency improvements.
- Proactively identify operational risks and develop strategies to mitigate disruptions or data breaches.
- Conduct regular service level reviews to evaluate platform and application performance, and manage a structured feedback loop to identify, track, and resolve recurring technology and application issues.
Requirements
- Extensive experience managing mission-critical platforms and application services, including at least 5 years in a leadership capacity.
- 7-10+ years of management experience across the software development life cycle (SDLC).
- Solid technical knowledge of, or at least a fundamental grasp of the key principles behind, the following technologies:
- Mainframe Technologies: COBOL and RPG; (Preferred): JCL, CICS, SQL, CL, DDS, DDL, JES, and mainframe environments (AS/400, z/OS), or a willingness to learn them.
- Modern Languages & Frameworks (Required): Java, C#, Python, JavaScript, Spring Boot, Hibernate, JDBC, Angular, Oracle PL/SQL.
- Automation & IaC (Required): Python/Bash/PowerShell scripting, Terraform, Ansible, Jenkins, GitHub, Bitbucket, ServiceNow, Jira, Azure DevOps.
- Monitoring Tools (Preferred): Splunk, Dynatrace, Resolve, Nobl9, JMeter, Zabbix.
- Experience working with Windows, Linux, and IBM i operating systems, and administering applications on these operating systems.
- Comprehensive knowledge of data center architecture and infrastructure components including server topology, networking, storage, and virtualization technologies.
- Proficient in cybersecurity practices and data protection protocols relevant to data center environments.
- Demonstrated ability to lead and motivate teams, coupled with strong communication and interpersonal capabilities.
- Exceptional analytical skills and a commitment to continuous improvement.
- Familiarity with SDLC and CI/CD practices, as well as DevOps and Site Reliability Engineering methodologies.
- Resourceful and proactive in gathering information, resolving challenges, and promoting innovative solutions.
- Excellent strategic thinking and innovation, supported by advanced problem-solving and analytical abilities.
- Effective incident and problem management, including oversight and implementation of permanent solutions.
- Outstanding communication skills and the ability to collaborate effectively with both technical and business stakeholders.
- Well-versed in industry regulations and compliance standards pertinent to data center operations.
- Bachelor’s degree in Computer Science, Information Technology, or a related discipline is required; a Master’s degree is preferred.
Benefits
- A career at FIS is more than just a job. It’s the chance to shape the future of fintech.
- Always-on learning and development
- Collaborative work environment
- Opportunities to give back
- Competitive salary and benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
COBOL, RPG, JCL, CICS, SQL, Java, C#, Python, JavaScript, Terraform
Soft Skills
leadership, communication, interpersonal, analytical, strategic thinking, problem-solving, team motivation, continuous improvement, resourcefulness, collaboration
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Information Technology, Master’s degree (preferred)