Senior Site Reliability Engineer – Cloud and Data Center Services

Bank of America

full-time

Posted on: 10/19/2025

Location Type: Office

Location: Jersey City • New Jersey • 🇺🇸 United States

Salary

💰 $152,600 - $197,900 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Consul, Distributed Systems, DNS, Go, Google Cloud Platform, Grafana, Java, Jenkins, Linux, OpenShift, Prometheus, Python, Shell Scripting, Terraform

About the role

Responsible for the reliability and support of Foundational Services platforms and tools for both on-premises and external cloud environments (Azure, AWS, GCP)
Design and build solutions for the platforms' non-functional requirements, including monitoring and resiliency
Proactively monitor and troubleshoot environment performance issues, connectivity issues, security issues, etc.
Perform deep dives into systemic and latent reliability issues, and take part in incident management and problem management
Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
Perform blameless RCA and partner with product engineering and operations teams across the organization to establish sustainable fixes
Own application onboarding and provide troubleshooting support throughout the lifecycle of the tools and platforms
Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence
Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities
Be a key stakeholder in the design of cloud services and collaborate with architecture, engineering, operations and product teams
Participate in 24x7 on-call coverage providing L3 platform support, including maintaining the schedule for other personnel

Requirements

BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience
5+ years of hands-on experience in Site Reliability Engineering, DevOps, or Infrastructure roles
Experience with Python, Ansible, Golang, Java and shell scripting
Certification/Expertise in OpenShift architecture, operations, and container orchestration.
Certification/Deep experience with Terraform and Terraform Enterprise (TFE), including Infrastructure as Code writing
Certification/Solid understanding of Consul for service discovery and key-value configuration
Proven track record of building automation in complex environments
Familiarity with monitoring/observability tools (Prometheus, Grafana, ELK/EFK stacks, etc.)
Experience in performance, integration, and chaos testing of distributed systems
Solid knowledge of networking, security, and Linux internals
Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions
Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication
Experience with CI/CD tools (Git, Jenkins) and the GitOps model
Excellent understanding of Linux/Windows operating system administration
Experience in vulnerability remediation
Systematic problem-solving approach, sense of ownership and drive
Ability to juggle competing priorities and adapt to changes in project scope
Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Benefits

Access to paid time off
Resources and support to contribute to sustainable growth of business and communities

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

Python, Ansible, Golang, Java, shell scripting, OpenShift, Terraform, Consul, monitoring tools, Linux

Soft skills

problem-solving, ownership, adaptability, interpersonal skills, organizational skills, communication skills, teamwork, independence

Certifications

OpenShift architecture, Terraform, Terraform Enterprise, service discovery, vulnerability remediation