Bank of America

Senior Site Reliability Engineer – Cloud and Data Center Services

full-time

Location Type: Office

Location: Jersey City • New Jersey • 🇺🇸 United States

Salary

💰 $152,600 - $197,900 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Consul, Distributed Systems, DNS, Go, Google Cloud Platform, Grafana, Java, Jenkins, Linux, OpenShift, Prometheus, Python, Shell Scripting, Terraform

About the role

  • Own the reliability and support of Foundational Services platforms and tools for both on-premises environments and external clouds (Azure, AWS, GCP)
  • Design and build solutions for the platforms' non-functional requirements, including monitoring and resiliency
  • Proactively monitor the environment and troubleshoot performance, connectivity, security, and other issues
  • Perform deep dives into systemic and latent reliability issues, and drive incident management and problem management
  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
  • Perform blameless root cause analysis (RCA) and partner with product engineering and operations teams across the organization to establish sustainable fixes
  • Own application onboarding and provide troubleshooting support throughout the lifecycle of the tools and platforms
  • Identify and drive opportunities to improve automation, reducing toil and improving operational excellence
  • Partner with risk and compliance teams to provide visibility, implement the right controls, and remediate vulnerabilities
  • Be a key stakeholder in the design of cloud services and collaborate with architecture, engineering, operations and product teams
  • Participate in 24x7 on-call coverage providing L3 platform support, including maintaining the on-call schedule for other personnel

Requirements

  • BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience
  • Minimum of 5 years of hands-on experience in Site Reliability Engineering, DevOps, or Infrastructure roles
  • Experience with Python, Ansible, Golang, Java and shell scripting
  • Certification/Expertise in OpenShift architecture, operations, and container orchestration.
  • Certification/Deep experience with Terraform and Terraform Enterprise (TFE), including writing Infrastructure as Code
  • Certification/Solid understanding of Consul for service discovery and key-value configuration
  • Proven track record of building automation in complex environments
  • Familiarity with monitoring/observability tools (Prometheus, Grafana, ELK/EFK stacks, etc.)
  • Experience in performance, integration, and chaos testing of distributed systems
  • Solid knowledge of networking, security, and Linux internals
  • Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions
  • Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication
  • Experience with CI/CD tools (Git, Jenkins) and the GitOps model
  • Excellent understanding of Linux/Windows operating systems administration
  • Experience in vulnerability remediation
  • Systematic problem-solving approach, sense of ownership and drive
  • Ability to juggle competing priorities and adapt to changes in project scope
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Benefits

  • Access to paid time off
  • Resources and support to contribute to the sustainable growth of businesses and communities

Applicant Tracking System Keywords

Hard skills
Python, Ansible, Golang, Java, shell scripting, OpenShift, Terraform, Consul, monitoring tools, Linux
Soft skills
problem-solving, ownership, adaptability, interpersonal skills, organizational skills, communication skills, teamwork, independence
Certifications
OpenShift architecture, Terraform, Terraform Enterprise, service discovery, vulnerability remediation