Bank of America

Senior Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Jersey City, New Jersey, United States

Salary

💰 $152,600 - $191,500 per year

About the role

  • Designs solutions to visualize key production support metrics enabling Operational Readiness and Site Reliability Engineer teams to identify scenarios requiring intervention.
  • Develops software solutions and/or improved processes to address work identified as ‘toil’, collaborating with key partners to identify, track, and remediate such processes and free up time for reliability work.
  • Partners with Development and Infrastructure teams to create error budget policies that prioritize reliability stories when services fall below Service Level Objective (SLO) thresholds, and suggests code optimizations, additional instrumentation, and/or logging structures to improve visibility into service reliability.
  • Identifies and plans for capacity bottlenecks, vulnerabilities, and opportunities for reliability improvement, such as low-level error rates and 'noise', in order to reduce manual support effort and/or improve system reliability.
  • Assesses monitoring for new changes with development partners and works with monitoring tools team to monitor dashboards and enhance application and system monitoring designs.
  • Collaborates with Development and Infrastructure teams to understand technical solutions and develop Service Level Indicators (SLIs) and SLOs to measure and improve the reliability of the services they support.

Requirements

  • Eight or more years of experience with, and strong knowledge of, Linux/Unix systems and command-line tools.
  • Proficiency in scripting languages such as Python, Shell, or Perl.
  • Experience with APM tools such as DynaTrace.
  • Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
  • Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
  • Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
  • Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Strong attention to detail and ability to work in a fast-paced, dynamic environment.

Benefits

  • Employees are eligible for an annual discretionary award based on their overall individual performance results and behaviors.
  • Access to paid time off.
  • Industry-leading benefits.
  • Support for employees so they can make a genuine impact and contribute to the sustainable growth of our business and the communities we serve.