
Senior Site Reliability Engineer
Bank of America
full-time
Location Type: Hybrid
Location: Jersey City • New Jersey • United States
Salary
💰 $152,600 - $191,500 per year
About the role
- Designs solutions to visualize key production support metrics enabling Operational Readiness and Site Reliability Engineer teams to identify scenarios requiring intervention.
- Develops software solutions and/or improved processes to address work identified as ‘toil’, collaborating with key partners to identify, track and remediate such processes and free up time for reliability work.
- Partners with Development and Infrastructure teams to create error budget policies prioritizing reliability stories that fall below Service Level Objective (SLO) thresholds and suggests code optimizations, additional instrumentation and/or logging structures to gain service reliability visibility.
- Identifies and plans for capacity bottlenecks, vulnerabilities and opportunities for reliability improvement, such as low-level error rates and 'noise', to reduce manual support effort and/or improve system reliability.
- Assesses monitoring for new changes with development partners and works with the monitoring tools team to monitor dashboards and enhance application and system monitoring designs.
- Collaborates with Development and Infrastructure teams to understand technical solutions and develop Service Level Indicators and SLOs to measure/improve the reliability of the services they support.
Requirements
- Eight or more years of experience with, and strong knowledge of, Linux/Unix systems and command-line tools.
- Proficiency in scripting languages such as Python, Shell, or Perl.
- Experience with APM tools such as DynaTrace.
- Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
- Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
- Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
- Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
- Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Strong attention to detail and ability to work in a fast-paced, dynamic environment.
Benefits
- Employees are eligible for an annual discretionary award based on their overall individual performance results and behaviors.
- Access to paid time off.
- Industry-leading benefits.
- Support for employees so they can make a genuine impact and contribute to the sustainable growth of our business and the communities we serve.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Linux, Unix, Python, Shell, Perl, DynaTrace, AWS, Azure, Google Cloud, Docker
Soft Skills
problem-solving, troubleshooting, communication, collaboration, attention to detail