
Staff Site Reliability Engineer
GE Aerospace
full-time
Location Type: Remote
Location: United States
About the role
- Establish performance baselines and capacity thresholds, correlate events, and define monitoring and alerting criteria
- Develop automated solutions to address potential problems before they result in a service interruption
- Provide impact assessment and mitigation plan for changes going into the production environment
- Investigate the root cause of severe and systemic outages, identify corrective actions, and apply them across the enterprise
- Develop availability measures that align with consumer experience to accurately assess the usability of crucial services
- Build capacity models that baseline transactional load against resource performance, and use that data to predict overall system capacity while automating load placement to avoid outages
- Identify thresholds for all critical links in the data path to quickly isolate where imbalances may result in potential outages
- Analyze failure points in services to model risk levels and the resolution steps to follow if a failure occurs
- Help drive architecture enhancements into the system to mitigate potential failure points
- Programmatically monitor for and remediate configuration drift of critical devices
- Develop response plans to potential failure points and evaluate effectiveness during planned tests
- Perform comprehensive operational health checks across all services to identify areas of concern, and track follow-up activities to drive improvements at every level of the architecture
- Provide technical coaching and direction to more junior teammates
Requirements
- Bachelor’s degree from an accredited university or college with a minimum of 4 years of professional experience, OR an Associate’s degree with a minimum of 7 years of professional experience, OR a high school diploma with a minimum of 9 years of professional experience
- Legal authorization to work in the U.S. is required
- Excellent knowledge of AWS/Azure cloud services
- Strong oral and written communication skills
- Demonstrated experience scripting or developing software and services for the cloud (Python, Go, Java, Node.js, .NET, etc.)
- Extensive knowledge of network protocols (TCP/IP, SNMP, FTP, syslog, TFTP, etc.)
- Experience managing version control systems such as Git
- Experience deploying and managing infrastructure on public clouds such as AWS or Azure
- Experience using an automated configuration management system (Terraform, Chef, Puppet, Ansible, Salt, etc.)
- Strong organizational and project management skills
- Strong analytical and problem resolution skills
- Excellent knowledge of Network Management (SNMP, MIB)
- Experience with configuring, customizing, and extending monitoring tools (Datadog, Sensu, Grafana, Splunk, etc.)
- Excellent knowledge of TCP/IP networking and internetworking technologies (routing/switching, proxies, firewalls, load balancing, etc.)
- Knowledge of and experience with analytics software packages (such as MATLAB, SAS, or JMP Pro) is a plus
Benefits
- Great work environment
- Professional development
- Competitive compensation
- Equal Opportunity Employer