
Site Reliability Engineer – Transport
Leidos
full-time
Location Type: Hybrid
Location: Honolulu • California • District of Columbia • United States
Salary
💰 $87,100 - $157,450 per year
About the role
- Work alongside the development and operations teams to ensure speedy and reliable software deployments, monitor systems, and improve overall reliability of the platform
- Discover and document system bugs, and take the initiative to fix them yourself
- Develop features using the AI coding tool and a repository of scripts to automate, scale, test, and secure the cloud infrastructure and pipelines
- Enhance performance monitoring of the various systems via Splunk or other dashboard reporting tools
- Identify performance bottlenecks and optimize the performance of cloud infrastructure
- Advance our SRE journey by suggesting ways to improve builds, maintenance, automation, and reliability across the platform using SRE/DevOps tools and Infrastructure-as-Code
- Develop high-quality pipeline automation workflows, both inside and outside the cloud platform, that align with business and technology strategies
- Develop and execute test strategies that simulate real-world failure scenarios, including network disruptions, hardware failures, and system overloads
- Create, script, and run performance tests to measure system behavior under varying levels of load and traffic
- Identify bottlenecks, performance degradation, and areas for optimization
- Design, implement, and maintain automated test suites for infrastructure and application components
- Ensure that testing is integrated into the CI/CD pipeline to validate system reliability with every release
- Build automated systems for continuous performance testing, stress testing, and load testing
- Work closely with SREs, developers, and operations teams to define reliability goals and develop appropriate testing strategies to validate those goals
- Ensure that new services and features undergo thorough testing for performance, reliability, and failure recovery before deployment to production
- Validate that monitoring, logging, and alerting mechanisms are functioning correctly by testing systems under failure conditions
- Ensure that Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are accurately measured and tracked through automated testing frameworks
- Resolve most conflicts between timeline, budget, and scope independently, and recognize when to escalate complex or consequential issues to senior management
Requirements
- Typically requires a Bachelor’s degree; however, 4–8 years of prior relevant experience may be considered in lieu of a degree
- Active DoD Secret security clearance, with the ability to maintain it
- Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract
- 5+ years’ experience configuring Cisco routers, switches, and network appliances
- 5+ years’ experience with routing protocols (e.g., OSPF, EIGRP, BGP)
- 5+ years’ experience with L2 switching (e.g., VLANs, spanning tree, VTP)
- 5+ years’ experience troubleshooting complex routing and switching issues
- Experience with multiple vendor routing, switching or wireless product lines
- Strong understanding and in-depth knowledge of TCP/IP network/subnet addressing
- Ability to work independently or in a team environment to resolve technical issues in a dynamic environment
- Experience designing, coding, debugging, and maintaining automation scripts (e.g., Bash, Python) preferred
- Experience with CI/CD toolsets (e.g., Jenkins, GitLab)
- Experience with containerization (Docker) and container orchestration (Kubernetes)
- Good command of Linux/Unix and the command line
- Experience in application administration, configuration, and integration
- Familiarity with agile development methodologies
- Skilled and disciplined to work with a distributed team
- Ability to work in a highly collaborative, forward thinking, and innovation-driven environment
- Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
- Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
- Experience administering and maintaining the SRE platform via Ansible playbooks (e.g., upgrading Jenkins)
- Experience automating tasks with scripting languages such as PowerShell or Python
- Experience integrating and maintaining third-party CI/CD tools such as Jenkins and GitLab
- Experience with PaaS using Red Hat OpenShift/Kubernetes and Docker containers
- Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
- Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
- Working knowledge of the Risk Management Framework (RMF) and DISA STIGs
Benefits
- Health and Wellness programs
- Income Protection
- Paid Leave
- Retirement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI coding tool, pipeline automation, performance testing, automated test suites, CI/CD, Cisco routers, routing protocols, L2 switching, scripting (Bash, Python), containerization (Docker)
Soft Skills
independent work, team collaboration, problem-solving, communication, innovation-driven mindset, dynamic environment adaptability, conflict resolution, forward thinking, attention to detail, motivation
Certifications
Bachelor's degree, DoD Secret security clearance, DoD 8570.01 IAT Level II Certification