Leidos

Site Reliability Engineer – Transport

full-time

Location Type: Hybrid

Location: Honolulu; California; District of Columbia; United States

Salary

💰 $87,100 - $157,450 per year

About the role

  • Work alongside the development and operations teams to ensure speedy and reliable software deployments, monitor systems, and improve overall reliability of the platform
  • Discover and document system bugs, and take the initiative to fix them yourself
  • Develop features using the AI coding tool and the repository of scripts to automate, scale, test, and secure the cloud infrastructure and its pipelines
  • Enhance performance monitoring of the various systems via Splunk or other dashboard reporting tools
  • Identify performance bottlenecks and optimize the performance of cloud infrastructure
  • Contribute to continuing our SRE journey by suggesting ways to improve engineering build, maintenance, automation and reliability across the platform with SRE/DevOps tools and Infrastructure-as-Code
  • Develop high-quality pipeline automation workflows, both inside and outside the cloud platform, that align with business and technology strategies
  • Develop and execute test strategies that simulate real-world failure scenarios, including network disruptions, hardware failures, and system overloads
  • Create, script, and run performance tests to measure system behavior under varying levels of load and traffic
  • Identify bottlenecks, performance degradation, and areas for optimization
  • Design, implement, and maintain automated test suites for infrastructure and application components
  • Ensure that testing is integrated into the CI/CD pipeline to validate system reliability with every release
  • Build automated systems for continuous performance testing, stress testing, and load testing
  • Work closely with SREs, developers, and operations teams to define reliability goals and develop appropriate testing strategies to validate those goals
  • Ensure that new services and features undergo thorough testing for performance, reliability, and failure recovery before deployment to production
  • Validate that monitoring, logging, and alerting mechanisms are functioning correctly by testing systems under failure conditions
  • Ensure that Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are accurately measured and tracked through automated testing frameworks
  • Resolve most conflicts among timeline, budget, and scope independently, and escalate complex or consequential issues to senior management

Requirements

  • Typically requires a Bachelor’s degree; however, 4–8 years of prior relevant experience may be considered in lieu of a degree
  • Must currently possess, and be able to maintain, an active DoD Secret security clearance
  • Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract
  • 5+ years’ experience configuring Cisco routers, switches, and network appliances
  • 5+ years’ experience with routing protocols (e.g., OSPF, EIGRP, BGP)
  • 5+ years’ experience with L2 switching (e.g., VLANs, spanning tree, VTP)
  • 5+ years’ experience troubleshooting complex routing and switching issues
  • Experience with multi-vendor routing, switching, or wireless product lines
  • Strong understanding and in-depth knowledge of TCP/IP network/subnet addressing
  • Ability to work independently or in a team environment to resolve technical issues in a dynamic environment
  • Experience with automated script design, coding, debugging, and maintenance (using Bash, Python, etc.) preferred
  • Experience with CI/CD toolsets (e.g., Jenkins, GitLab)
  • Experience with Containerization (Docker) and Container Orchestration (Kubernetes)
  • Good command of Linux/Unix and the command line
  • Experience in application administration, configuration, and integration
  • Familiarity with agile development methodologies
  • Skill and discipline to work effectively with a distributed team
  • Ability to work in a highly collaborative, forward thinking, and innovation-driven environment
  • Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
  • Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
  • Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
  • Experience administering and maintaining the SRE platform via Ansible playbooks (e.g., upgrading Jenkins)
  • Experience automating tasks with scripting languages such as PowerShell or Python
  • Experience integrating with and maintaining third-party CI/CD tools such as Jenkins and GitLab
  • Experience with PaaS using Red Hat OpenShift/Kubernetes and Docker containers
  • Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
  • Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
  • Working knowledge of the Risk Management Framework (RMF) and DISA STIGs

Benefits

  • Health and Wellness programs
  • Income Protection
  • Paid Leave
  • Retirement

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AI coding tool; pipeline automation; performance testing; automated test suites; CI/CD; Cisco routers; routing protocols; L2 switching; scripting (bash, python); containerization (Docker)
Soft Skills
independent work; team collaboration; problem-solving; communication; innovation-driven mindset; dynamic environment adaptability; conflict resolution; forward thinking; attention to detail; motivation
Certifications
Bachelor's degree; DoD Secret security clearance; DoD 8570.01 IAT Level II Certification