
SRE Technical Manager – Transport
Leidos
full-time
Location Type: Office
Location: Norfolk • District of Columbia • Virginia • United States
Salary
💰 $116,350 - $210,325 per year
About the role
- Manage and mentor 5-6 SRE teams (pods) and 60+ FTEs, providing guidance, setting performance expectations, and fostering professional development.
- Work collaboratively with SRE Resource Managers to staff and maintain engineering resources for your SRE vertical teams' reliability and scalability goals.
- Responsible for the P&L across the Transport Services vertical. Manage the SRE team’s resources, including budget planning, tool selection, and infrastructure investments to meet reliability and scalability needs.
- Meet regularly with your team members and participate in performance reviews, interviews, and development planning.
- Oversee the reliability, availability, and performance of critical systems by leading the SRE teams within the data center vertical in implementing monitoring, incident response, and performance optimization strategies.
- Ensure the team adheres to best practices for system reliability, automation, and operational efficiency.
- Drive continuous improvement initiatives by analyzing performance metrics (e.g., SLOs, MTTR, MTBF) and identifying areas for enhancement.
- Collaborate with operations, quality, cybersecurity and other SRE engineering teams to define and enforce Service Level Objectives (SLOs) and manage error budgets.
- Act as a liaison between the SRE team and other departments to prioritize reliability and operational needs in the product development process.
- Collaborate with senior leadership to define the SRE strategy, set long-term reliability goals, and ensure alignment with business objectives.
- Lead efforts to reduce operational toil through automation. Work with the team to build or enhance automation tools that manage infrastructure, monitor systems, and respond to incidents.
- Oversee the development and adoption of Infrastructure as Code (IaC) tools, CI/CD pipelines, and other automation processes.
- Ensure that SRE practices align with organizational security policies and compliance requirements.
- Collaborate with security teams to integrate reliability-focused security practices into the design and operation of systems.
- Ensure systems meet or exceed agreed-upon service levels by proactively addressing potential issues and working with stakeholders to align on reliability expectations.
- Work within an SRE team, collaborating with other Developers, Security, and Operations, to continuously deliver products and increase the value stream for the organization and customers.
- Embrace and champion Agile development processes and the adoption of modern Site Reliability Engineering workflows and practices, while providing technical guidance to team members and coworkers on best practices.
- Stay up to date on the latest Site Reliability Engineering practices and technologies.
- Strive to provide internal and external customers with excellent, world-class service.
- Resolve most conflicts between timeline, budget, and scope independently, but use sound judgment to raise complex or consequential issues to senior management.
Requirements
- Requires a B.S. degree (or equivalent) in Cybersecurity, Information Security, IT, Network Engineering, Computer Science, or a related field, or a Master's degree with 6+ years of prior relevant experience, along with 8-10 years of SRE or DevOps experience and at least 4 years in a leadership or management capacity.
- US Citizen with DoD Secret Clearance.
- Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract.
- Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel).
- Exceptional written and oral communication skills, including producing technical analyses and reports, presentations, and executive-level briefings for internal and external stakeholders.
- Ability to review and comprehend customer requirements and develop solutions that satisfy them.
- Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment.
- Proven experience managing teams responsible for large-scale, distributed systems with high reliability and performance demands.
- Strong track record of managing incidents, conducting postmortems, and implementing reliability improvements.
- Experience implementing and managing Agile or DevOps processes, with a focus on continuous improvement, efficiency, and team productivity.
- Ability to lead teams through strategic initiatives such as reliability maturity assessments, process automation, and tooling selection.
- Solid understanding of SRE principles, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgeting.
- Experience with commercial cloud infrastructure deployment environments such as AWS and Azure.
- Strong knowledge of automation tools, CI/CD pipelines, and Infrastructure as Code (IaC).
- Experience with Agile and DevSecOps/SRE concepts and best practices.
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.).
- Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations.
- Solid experience integrating with and maintaining various third-party CI/CD tools such as Jenkins and GitLab.
- Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Ansible, or similar technologies.
- Working knowledge of the Risk Management Framework (RMF) and DISA STIGs.
Benefits
- Health and Wellness programs
- Income Protection
- Paid Leave
- Retirement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering (SRE), DevOps, Agile methodologies, Infrastructure as Code (IaC), CI/CD pipelines, Automation tools, Incident management, Performance optimization, Monitoring strategies, Reliability improvements
Soft Skills
Leadership, Mentoring, Collaboration, Communication, Problem-solving, Strategic thinking, Customer service, Conflict resolution, Performance management, Continuous improvement
Certifications
B.S. Degree in Cybersecurity, DoD Secret Clearance, DoD 8570.01 IAT Level II Certification