Manager, Site Reliability Engineering
Deepwatch
Manager of Site Reliability Engineering at Deepwatch, leading a high-caliber SRE team, overseeing cloud architecture, and promoting DevOps excellence in a cybersecurity context.
Posted 4/29/2026
full-time
Tampa • Florida • 🇺🇸 United States
Senior, Lead
💰 $178,000 - $213,000 per year
Tech Stack
Tools & technologies
AWS, Cloud, Cyber Security, Docker, Google Cloud Platform, Kubernetes, Python, Terraform
About the role
Key responsibilities & impact
- Lead and grow the SRE team, setting direction, mentoring and managing engineers, and fostering excellence.
- Design and manage cloud and containerized infrastructure with IaC (Terraform).
- Implement robust CI/CD pipelines integrating security and compliance.
- Build scalable observability systems, leading the definition of SLIs / SLOs and dashboards.
- Manage incident response, root cause analysis, and postmortems; automate recovery via playbooks/runbooks.
- Drive capacity planning, performance tuning, and cost efficiency.
- Collaborate with InfoSec, DevSecOps, and Compliance teams, ensuring alignment with frameworks such as FedRAMP, NIST, and RMF.
- Support program-level initiatives, communicating effectively with stakeholders.
- Promote a culture of reliability, security, and developer efficiency.
- Maintain an active 'player' role, dedicating approximately 75% of your time to hands-on engineering (design, coding, and architecture) and 25% to leadership, mentorship, and management.
Requirements
What you’ll need
- 8+ years in SRE, DevOps, or Platform Engineering, with technical leadership experience and readiness to step into management as a player/coach.
- Proven cloud experience (AWS, GCP) and container orchestration (Kubernetes, Docker).
- Strong coding/scripting (Python, Go) and proficiency in IaC and GitOps.
- Deep knowledge of observability tools and defining reliability metrics.
- Experienced in incident handling (PagerDuty, Datadog) and post-incident evaluations.
- Demonstrated success in mentoring and developing junior/mid-level SRE talent, moving beyond delegation to hands-on technical coaching.
- Familiarity with regulatory or cybersecurity frameworks (FedRAMP, NIST, STIGs, RMF).
- Excellent cross-functional communication and stakeholder management.
- Preferred: certifications such as AWS, CKA, or cyber security credentials (e.g., OSCP).
Benefits
Comp & perks
- Medical, dental, vision, and disability insurance
- Flexible Time Off (FTO), 12 company holidays, sick leave, and 8 weeks of paid parental leave
- Unique professional development benefits, with annual “development dollars” to support our people's growth and development
- Wellness contests and monthly educational programs
- 401(k) retirement program
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cloud infrastructure, container orchestration, IaC, CI/CD pipelines, observability tools, coding, scripting, incident handling, reliability metrics, performance tuning
Soft Skills
leadership, mentoring, communication, stakeholder management, collaboration, coaching, team management, incident response, problem-solving, capacity planning
Certifications
AWS, CKA, OSCP