TalentWerx

Site Reliability Engineer IV

full-time

Origin: 🇺🇸 United States

Salary

💰 $118,485 - $144,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Cloud, Cyber Security, Linux, Microservices, Python, Terraform

About the role

  • Design, implement, and maintain systems with high availability, fault tolerance, and disaster recovery capabilities.
  • Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to balance reliability with innovation.
  • Develop observability solutions (logging, monitoring, tracing, alerting) to proactively detect anomalies and mitigate risks.
  • Lead incident response efforts, perform root cause analysis, and conduct blameless postmortems to drive continuous improvement.
  • Automate system deployments, configuration management, and operational tasks to reduce manual intervention and human error.
  • Build self-healing and auto-scaling solutions that adapt to mission demands while maintaining compliance with DoD cybersecurity requirements.
  • Implement, validate, and maintain cybersecurity controls aligned with DoD 8140/8570, RMF, and NIST SP 800-53 standards.
  • Perform vulnerability assessments, patch management, and system hardening to safeguard mission systems against evolving threats.
  • Partner with software engineering, DevSecOps, and infrastructure teams to integrate reliability and cybersecurity into the development lifecycle.
  • Support subcontractor and vendor evaluations, ensuring compliance with reliability, security, and DoD standards.
  • Analyze system failure data, usage patterns, and mission performance metrics to identify trends and recommend improvements.
  • Contribute to process optimization initiatives, quality improvements, and the adoption of new reliability and security technologies.
  • Ensure all contractual deliverables are met or exceeded to the customer's satisfaction.
  • Complete your PDP and attend Staff Meetings and Storytime (with camera on).
  • Build productive, positive professional relationships with clients within the program.
  • Execute all contract requirements in accordance with the contract-specific LCAT (labor category) and requirements.
  • Perform other related duties as assigned.

Requirements

  • Clearance: Secret
  • Education and Years of Experience: Bachelor's degree (or equivalent) with 8-10 years of experience, or a Master's degree with 6-8 years of experience
  • Demonstrated experience in site reliability engineering, systems engineering, or DevSecOps in secure or defense environments.
  • Strong knowledge of system observability, monitoring, and incident response practices.
  • Familiarity with cloud environments (AWS, DoD IL environments) and container orchestration platforms (AWS ECS).
  • Proficiency in automation tools (Ansible, Terraform, CI/CD pipelines) and scripting languages (Python, Bash, PowerShell).
  • Understanding of RMF, NIST SP 800-53, DISA STIGs, and related DoD cybersecurity frameworks.
  • CompTIA Security+ certification or equivalent