
Cloud DevOps Engineer – Senior Reliability Engineer
General Dynamics Information Technology
full-time
Location Type: Remote
Location: United States
Salary
💰 $123,250 - $166,750 per year
About the role
- Ensure operational stability, availability, performance, and scalability of cloud-hosted systems across production and development environments supporting multiple agile teams
- Provide real-time monitoring, alerting, incident response, and health checks for infrastructure and applications across all cloud layers (OS, app, DB)
- Implement and maintain dashboards, visualizations, and reports for system health, event management, and cost optimization using native cloud service provider (CSP) tools
- Manage cloud resource thresholds and automate capacity planning, forecasting, and resource optimization strategies
- Perform incident and event management (SIEM) operations, and support issue diagnosis, resolution, and reporting, including root cause analysis (RCA) documentation
- Track, document, and report monthly on issues, including system performance, stability, ticket volumes, and time-to-resolution metrics
- Monitor resource utilization (CPU, memory, disk space) across all deployed VMs, containers, and PaaS components
- Contribute to the implementation of the Enterprise FinOps framework, including forecasting, budget control, and right-sizing analysis
- Support deployment automation and ensure systems are resilient, repeatable, and scalable via Infrastructure as Code (IaC)
- Integrate operations with DevSecOps, MLOps, and CI/CD pipelines for seamless deployment and management
- Execute system health checks daily or at an agreed frequency, and maintain operational runbooks and SOPs
Requirements
- 5+ years of experience in IT systems engineering, systems development, and systems coding and programming
- Deep expertise with AWS services, including monitoring, logging, compute, storage, and networking
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation
- Hands-on experience with monitoring and APM tools such as CloudWatch, Datadog, Prometheus, Grafana, and New Relic
- Solid understanding of incident response, change management, and ITIL-based operational support
- Familiarity with CI/CD toolchains and automation platforms (Jenkins, GitHub Actions, GitLab, ArgoCD)
- Strong scripting skills (Python, PowerShell, Bash) for automation and orchestration
- Advanced experience implementing DevSecOps using GitOps or similar tools
- Experience developing, testing, and maintaining containerized applications
- Expert knowledge of source version control, build/release tools and methodologies, CI/CD pipelines, and the software build process for large enterprises with many complex applications
- Experience with FinOps practices, cost modeling, forecasting, and optimization tools within cloud platforms
- Understanding of federal compliance and security frameworks (e.g., FedRAMP, NIST, JISF Rev 5)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Infrastructure as Code, Terraform, AWS CloudFormation, Python, PowerShell, Bash, CI/CD, DevSecOps, FinOps
Soft Skills
incident response, change management, operational support, automation, orchestration, problem-solving, communication, collaboration, reporting, documentation