LMI

Site Reliability Engineer

Full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Salary

💰 $140,000 - $170,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Cyber Security, Grafana, Prometheus, Python, Splunk, Terraform

About the role

  • Monitor the health, performance, and availability of H2FMS applications, services, APIs, and data services in Army GovCloud.
  • Troubleshoot system issues across application, data, and infrastructure layers.
  • Implement reliability patterns such as redundancy, graceful degradation, and failover strategies.
  • Support performance optimization activities based on monitoring metrics and trends.
  • Manage user access controls, role-based permissions, and environment access configurations.
  • Maintain, monitor, and archive system logs, audit logs, and access logs to support RMF and cATO requirements.
  • Support ISSO and Cybersecurity teams in log retrieval, incident investigations, and audit preparation.
  • Develop and maintain automation scripts to improve environment stability, operational workflows, and deployment reliability.
  • Collaborate with DevSecOps engineers to integrate automated runtime checks, monitoring, and health checks within CI/CD pipelines.
  • Assist in implementing automated scaling, alerting, and self-healing mechanisms.
  • Participate in incident response activities, including detection, diagnosis, escalation, mitigation, and documentation.
  • Coordinate with cybersecurity teams during security events or anomalies.
  • Conduct root-cause analysis and contribute to long-term corrective actions.
  • Maintain environment configuration inventories related to access, logging, monitoring, and deployment parameters.
  • Support configuration management, patch activities, and version control for infrastructure and application components.
  • Collaborate with the Cloud Architect on environment design updates and capacity planning.
  • Document system configurations, access processes, log retention procedures, and environment health dashboards.
  • Support the ISSM and ISSO teams in continuous monitoring package updates and RMF documentation.
  • Maintain audit-ready artifacts related to reliability operations and environment management.

Requirements

  • Bachelor’s degree in Information Technology, Computer Science, Engineering, Cybersecurity, or a related field.
  • 3–6 years of experience in cloud operations, SRE, DevOps, or system administration roles.
  • Hands-on experience with cloud monitoring, logging, and performance management tools (AWS CloudWatch, Azure Monitor, ELK/Splunk, Prometheus/Grafana, etc.).
  • Experience with automation tools (Python, Bash, Terraform, Ansible, etc.).
  • Familiarity with RMF, Zero Trust, and DoW cloud security requirements.
  • Understanding of CI/CD pipelines and deployment processes.
  • Ability to obtain and maintain a DoD Secret clearance.

Benefits

  • Health insurance
  • Work-Life Wellness
  • Career Development

Applicant Tracking System Keywords

Hard skills
cloud operations, site reliability engineering, DevOps, system administration, automation scripting, performance optimization, root-cause analysis, configuration management, incident response, environment design
Soft skills
troubleshooting, collaboration, communication, problem-solving, documentation
Certifications
DoD Secret clearance