
Systems Reliability Engineer
Arkenstone Defense
Full-time
Location Type: Remote
Location: United States
Salary
💰 $140,000 - $160,000 per year
About the role
- Design, implement, and own the infrastructure reliability strategy across AWS, Azure, and GCP
- Champion observability by developing and maintaining effective logging, monitoring, and alerting systems
- Lead efforts in performance tuning, system hardening, capacity planning, and disaster recovery
- Own the incident management lifecycle: from detection to postmortem and root cause analysis
- Automate deployment, scaling, and recovery workflows to reduce manual toil
- Contribute to infrastructure as code (Terraform, ARM templates, CloudFormation, etc.)
- Act as a mentor and technical leader to junior engineers and cross-functional partners
Requirements
- 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Proven track record of operating large-scale systems in multi-cloud environments
- Strong knowledge of cloud-native architecture, container orchestration (e.g., Kubernetes), and CI/CD pipelines
- Proficient in scripting (Python, Bash, etc.) and infrastructure automation tools
- Experience with monitoring/observability platforms (e.g., Prometheus, Grafana, Datadog, ELK)
- Excellent problem-solving skills and a bias toward ownership and action
- Comfortable making decisions under pressure and leading through incidents
- Working knowledge of FedRAMP or NIST 800-53 controls preferred
- Comfortable participating in customer discussions
- Clear communicator who can translate technical concepts to mixed audiences
Benefits
- Competitive Salary: Recognizing your hard work with attractive compensation and rewarding excellence.
- Health and Wellness Programs: Including medical, dental, and vision insurance options, along with mental health support and wellness initiatives.
- Retirement Planning: Secure your future with our flexible 401(k) plan and matching company contributions.
- Paid Time Off & Holidays: Generous PTO, sick leave, and holiday pay to help you recharge and enjoy life outside of work.
- Employee Assistance Program: Confidential resources for personal and professional support.
- Professional Development: Access to training, certifications, and continuing education to foster your career growth.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
infrastructure reliability, performance tuning, system hardening, capacity planning, disaster recovery, infrastructure as code, scripting, cloud-native architecture, container orchestration, CI/CD pipelines
Soft Skills
problem-solving, ownership, decision-making under pressure, leadership, mentoring, communication
Certifications
FedRAMP, NIST 800-53