
Site Reliability Architect
HHAeXchange
full-time
Location Type: Remote
Location: United States
Salary
💰 $170,000 - $185,000 per year
About the role
- Architect self-healing, fault-tolerant systems with resiliency by design, focusing on proactive readiness rather than reactive correction.
- Operate within a secure, high-volume, high-volatility application environment, using advanced networking and compute structures in cloud-hosted environments (AWS/GCP).
- Move the organization from "firefighting" to a proactive culture through habits and systems supporting feature flagging, production readiness reviews, architectural decision records, and chaos engineering.
- Support the incident management practice, mentoring SREs and software engineers alike in using our monitoring and observability toolsets for effective troubleshooting.
- Define SLIs, SLOs, and error budgets that balance feature velocity with platform stability, supporting a shift to service ownership.
- Underscore an automation-first perspective, using Terraform, CDK, and other cloud-formation infrastructure-as-code toolsets to ensure repeatable, audit-ready environments.
Requirements
- Bachelor's or Master's degree in Computer Science, Information Systems, or related field and applicable experience.
- 10+ years in SRE/DevOps, with 4 of those in an enterprise SaaS environment.
- 4+ years in software development contributing to a SaaS-based, cloud-hosted product line.
- Proven track record in a distributed SaaS environment managing multi-cloud or multi-region workloads.
- Proficiency in modern cloud networking, including DNS, TCP/IP, Load Balancing, and Zero Trust security models.
- Strong coding skills in Go, Python, Java, C#, or others, to build internal reliability tools and automate complex operational workflows.
- Expert-level knowledge of Kubernetes (EKS/GKE) architecture, including multi-cluster management and stateful workloads.
- Ability to optimize cloud spend while maintaining high performance and reliability.
- Experience operating in a DevSecOps context with compliance guardrails (e.g., GDPR, HIPAA, HITRUST) across varied infrastructures.
- Willingness to explore and adopt AI tools responsibly to enhance productivity and innovation in your role.
Benefits
- competitive health plans
- paid time off
- company-paid holidays
- 401(k) retirement program with a company-elected match
- other company-sponsored programs
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Go, Python, Java, C#, Kubernetes, Terraform, CDK, cloud networking, DNS, TCP/IP
Soft Skills
mentoring, proactive culture, troubleshooting, automation-first perspective, feature flagging, production readiness reviews, architectural decision records, chaos engineering, incident management, service ownership
Certifications
Bachelor's degree, Master's degree