
Senior Site Reliability Engineer – FedRAMP
Climb Channel Solutions NA
Full-time
Location Type: Remote
Location: United States
About the role
- Serve as the primary point of contact for several critical production SaaS applications hosted in Azure, ensuring their availability, performance, and reliability.
- Maintain and support infrastructure within a FedRAMP High authorized environment, ensuring continuous compliance with NIST 800-53 controls and participating in audit readiness activities.
- Configure, monitor, troubleshoot, and resolve complex cloud infrastructure and application issues across multiple environments.
- Ensure critical SLAs are met, including by participating in an on-call rotation that covers weekends and emergencies.
- Develop and maintain automation solutions for monitoring, alert mitigation, telemetry, log analysis, and incident response.
- Contribute to security documentation, including system security plans, standard operating procedures, and runbooks.
- Apply observability best practices to proactively detect and mitigate issues using logging, metrics, tracing, and alerting tools.
- Partner with engineering, security, and product teams to drive reliability improvements and ensure services are built with SRE principles from the ground up.
- Lead and contribute to post-incident reviews, identifying root causes and implementing preventive actions.
Requirements
- 8+ years of relevant experience in Site Reliability Engineering, DevOps, or Cloud Administration.
- Strong background in integrating, upgrading, securing, and supporting software systems across heterogeneous environments.
- Proven hands-on experience as a Cloud Administrator with Azure, including microservices on AKS (Azure Kubernetes Service), cloud concepts, and cloud security.
- Scripting and programming experience with PowerShell and Python, plus working knowledge of markup and data formats such as XML, JSON, and YAML.
- Infrastructure-as-code expertise with Terraform and Azure DevOps pipelines.
- Knowledge of redundancy, backup, and disaster recovery strategies in cloud environments.
- Hands-on expertise with monitoring and observability tools such as Datadog, Azure Application Insights, and Log Analytics.
- Strong understanding of networking fundamentals, including firewalls, VLANs, NAT, NACLs, load balancing, VPN tunnels, DNS, DHCP, and packet filtering.
- Direct experience operating in FedRAMP environments, with working knowledge of NIST 800-53 controls, continuous monitoring (ConMon) requirements, and boundary protection.
Benefits
- Comprehensive life insurance
- Healthcare insurance
- Pension/retirement matching
- Time off plans
- Paid company holidays
- Meaningful bonus program
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, Cloud Administration, Cloud security, Scripting, PowerShell, Python, Infrastructure-as-code, Terraform, Azure DevOps
Soft Skills
communication, problem-solving, collaboration, leadership, incident response, root cause analysis, audit readiness, proactive detection, preventive actions, reliability improvements