
Cloud Reliability Manager
Boeing. Own strategy, roadmap, and delivery for Runtime SRE and Cloud Operations to meet enterprise Service Level Objectives (SLOs) and operational Service-Level Agreements (SLAs).
Posted 5/21/2026
Full-time
Seattle • California, Illinois, Montana, Washington • 🇺🇸 United States
Mid-Level, Senior
💰 $161,500 - $233,450 per year
Tech Stack
Tools & technologies
- Cloud
- Elasticsearch
- Grafana
- Kubernetes
- Logstash
- Prometheus
- Terraform
About the role
Key responsibilities & impact
- Own strategy, roadmap, and delivery for Runtime SRE and Cloud Operations to meet enterprise Service Level Objectives (SLOs) and operational Service-Level Agreements (SLAs)
- Lead, mentor, and grow teams responsible for runtime SRE (SLOs/SLIs, observability, performance engineering, Disaster Recovery (DR), chaos testing) and Cloud Operations
- Establish and own incident management processes: detection, escalation, incident command, post-incident reviews, and remediation planning; ensure rapid detection and reduced Mean Time to Repair (MTTR)
- Drive observability and telemetry strategy (metrics, tracing, logs) to ensure actionable alerts and proactive detection of platform issues
- Lead capacity planning, performance tuning, and disaster recovery orchestration for platform services and multi-cluster fleets
- Convert Root Cause Analysis (RCA) outcomes into prioritized engineering work
- Define and measure operational Key Performance Indicators (KPIs) and implement automation to reduce manual toil
- Own on-call and rotation policies, runbook quality, bridge setup SLAs, and operational playbooks; ensure teams are trained and drills executed regularly
- Ensure security, compliance, and change management controls are integrated into operational procedures and emergency responses
Requirements
What you’ll need
- 5+ years in cloud operations, SRE, and/or related roles
- 3+ years managing technical teams with on-call responsibilities
- 3+ years of experience with Kubernetes at scale and multi-cloud runtime platforms (EKS/AKS/GKE)
- 3+ years of experience with observability tooling (Prometheus, Grafana, OpenTelemetry, ELK (Elasticsearch, Logstash, Kibana), EFK (Elasticsearch, Fluentd, Kibana), tracing) and alerting design
- Experience owning incident response and improving reliability metrics in production environments
- Experience with capacity planning, performance engineering, and disaster recovery at cloud scale
- Experience with automation tooling (Terraform, CI/CD, operators) and integrating reliability into IaC pipelines
Benefits
Comp & perks
- health insurance
- flexible spending accounts
- health savings accounts
- retirement savings plans
- life and disability insurance programs
- paid and unpaid time away from work
ATS Keywords
Hard Skills & Tools
- cloud operations
- site reliability engineering (SRE)
- incident management
- capacity planning
- performance tuning
- disaster recovery (DR)
- root cause analysis (RCA)
- automation
- observability
- alerting design
Soft Skills
- leadership
- mentoring
- team management
- communication
- incident command
- problem-solving
- strategic planning
- training
- collaboration
- organizational skills