Climavision

Senior Site Reliability Engineer, C#, .NET

Climavision is looking for a Senior Site Reliability Engineer to ensure the reliability of its weather data services across Azure, colocation, and edge Kubernetes environments. The role focuses on improving operational maturity and handling complex production issues.

Posted 6/23/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $135,000 - $170,000 per year

Tech Stack

Tools & technologies
Azure, Distributed Systems, Kubernetes, .NET

About the role

Key responsibilities & impact
  • Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
  • Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability.
  • Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis.
  • Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems.
  • Drive multi-replica and multi-cluster high availability across Climavision’s .NET services.
  • Improve reliability and operational maturity of production platform services, including observability, autoscaling, ingress, and distributed storage.
  • Partner with software engineering teams to improve production readiness, resiliency patterns, deployment safety, and operational visibility before services reach production.
  • Support and evolve Climavision’s observability platform, including metrics, logging, distributed tracing, dashboarding, and alerting.

Requirements

What you’ll need
  • A bachelor’s degree in computer science, software engineering, or a related field; equivalent professional experience considered.
  • Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role, with at least 4 years in a role formally titled Site Reliability Engineer or carrying explicit SLO / error-budget accountability.
  • Strong, hands-on software engineering experience, including a minimum of 3 years supporting and modifying C# / .NET applications in production environments.
  • Demonstrated experience refactoring production application code (preferably C# / .NET) to make services horizontally scalable across multiple replicas.
  • Experience designing or operating multi-cluster high-availability architectures, including failover behavior, traffic routing, and cross-cluster service deployment.
  • Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments.
  • Experience diagnosing and resolving production incidents across the application, platform, and Kubernetes infrastructure layers, including workload scheduling, storage, ingress, and cluster-level failures.
  • Strong written and verbal communication skills, including incident documentation and postmortem authoring.

Benefits

Comp & perks
  • Competitive compensation
  • Comprehensive benefits package
  • 401(k) Savings Plan
  • Medical/Dental/Vision Benefits
  • Health Savings Account (HSA) and Flexible Spending Account (FSA)
  • Unlimited Paid Time-off
  • 11 Paid Holidays
  • Paid Parental Leave
  • Company Paid Short-term Disability (STD)
  • Company Paid Long-term Disability (LTD)
  • Company Paid Life Insurance

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
C#, .NET, Kubernetes, Azure, observability, distributed systems, high availability, production engineering, DevOps, site reliability engineering
Soft Skills
communication, troubleshooting, incident response, postmortem analysis, collaboration, problem-solving, documentation, resiliency patterns, operational visibility, deployment safety