Senior Site Reliability Engineer, C#, .NET

Climavision

The Senior Site Reliability Engineer at Climavision ensures the reliability of weather data services across Azure, colocation, and edge Kubernetes environments, with a focus on improving operational maturity and resolving complex production issues.

Posted 6/23/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $135,000 - $170,000 per year

Tech Stack

Tools & technologies

Azure, Distributed Systems, Kubernetes, .NET

About the role

Key responsibilities & impact

Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability.
Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis.
Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems.
Drive multi-replica and multi-cluster high availability across Climavision’s .NET services.
Improve reliability and operational maturity of production platform services, including observability, autoscaling, ingress, and distributed storage.
Partner with software engineering teams to improve production readiness, resiliency patterns, deployment safety, and operational visibility before services reach production.
Support and evolve Climavision’s observability platform, including metrics, logging, distributed tracing, dashboarding, and alerting.

Requirements

What you’ll need

A bachelor’s degree in computer science, software engineering, or a related field; equivalent professional experience considered.
Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role, with at least 4 years in a role formally titled Site Reliability Engineer or carrying explicit SLO / error-budget accountability.
Strong, hands-on software engineering experience, including a minimum of 3 years supporting and modifying C# / .NET applications in production environments.
Demonstrated experience refactoring production application code (preferably C# / .NET) to make services horizontally scalable across multiple replicas.
Experience designing or operating multi-cluster high-availability architectures, including failover behavior, traffic routing, and cross-cluster service deployment.
Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments.
Experience diagnosing and resolving production incidents across application, platform, and Kubernetes infrastructure layers, including workload scheduling, storage, ingress, and cluster-level failures.
Strong written and verbal communication skills, including incident documentation and postmortem authoring.

Benefits

Comp & perks

Competitive compensation
Comprehensive benefits package
401(k) Savings Plan
Medical/Dental/Vision Benefits
Health Savings Account (HSA) and Flexible Spending Account (FSA)
Unlimited Paid Time-off
11 Paid Holidays
Paid Parental Leave
Company Paid Short-term Disability (STD)
Company Paid Long-term Disability (LTD)
Company Paid Life Insurance

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

C#, .NET, Kubernetes, Azure, observability, distributed systems, high availability, production engineering, DevOps, site reliability engineering

Soft Skills

communication, troubleshooting, incident response, postmortem analysis, collaboration, problem-solving, documentation, resiliency patterns, operational visibility, deployment safety