Senior Site Reliability Engineer

The Leaflet

full-time

Posted on: 2/11/2026

Location Type: Remote

Location: Florida • United States

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Prometheus, Python, Terraform

About the role

Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment.
Troubleshoot and resolve complex issues in production and non-production environments.
Participate in both pre- and post-deployment performance testing and monitoring efforts to improve application performance.
Optimize Java application performance, ensuring efficient resource utilization and scaling.
Deploy and manage the Grafana stack (Grafana, Prometheus, Loki) to provide real-time monitoring, logging, and alerting.
Implement and refine observability strategies to enhance application and infrastructure visibility.
Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance.
Support the operations team’s incident response efforts, conduct post-mortems, and identify root causes of issues to prevent recurrence.
Document and share lessons learned from incidents, contributing to a culture of continuous improvement.
Work closely with developers, architects, and other engineers to design and implement solutions that improve application reliability.
Collaborate closely with DevOps and NOC teams to support the application platform.
Communicate SRE practices and principles to technical and non-technical stakeholders.
Provide feedback and insights on application performance, potential improvements, and observability metrics.

Requirements

Degree in computer science or a related field, or equivalent work experience
5+ years in SRE, DevOps, or similar Infrastructure roles
Experience managing large-scale, high-availability production systems
Track record of incident response and post-mortem processes
Experience with capacity planning and performance optimization
3+ years hands-on experience managing production Kubernetes clusters
Deep understanding of Kubernetes architecture, networking, storage, and security
Experience with cluster scaling (Karpenter), upgrades, and multi-cluster management
Proficiency with kubectl, Helm, and Kubernetes operators
Container orchestration and troubleshooting expertise
Advanced expertise with the Grafana stack for dashboards, alerting, and visualization
Hands-on experience with Grafana Alloy for telemetry data collection
Proficiency in PromQL
Experience with Loki for log aggregation and analysis
Experience building comprehensive monitoring and alerting strategies
Hands-on experience managing Java-based applications in large-scale, distributed environments, with a focus on JVM tuning and application optimization.
Cloud Platform expertise (AWS, GCP, or Azure)
Familiarity with infrastructure as code (IaC) tools such as Terraform/Terragrunt or Ansible.
ArgoCD proficiency for GitOps workflows and continuous deployment
Strong scripting abilities in Bash, Python, or Go
Experience with CI/CD pipelines and automation tools
Experience with configuration management and deployment automation
Strong troubleshooting skills, with a proactive approach to diagnosing and resolving performance bottlenecks.
Proven experience managing on-call rotations, incident response, and root cause analysis.
Ability to mentor junior team members
Strong communication skills (both written and verbal), positive attitude, and ability to receive constructive feedback.

Benefits

Competitive pay and benefits
Flexible vacation allowance
A hybrid / remote working environment
Startup culture backed by a secure, global brand

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Java, Kubernetes, Grafana, Prometheus, Loki, Helm, Bash, Python, Go, Terraform

Soft Skills

troubleshooting, communication, mentoring, incident response, root cause analysis, collaboration, continuous improvement, feedback, proactive approach, positive attitude