Staff Site Reliability Engineer
AlphaSense
Staff Site Reliability Engineer overseeing reliability, scalability, and performance at AlphaSense, a leading AI-driven market intelligence platform.
Tech Stack
Tools & technologies: AWS, Azure, Cloud, DNS, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, TCP/IP
About the role
Key responsibilities & impact
- Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture.
- Lead AI-Driven Reliability: Drive our AIOps strategy by automating diagnostics, remediation, and proactive failure prevention.
- Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards.
- Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence and ensuring that blameless postmortems lead to lasting improvements.
- Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively.
- Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing.
Requirements
What you’ll need
- 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 3 of those years in a Senior+ SRE position.
- Strong background in running production SaaS systems at scale.
- Proficiency in at least one programming/scripting language (Python, Go, or similar).
- Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes.
- Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing).
- Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK).
- Familiarity with advanced observability (OTEL, continuous profiling).
- Proven incident management experience, including leading high-severity incidents and postmortems.
- Strong troubleshooting skills across the full stack.
- Excellent communication and collaboration skills.
Benefits
Comp & perks
- You may also be offered equity
- A generous benefits program
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, DevOps, Python, Go, AWS, GCP, Azure, Kubernetes, TCP/IP, monitoring
Soft Skills
mentorship, communication, collaboration, troubleshooting, incident management