
Software Engineer, Reliability
Cursor
full-time
Location Type: Office
Location: San Francisco • California • United States
About the role
- Own reliability work end-to-end, from user-facing symptoms (crashes, latency, streaming failures) to root causes in services, infrastructure, or vendor dependencies.
- Design and implement resilience patterns for upstream dependency failures (for example model providers): fallbacks, routing strategies, and degraded-mode designs.
- Build and maintain reliability guardrails that make teams faster and safer: deployment safety, rollbacks, operational playbooks, automated checks, and standards for production readiness.
- Improve observability (metrics, logs, traces, and client telemetry) so engineers can quickly answer 'Is it up?' and 'What changed?'.
- Reduce operational toil through automation and better tooling.
- Partner with product and infrastructure engineering teams as a drop-in reliability multiplier: embed on the highest-impact problems and drive them to a durable technical outcome.
- Participate in an on-call rotation and help improve incident response practices over time (severity definitions, runbooks, retrospectives, and clear ownership of follow-up fixes).
- Own a small set of high-leverage reliability 'themes' at a time (for example client crash rate, streaming reliability, deploy safety), driving each end-to-end until the reliability bar measurably moves.
Requirements
- Strong experience owning reliability for production systems, including both incident response and long-term engineering fixes.
- Expert-level experience in at least one of: Go, Node/TypeScript, or Python.
- Deep practical knowledge of cloud infrastructure (AWS) and modern deployment/orchestration patterns (Kubernetes and/or ECS).
- Experience with observability systems and practices (metrics, logs, traces, and alerting).
- Clear communication and cross-team leadership.
Benefits
- Health insurance
- 401(k) matching
- Paid time off
- Flexible work arrangements
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Go, Node.js, TypeScript, Python, cloud infrastructure, Kubernetes, ECS, observability systems, metrics, alerting
Soft Skills
clear communication, cross-team leadership