Tech Stack
Tools & technologies: Go, Kubernetes, Prometheus
About the role
Key responsibilities & impact
- Take the lead on scaling our operational resilience as we grow.
- Own the stability, observability, and debugging workflows that keep our systems running smoothly.
- Be the go-to person for untangling complex failures in real time.
- Design tools that turn chaos into clarity, helping us shift from reactive to proactive operations.
- Shape how reliability is done: reduce incident load, build internal tooling, and directly improve developer focus and system uptime.
Requirements
What you’ll need
- 3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and ability to dive into unfamiliar backend codebases
- Strong Go and Kubernetes experience.
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure — especially during live incidents.
Benefits
Comp & perks
- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth - Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation - Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy - Take full ownership of projects and ship fast.
- Work With the Best - Join a world-class team of engineers and builders.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
debugging, Go, Kubernetes, observability, monitoring, production systems, backend codebases, incident management, internal tooling, system uptime
Soft Skills
problem-solving, communication, calm under pressure, leadership, operational resilience, proactive operations, complex failure analysis, clarity in chaos, developer focus, incident reduction
