
Site Reliability Engineer
Plentiful.ai
full-time
Location Type: Hybrid
Location: San Francisco • California • United States
About the role
- Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks
- Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data
- Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres
- Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured
- Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation
- Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers
- Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution
- Maintain efficient and predictable resource usage across compute, networking and storage
- Support security and compliance work including patching, audit readiness and vulnerability management
- Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication
- Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers
Requirements
- 5+ years of professional engineering experience at a B2B SaaS company
- Strong experience operating production systems in cloud environments, ideally AWS
- Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres
- Solid understanding of observability tooling, performance debugging and system behavior under load
- A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude
- Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment
Benefits
- Unlimited PTO
- Fully covered health insurance (medical, dental, and vision)
- Meal stipend
- Health & wellness stipend
- 401(k) matching
- Stock options
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
observability standards, performance debugging, serverless compute patterns, containerized services, distributed workflows, Postgres, incident response, root cause analysis, reliability improvements, resource usage
Soft Skills
high ownership mindset, empathy, straightforward communication, team collaboration, bias for action, thoughtful engineering judgment