Site Reliability Engineer

Plentiful.ai

full-time

Posted on: 1/21/2026

Location Type: Hybrid

Location: San Francisco • California • United States

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Postgres

About the role

Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks
Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data
Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres
Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured
Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation
Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers
Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution
Maintain efficient and predictable resource usage across compute, networking and storage
Support security and compliance work including patching, audit readiness and vulnerability management
Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication
Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers

Requirements

5+ years of professional engineering experience at a B2B SaaS company
Strong experience operating production systems in cloud environments, ideally AWS
Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres
Solid understanding of observability tooling, performance debugging and system behavior under load
A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude
Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment

Benefits

Unlimited PTO
Fully covered health insurance (medical, dental, and vision)
Meal stipend
Health & wellness stipend
401(k) matching
Stock options

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

observability standards, performance debugging, serverless compute patterns, containerized services, distributed workflows, Postgres, incident response, root cause analysis, reliability improvements, resource usage

Soft Skills

high ownership mindset, empathy, straightforward communication, team collaboration, bias for action, thoughtful engineering judgment