Site Reliability Engineer – SRE
Monstro
Site Reliability Engineer managing reliability and observability of a secure, multi-tenant platform on Google Cloud. Hands-on role focusing on incident response and reliability engineering.
Posted 6/9/2026 • Full-time • New York City, New York, 🇺🇸 United States • Mid-Level, Senior • 💰 $142,000 - $214,700 per year
Tech Stack
Tools & technologies: AWS, Azure, BigQuery, Cloud, Go, Google Cloud Platform, Kubernetes, Python
About the role
Key responsibilities & impact
- Define and maintain SLOs and SLIs for our tier-1 services: API gateway, application services, identity, and edge availability
- Build canonical dashboards and alerts in Google Cloud Monitoring, backed by structured logs and BigQuery log analytics
- Tune alert routing so every page is actionable — kill the rest
- Instrument services for distributed tracing and structured logging; push back on services that ship without it
- Own error budgets and use them to prioritize reliability work over feature work when burned
- Reduce toil: automate the top recurring page from the previous quarter
- Maintain runbooks so every page maps to one within a cycle of first occurrence
- First responder for production alerts across monitoring, API gateway, edge defense, and CI
- Triage severity, run the incident bridge, drive mitigation (revision rollback, traffic shift, scaling, edge block, credential rotation)
- Own internal and external incident comms during your shift
- Drive postmortems to closure with action items tracked as audit evidence
- Clean written handoffs at end of shift
Requirements
What you’ll need
- Solid production experience on GCP (or comparable AWS/Azure depth with willingness to ramp on GCP fast)
- Comfortable on-call: you’ve run incidents, written postmortems, and shipped the action items
- Strong observability fundamentals: SLOs, log-based metrics, alert hygiene, dashboard discipline
- Working knowledge of Kubernetes, API gateways, identity systems, and at least one IaC tool
- Scripting / coding fluency (Python, Go, Bash) for automation and tooling
- Good written communication — handoffs, postmortems, and runbooks are part of the job
- Bias toward fixing the system, not the symptoms
Benefits
Comp & perks
- Competitive salary
- Equity
- Paid health, vision, dental, and disability coverage
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
GCP, AWS, Azure, Kubernetes, API gateways, identity systems, IaC tools, Python, Go, Bash
Soft Skills
on-call experience, incident management, written communication, postmortem writing, action item tracking, problem-solving, reliability prioritization, automation mindset, dashboard discipline, alert hygiene