Site Reliability Engineer
Tern
Site Reliability Engineer overseeing the infrastructure migration from Heroku to GCP at Tern, ensuring production reliability and operational excellence at a rapidly growing travel tech company.
Posted 6/4/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level / Senior • 💰 $175,000 - $200,000 per year
Tech Stack
Tools & technologies: BigQuery, Cloud, Distributed Systems, Google Cloud Platform, Heroku, Postgres
About the role
Key responsibilities & impact
- Own the migration from Heroku to Google Cloud Platform: architecture, execution, and a cutover that doesn't surprise anyone
- Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure
- Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs
- Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it
- Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team
- Set the operational bar across engineering and pull others up to it
Requirements
What you'll need
- Production reliability ownership: A track record of personally owning production reliability at meaningful scale, with concrete stories of incidents you led, fixed, and prevented from recurring, not just participated in. This is a primary responsibility, not something you've done on the side.
- Infrastructure migrations: Real experience owning a cloud migration end to end, not just contributing to one. Fluent in GCP (or a comparable cloud), infrastructure-as-code, and the failure modes of distributed systems.
- Observability and proactive operations: You build monitoring and alerting that surfaces problems before users find them. You know what to instrument, what to alert on, and what's just noise.
- High agency: You find the highest-leverage reliability problems and go fix them without being assigned to them. You don't wait for an outage to justify the work.
- AI in your working habits: Specific examples of how AI has made your debugging, automation, or operational workflows faster or more reliable.
ATS Keywords (Applicant Tracking System)
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Postgres, Fivetran, BigQuery, infrastructure-as-code, cloud migration, monitoring, alerting, incident response, debugging, automation
Soft Skills
production reliability ownership, high agency, leadership, problem-solving, proactive operations