Tern

Site Reliability Engineer

Tern is hiring a Site Reliability Engineer to oversee its infrastructure migration from Heroku to GCP and to ensure production reliability and operational excellence at a rapidly growing travel tech company.

Posted 6/4/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior • 💰 $175,000 - $200,000 per year

Tech Stack

Tools & technologies
BigQuery, Cloud, Distributed Systems, Google Cloud Platform, Heroku, Postgres

About the role

Key responsibilities & impact
  • Own the migration from Heroku to Google Cloud Platform: architecture, execution, and a cutover that doesn't surprise anyone
  • Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure
  • Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs
  • Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it
  • Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team
  • Set the operational bar across engineering and pull others up to it

Requirements

What you’ll need
  • Production reliability ownership: Track record of personally owning production reliability at meaningful scale. Concrete stories of incidents you led, fixed, and prevented from recurring, not just participated in. This is a primary responsibility, not something you've done on the side.
  • Infrastructure migrations: Real experience owning a cloud migration end to end, not just contributing to one. Fluent in GCP (or a comparable cloud), infrastructure-as-code, and the failure modes of distributed systems.
  • Observability and proactive operations: You build monitoring and alerting that surfaces problems before users find them. You know what to instrument, what to alert on, and what's just noise.
  • High agency: You find the highest leverage reliability problems and go fix them without being assigned to them. You don't wait for an outage to justify the work.
  • AI in your working habits: Specific examples of how AI has made your debugging, automation, or operational workflows faster or more reliable.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Postgres, Fivetran, BigQuery, infrastructure-as-code, cloud migration, monitoring, alerting, incident response, debugging, automation
Soft Skills
production reliability ownership, high agency, leadership, problem-solving, proactive operations