Site Reliability Engineer

Tern

Site Reliability Engineer overseeing infrastructure migration from Heroku to GCP at Tern. Ensuring production reliability and operational excellence in a rapidly growing travel tech company.

Posted 6/4/2026full-timeRemote • 🇺🇸 United StatesMid-LevelSenior💰 $175,000 - $200,000 per yearWebsite

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

PostgresFivetranBigQueryinfrastructure-as-codecloud migrationmonitoringalertingincident responsedebuggingautomation

Soft Skills

production reliability ownershiphigh agencyleadershipproblem-solvingproactive operations

Tools & Technologies

Google Cloud PlatformHerokuHex

Industry Keywords

observabilitydistributed systemsperformance optimizationcost reductionscaling

Tech Stack

Tools & technologies

BigQueryCloudDistributed SystemsGoogle Cloud PlatformHerokuPostgres

About the role

Key responsibilities & impact

Own the migration from Heroku to Google Cloud Platform, architecture, execution, and a cutover that doesn't surprise anyone
Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure
Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs
Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it
Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team
Set the operational bar across engineering and pull others up to it

Requirements

What you’ll need

Production reliability ownership: Track record of personally owning production reliability at meaningful scale. Concrete stories of incidents you led, fixed, and prevented from recurring, not just participated in. This is a primary responsibility, not something you've done on the side.
Infrastructure migrations: Real experience owning a cloud migration end to end, not just contributing to one. Fluent in GCP (or a comparable cloud), infrastructure-as-code, and the failure modes of distributed systems.
Observability and proactive operations: You build monitoring and alerting that surfaces problems before users find them. You know what to instrument, what to alert on, and what's just noise.
High agency: You find the highest leverage reliability problems and go fix them without being assigned to them. You don't wait for an outage to justify the work.
AI in your working habits: Specific examples of how AI has made your debugging, automation, or operational workflows faster or more reliable.

Benefits

Comp & perks

Own the migration from Heroku to Google Cloud Platform
Build and maintain the Postgres core, Fivetran pipeline, BigQuery data layer, and Hex reporting infrastructure
Optimize the hot paths that matter most: key backend code paths and our heaviest third-party syncs, so performance holds as volume climbs
Own monitoring, alerting, cost reduction, and proactive scaling: surface problems early, keep spend sane, and stay ahead of growth rather than reacting to it
Lead incident response and write post-mortems that turn an outage into a permanent fix and a smarter team
Set the operational bar across engineering and pull others up to it