Senior Site Reliability Engineer

Transcend

Senior Site Reliability Engineer at Transcend, ensuring high reliability and performance of privacy infrastructure while collaborating across teams.

Posted 4/27/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $170,000 - $185,000 per year

Tech Stack

Tools & technologies

AWS, Cloud, JavaScript, Python, Terraform, TypeScript

About the role

Key responsibilities & impact

Lead reliability-focused design and readiness reviews for new and existing services, ensuring production readiness, clear rollout and rollback strategies, and strong observability for every launch.
Build, operate, and continuously improve our observability stack (e.g., logging, metrics, tracing) to provide meaningful dashboards, alerts, and runbooks that enable fast, high-quality incident response across engineering teams.
Own and evolve incident management practices, including on-call participation, incident response processes, and post-incident reviews that drive long-term remediation and learning across teams.
Plan and execute disaster recovery exercises and game days to validate our resilience posture, test failover and backup strategies, and systematically reduce single points of failure.
Perform capacity planning and cost optimization for our cloud infrastructure, helping ensure we run a cost-effective environment that meets performance and availability goals as usage grows.
Identify and drive down systemic reliability risks across application, infrastructure, and process layers—owning cross-team projects that significantly reduce incident frequency and severity over time.
Collaborate closely with Developer Experience, Security, and product engineering to embed reliability best practices—testing, rollout patterns, guardrails, and “golden paths”—into shared tools and CI/CD pipelines.
Participate in and help continuously improve the on-call rotation, using real incidents and near-misses to prioritize automation, better alerting, and clearer documentation.

Requirements

What you’ll need

5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or a closely related role, including hands-on ownership of production systems.
Strong experience operating modern cloud infrastructure, ideally on AWS, including core services for compute, networking, storage, and security primitives.
Proficiency with at least one programming language used at Transcend (e.g., JavaScript, TypeScript, or Python), and comfort reading and reviewing application code for reliability and performance concerns.
Hands-on experience with infrastructure-as-code and CI/CD tooling (e.g., Terraform, CloudFormation, or similar; modern build/deploy pipelines) to reliably provision and change infrastructure.
Deep familiarity with observability and monitoring systems (e.g., Datadog or equivalent), including designing alerts that balance coverage and noise to avoid alert fatigue while protecting customer experience.
Proven track record running incident response and post-incident analysis, including root cause identification, clear documentation, and driving follow-through on remediation work.
Excellent communication and collaboration skills, with experience working across multiple engineering teams to align on reliability goals, share context, and influence technical direction without formal authority.
Comfort participating in an on-call rotation, and experience helping to design or improve on-call processes, runbooks, and escalation paths.
Minimum level of education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field, or equivalent practical experience.
Demonstrated ability to thrive in a remote-first, high-autonomy environment, managing priorities, communicating asynchronously, and driving projects to completion with limited oversight.

Benefits

Comp & perks

Flexible PTO
Parental leave
401(k) match
Competitive compensation packages that include employee equity

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, Production Engineering, Infrastructure Engineering, AWS, JavaScript, TypeScript, Python, infrastructure-as-code, CI/CD, observability

Soft Skills

communication, collaboration, incident response, problem-solving, influence without authority, remote work, project management, prioritization, documentation, adaptability