OpenFX

Site Reliability Engineer – SRE

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Grafana, Kubernetes, Prometheus, Terraform

About the role

  • Serve as first responder for production incidents during U.S. operating hours (±2h EST).
  • Lead triage during outages, analyzing logs, metrics, and traces to identify root causes.
  • Drive incident postmortems and follow-ups to prevent recurrence.
  • Communicate clearly and quickly during incidents to internal stakeholders.
  • Own reliability outcomes across all OpenFX systems, with a focus on uptime, latency, and error budgets.
  • Enhance observability through logging, metrics, alerting, and dashboards.
  • Optimize on-call processes and ensure smooth handoffs across IST, EST, and PST coverage.
  • Partner with DevOps and engineering pods to implement fixes or approve production changes.
  • Proactively identify systemic reliability risks and propose improvements.
  • Contribute automation and tooling to reduce manual incident handling.
  • Champion best practices in reliability engineering and operational excellence.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
  • Proven experience leading incident response, running postmortems, and communicating during outages.
  • Strong background with cloud infrastructure (AWS preferred), container orchestration (Kubernetes, ECS), and Infrastructure-as-Code (Terraform, CloudFormation).
  • Familiarity with observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
  • Ability to triage errors at both the infrastructure and application level, and escalate effectively when deeper intervention is required.
  • Ownership mindset with strong communication skills in high-pressure situations.

Benefits

  • Competitive salary and benefits package.
  • Equity in a rapidly growing company.
  • Opportunity to work on mission-critical infrastructure in fintech.
  • A collaborative team culture with a bias toward ownership and outcomes.
  • The chance to make a direct impact on the resilience of global financial infrastructure.
