Optimal Ways

Staff Software Engineer, Site Reliability, SRE

full-time

Location: 🇺🇸 United States

Salary

💰 $160,000 - $200,000 per year

Job Level

Lead

Tech Stack

AWS, Cloud, DynamoDB, Java, JavaScript, Kubernetes, Python, SDLC, Terraform, TypeScript

About the role

Reliability:
  • Own the company-wide incident lifecycle: standards for detection, escalation, incident command, customer comms, and high-quality postmortems with action tracking.
  • Define and drive SLIs/SLOs for core services; build guardrails and dashboards that make reliability visible and actionable.
  • Lead production readiness reviews, capacity/performance planning, load testing, disaster recovery exercises, and resilience engineering (failure testing/chaos where appropriate).
  • Level up on-call: right-size rotations, improve paging hygiene, runbooks, and auto-remediation, and continuously improve MTTA/MTTR.

Security:
  • Embed security into the delivery pipeline: dependency and image scanning, least-privilege/IAM baselines, secrets management, and service-to-service auth.
  • Maintain SOC 2-aligned controls as code, with audit-friendly evidence generated in everyday engineering.
  • Drive secure-by-default patterns in the platform (network posture, data protection, runtime policies).

Platform & DevEx:
  • Build and evolve paved roads for deploys, config, and runtime operations in our monorepo (Bazel) and CI/CD (AWS CodePipeline/CodeBuild).
  • Partner with product teams to make the secure default the easiest path: templates, tooling, libraries, and automation.
  • Improve observability end-to-end (traces, logs, metrics, alerts).

Requirements

  • Experienced: Staff-level IC who has led reliability programs at meaningful scale and owned incident response standards.
  • Technically Grounded: Deep, hands-on experience with infrastructure at scale, cloud, containerization, and more:
      • AWS (multi-service)
      • Containerized workloads on ECS and/or Kubernetes
      • CI/CD and IaC (Terraform)
      • Production networking fundamentals
  • Python Proficient: You can read and review service code and land operational improvements.
  • Data Driven: You take a data-driven approach to SLOs, capacity, performance, and cost efficiency, backed by strong observability skills.
  • Influential: Able to shape direction and create simple, durable standards.
  • Communicative: Excels in both technical and interpersonal communication, with strong written and verbal skills.
  • Nice To Have: FinOps, SOC 2, Data Science/ML collaboration, monorepo build tools (Bazel, Buck).