Worth AI

Senior DevOps Engineer, Infrastructure – Reliability

Full-time

Location Type: Hybrid

Location: Orlando, Florida, United States

About the role

  • Conduct regular interviews with engineering teams to identify operational pain points in CI/CD, deployments, observability, and cloud environments, and proactively eliminate them.
  • Design and implement scalable Infrastructure-as-Code patterns using tools like Terraform to standardize cloud provisioning and reduce configuration drift.
  • Own and evolve our Kubernetes platform (EKS or self-managed), ensuring workloads are secure, scalable, and resilient by default.
  • Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase confidence in releases.
  • Lead systemic reliability initiatives, including incident response improvements, root cause analysis practices, and postmortem frameworks.
  • Design and enforce secure networking, IAM, and secrets management strategies across environments.
  • Improve observability by refining metrics, logs, and tracing using tools like Datadog, ensuring actionable insight into system health.
  • Optimize cloud cost efficiency through rightsizing, autoscaling strategies, and architectural improvements.
  • Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
  • Refactor brittle or manually managed infrastructure into automated, testable, and reproducible systems.
  • Introduce new infrastructure tooling or architectural shifts and drive adoption through documentation, workshops, and hands-on support.
  • Lead by example in incident management, risk mitigation, and operational excellence.
  • Communicate technical trade-offs clearly across engineering and product stakeholders, balancing speed with safety.

Requirements

  • 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Proven experience designing and operating production Kubernetes environments at scale.
  • Deep hands-on expertise with AWS infrastructure and cloud networking.
  • Strong experience building and maintaining Terraform modules across large cloud environments.
  • Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
  • Experience leading incident response processes and driving meaningful postmortem outcomes.
  • Strong understanding of distributed systems, event-driven architectures (Kafka), and database performance (PostgreSQL).
  • Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
  • Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
  • Demonstrated ability to build trust across teams while raising the reliability bar.

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401(k), IRA)
  • Life Insurance
  • Flexible Vacation
  • Work From Home
  • Free Food & Snacks (in office)
  • Orlando, Florida (Hybrid)
  • Wellness Resources

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Infrastructure-as-Code, Terraform, Kubernetes, CI/CD, AWS, cloud networking, DORA metrics, distributed systems, event-driven architectures, PostgreSQL
Soft Skills
incident management, risk mitigation, operational excellence, communication, leadership, trust building, problem solving, collaboration, prioritization, adaptability