
Senior DevOps Engineer, Infrastructure – Reliability
Worth AI
full-time
Location Type: Hybrid
Location: Orlando • Florida • United States
About the role
- Conduct regular interviews with engineering teams to identify operational pain points in CI/CD, deployments, observability, and cloud environments, then proactively eliminate them.
- Design and implement scalable Infrastructure-as-Code patterns using tools like Terraform to standardize cloud provisioning and reduce configuration drift.
- Own and evolve our Kubernetes platform (EKS or self-managed), ensuring workloads are secure, scalable, and resilient by default.
- Architect and optimize CI/CD pipelines to improve deployment frequency, reduce lead time, and increase confidence in releases.
- Lead systemic reliability initiatives, including incident response improvements, root cause analysis practices, and postmortem frameworks.
- Design and enforce secure networking, IAM, and secrets management strategies across environments.
- Improve observability by refining metrics, logs, and tracing using tools like Datadog, ensuring actionable insight into system health.
- Optimize cloud cost efficiency through rightsizing, autoscaling strategies, and architectural improvements.
- Own disaster recovery planning, backup strategies, and multi-region resilience initiatives.
- Refactor brittle or manually managed infrastructure into automated, testable, and reproducible systems.
- Introduce new infrastructure tooling or architectural shifts and drive adoption through documentation, workshops, and hands-on support.
- Lead by example in incident management, risk mitigation, and operational excellence.
- Communicate technical trade-offs clearly across engineering and product stakeholders, balancing speed with safety.
Requirements
- 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
- Proven experience designing and operating production Kubernetes environments at scale.
- Deep hands-on expertise with AWS infrastructure and cloud networking.
- Strong experience building and maintaining Terraform modules across large cloud environments.
- Demonstrated ownership of CI/CD systems and measurable improvement of DORA metrics.
- Experience leading incident response processes and driving meaningful postmortem outcomes.
- Strong understanding of distributed systems, event-driven architectures (Kafka), and database performance (PostgreSQL).
- Proven ability to modernize legacy infrastructure and eliminate manual operational toil.
- Experience navigating high-ambiguity environments and translating operational friction into prioritized infrastructure roadmaps.
- Demonstrated ability to build trust across teams while raising the reliability bar.
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401(k), IRA)
- Life Insurance
- Flexible Vacation
- Work From Home
- Free Food & Snacks (in office)
- Orlando, Florida (Hybrid)
- Wellness Resources
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Infrastructure-as-Code, Terraform, Kubernetes, CI/CD, AWS, cloud networking, DORA metrics, distributed systems, event-driven architectures, PostgreSQL
Soft Skills
incident management, risk mitigation, operational excellence, communication, leadership, trust building, problem solving, collaboration, prioritization, adaptability