AgilityFeat

Senior SRE/DevOps Engineer

Full-time

Location Type: Remote

Location: Virginia, United States

Salary

💰 $5,000 - $7,000 per month

About the role

  • Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
  • Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
  • Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
  • Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
  • Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
  • Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with a focus on ECS, EKS/Kubernetes, and microservices orchestration
  • Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
  • Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
  • Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
  • Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations, including Elastic IPs for SIP/RTP traffic

Requirements

  • 7+ years in SRE/DevOps/Platform Engineering with a strong software development background
  • Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
  • Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
  • Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
  • Proven CI/CD pipeline design for both on-prem and cloud environments (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
  • Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
  • Solid security background: network security, secrets management, compliance, incident response
  • Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
  • Production operations experience with PostgreSQL, Redis, and message queues (SQS, Redis Streams)
  • Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation

Benefits

  • A role where engineering and operations merge: you'll ship the code that keeps the platform running
  • Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
  • Full ownership of the infrastructure stack with direct impact on reliability and scale
  • Competitive compensation, flexible remote work, and a great work environment