
Senior SRE DevOps Engineer
AgilityFeat
full-time
Location Type: Remote
Location: Virginia • United States
Salary
💰 $5,000 - $7,000 per month
About the role
- Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
- Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
- Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
- Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
- Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
- Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with focus on ECS, EKS/Kubernetes, and microservices orchestration
- Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
- Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
- Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
- Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations including Elastic IPs for SIP/RTP traffic
Requirements
- 7+ years in SRE/DevOps/Platform Engineering with a strong software development background
- Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
- Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
- Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
- Proven CI/CD pipeline design experience across on-prem and cloud environments (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
- Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
- Solid security background: network security, secrets management, compliance, incident response
- Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
- Production PostgreSQL, Redis, and message queue operations (SQS, Redis Streams)
- Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation
Benefits
- A role where engineering and operations merge: you'll ship code that keeps the platform running
- Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
- Full ownership of the infrastructure stack with direct impact on reliability and scale
- Competitive compensation, flexible remote work, and a great work environment
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
TypeScript, Python, AWS, Terraform, CloudFormation, CI/CD, Docker, Kubernetes, PostgreSQL, Redis
Soft Skills
leadership, incident response, communication, data-informed decision making, problem solving, collaboration, automation, reliability engineering