
Senior Infrastructure Engineer – DevOps / Production / AWS
25madison
full-time
Location Type: Remote
Location: New York • United States
Salary
💰 $145,000 - $190,000 per year
About the role
- Own uptime (99.9%+), observability, incident response, and root cause analysis
- Deep AWS stack: EC2 (including GPU), ECS/Fargate, SQS, Lambda, S3, CloudFront, API Gateway, RDS/DynamoDB — plus VPC design, IAM, autoscaling, and monitoring
- Build the plumbing: retry logic, idempotency, checkpointing, parallel orchestration
- Chase down performance problems: queue bottlenecks, cold starts, LLM latency, runaway costs
- Help the team ship faster: CI/CD, infrastructure-as-code (Terraform/CDK/Pulumi), clean containerization, and proper staging environments
Requirements
- 8+ years in infrastructure / DevOps / production engineering
- Deep AWS expertise (not just “used it” — architected at scale)
- Experience running production ML or AI systems
- Experience with asynchronous distributed systems
- Strong knowledge of: ECS / Fargate, EC2 (including GPU instances), SQS, S3, VPC networking, and IAM best practices
- Strong understanding of: containerization (Docker), CI/CD pipelines, infrastructure as code, and observability systems
- Experience debugging production incidents and designing fault-tolerant systems
Benefits
- Competitive comp
- Meaningful equity
- Genuine shot at defining how AI agents operate in production
Applicant Tracking System Keywords
Hard Skills & Tools
AWS, EC2, ECS, Fargate, SQS, Lambda, S3, CloudFront, RDS, DynamoDB
Soft Skills
incident response, root cause analysis, performance optimization, team collaboration, problem-solving