25madison

Senior Infrastructure Engineer – DevOps / Production / AWS

full-time

Location Type: Remote

Location: New York, United States

Salary

💰 $145,000 - $190,000 per year

About the role

  • Own uptime (99.9%+), observability, incident response, and root cause analysis
  • Deep AWS stack: EC2 (including GPU), ECS/Fargate, SQS, Lambda, S3, CloudFront, API Gateway, RDS/DynamoDB — plus VPC design, IAM, autoscaling, and monitoring
  • Build the plumbing: retry logic, idempotency, checkpointing, parallel orchestration
  • Chase down performance problems: queue bottlenecks, cold starts, LLM latency, runaway costs
  • Help the team ship faster: CI/CD, infrastructure-as-code (Terraform/CDK/Pulumi), clean containerization, and proper staging environments

Requirements

  • 8+ years in infrastructure / DevOps / production engineering
  • Deep AWS expertise (not just “used it” — architected at scale)
  • Experience running production ML or AI systems
  • Experience with asynchronous distributed systems
  • Strong knowledge of: ECS / Fargate, EC2 (including GPU instances), SQS, S3, VPC networking, and IAM best practices
  • Strong understanding of: containerization (Docker), CI/CD pipelines, infrastructure as code, and observability systems
  • Experience debugging production incidents and designing fault-tolerant systems

Benefits

  • Competitive comp
  • Meaningful equity
  • Genuine shot at defining how AI agents operate in production