
Senior Infrastructure Engineer
Prometheum
Full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Salary
💰 $140,000 - $180,000 per year
Job Level
Senior
Tech Stack
AWS, Cloud, DNS, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Rust, Terraform, TypeScript
About the role
- Design, build, and maintain AWS cloud infrastructure using Terraform, Terragrunt, Helm, ArgoCD, Kubernetes (EKS), and CI/CD pipelines (GitHub Actions)
- Manage infrastructure across multiple AWS accounts and environments, ensuring consistency, proper isolation, and security
- Maintain and optimize Kubernetes clusters, including EKS upgrades, component updates, and capacity planning
- Build and maintain observability systems (Prometheus, Grafana, Datadog) with comprehensive alerting and dashboards
- Manage dependency updates and security patches across Docker images, Helm charts, Terraform modules, and application dependencies using automation tools like Renovate
- Enhance security posture through least privilege access, signed images, admission controllers (Kyverno), and mTLS
- Participate in on-call rotation to respond to incidents, troubleshoot issues, identify root causes, and implement preventive measures
- Document infrastructure patterns, best practices, and operational procedures
Requirements
- 5+ years of experience architecting, designing, and implementing cloud solutions on AWS
- Production experience with Docker and Kubernetes (AWS EKS strongly preferred)
- Strong Infrastructure-as-Code skills using Terraform and Terragrunt (or similar DRY configuration patterns)
- Experience managing infrastructure across multiple AWS accounts with IAM, SSO, and account isolation
- Hands-on experience with GitOps workflows and tools (ArgoCD preferred)
- Experience with CI/CD pipelines and automation (GitHub Actions preferred)
- Experience with observability tools (Prometheus, Grafana, Datadog) for metrics, alerting, and dashboards
- Experience with Cloudflare and Cloudflare Zero Trust for network security, DNS, and secure access
- Proficiency in at least one programming language: Python, Go, Rust, or TypeScript
- Strong troubleshooting skills in containerized Linux environments
- Experience applying SRE principles: SLOs/SLIs, golden signals, MTTR, progressive rollouts, and change management
- Experience setting up, managing, and maintaining high-availability blockchain infrastructure for production environments (Nice to have)
- Experience working in a highly regulated environment (Nice to have)
- Experience building and operating multi-region, multi-cloud production systems (Nice to have)
- Experience using AI-related tools in DevOps and infrastructure toolchains (Nice to have)
Benefits
- Competitive salary based on experience
- Excellent benefits, including health, vision, and dental insurance
- Fully remote position with equipment provided
Applicant Tracking System Keywords
Hard skills
AWS, Terraform, Terragrunt, Kubernetes, Docker, GitOps, CI/CD, Python, Go, Rust
Soft skills
troubleshooting, incident response, root cause analysis, documentation