Prometheum

Senior Infrastructure Engineer

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Salary

💰 $140,000 - $180,000 per year

Job Level

Senior

Tech Stack

AWS, Cloud, DNS, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Rust, Terraform, TypeScript

About the role

  • Design, build, and maintain AWS cloud infrastructure using Terraform, Terragrunt, Helm, ArgoCD, Kubernetes (EKS), and CI/CD pipelines (GitHub Actions)
  • Manage infrastructure across multiple AWS accounts and environments, ensuring consistency, proper isolation, and security
  • Maintain and optimize Kubernetes clusters, including EKS upgrades, component updates, and capacity planning
  • Build and maintain observability systems (Prometheus, Grafana, Datadog) with comprehensive alerting and dashboards
  • Manage dependency updates and security patches across Docker images, Helm charts, Terraform modules, and application dependencies using automation tools like Renovate
  • Enhance security posture through least privilege access, signed images, admission controllers (Kyverno), and mTLS
  • Participate in on-call rotation to respond to incidents, troubleshoot issues, identify root causes, and implement preventive measures
  • Document infrastructure patterns, best practices, and operational procedures

Requirements

  • 5+ years of experience architecting, designing, and implementing cloud solutions on AWS
  • Production experience with Docker and Kubernetes (AWS EKS strongly preferred)
  • Strong Infrastructure-as-Code skills using Terraform and Terragrunt (or similar DRY configuration patterns)
  • Experience managing infrastructure across multiple AWS accounts with IAM, SSO, and account isolation
  • Hands-on experience with GitOps workflows and tools (ArgoCD preferred)
  • Experience with CI/CD pipelines and automation (GitHub Actions preferred)
  • Experience with observability tools (Prometheus, Grafana, Datadog) for metrics, alerting, and dashboards
  • Experience with Cloudflare and Cloudflare Zero Trust for network security, DNS, and secure access
  • Proficiency in at least one programming language: Python, Go, Rust, or TypeScript
  • Strong troubleshooting skills in containerized Linux environments
  • Experience applying SRE principles: SLOs/SLIs, golden signals, MTTR, progressive rollouts, and change management
  • Experience setting up, managing, and maintaining high-availability blockchain infrastructure for production environments (Nice to have)
  • Experience working in a highly regulated environment (Nice to have)
  • Experience building and operating multi-region, multi-cloud production systems (Nice to have)
  • Experience using AI-related tools in DevOps and infrastructure toolchains (Nice to have)

Benefits

  • Competitive salary based on experience
  • Excellent benefits, including health, vision, and dental insurance
  • Fully remote position with equipment provided

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills: AWS, Terraform, Terragrunt, Kubernetes, Docker, GitOps, CI/CD, Python, Go, Rust

Soft skills: troubleshooting, incident response, root cause analysis, documentation