Aldea

Foundational AI Researcher

full-time

Origin: 🇺🇸 United States • Florida

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, DNS, Docker, ElasticSearch, Firewalls, Grafana, Kubernetes, Linux, Postgres, Prometheus, Python, Redis, Terraform, Vault

About the role

  • Multi-Environment Kubernetes Architecture - Manage five distinct environments (NMS, Sandbox, Development, Staging, Production), each with different security requirements, and design redundancy and failover mechanisms
  • Infrastructure as Code Excellence - Develop and maintain Pulumi-based infrastructure using Python, managing complex cross-environment dependencies and VPC peering relationships
  • Zero-Trust Security Implementation - Implement certificate-based VPN access with internal DNS resolution, configure WAF/security groups, and manage HashiCorp Vault integration
  • Comprehensive Observability - Deploy and configure Prometheus, Grafana, Loki, Jaeger, and CloudWatch to provide unified monitoring across distributed infrastructure
  • API Platform Management - Deploy and maintain a centralized API that manages all environments from the NMS hub, and implement automation for training jobs and inference optimization

Requirements

Must Have Qualifications:
  • 5+ years in DevOps, SRE, or infrastructure engineering
  • Expert-level Kubernetes experience with EKS and multi-cluster management
  • Strong Python programming skills for infrastructure automation and API development
  • Infrastructure as Code expertise with Pulumi, Terraform, or similar tools
  • Deep AWS knowledge: VPC, EKS, ECR, S3, CloudWatch, IAM, and networking
  • Linux system administration and containerization with Docker
  • Hands-on experience with Prometheus, Grafana, and centralized logging systems
  • Network security experience including VPN, firewalls, and certificate management

Nice to Have Qualifications:
  • Machine Learning infrastructure experience (GPU clusters, model serving, ML pipelines)
  • HashiCorp Vault administration and integration
  • GitOps experience with ArgoCD or similar tools
  • Service mesh experience (Istio, Linkerd)
  • Database administration (PostgreSQL, Redis, Elasticsearch)
  • CI/CD pipeline design and multi-cloud infrastructure experience