Tech Stack
AWS, Azure, Cloud, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform
About the role
- As a Platform Engineer at Aurelian, you’ll help design and scale the infrastructure behind our life-saving voice AI systems. This includes building multi-region, highly available systems, supporting real-time voice workloads, and managing infrastructure for AI models — including containerized GPU workloads.
- You’ll play a critical role in ensuring our systems are reliable, fast, and scalable as we continue to expand nationwide.
- Scale infrastructure to support millions of voice interactions across U.S. call centers.
- Enhance our multi-region architecture to reduce latency and improve uptime.
- Design and maintain infrastructure for self-hosted, GPU-based AI workloads.
- Improve our monitoring, alerting, and observability to support mission-critical systems.
- Collaborate across teams to build deployment pipelines and support product delivery.
- Automate infrastructure using Terraform (or similar IaC tools) and manage Kubernetes clusters.
Requirements
- 3+ years of experience in DevOps, SRE, or Platform Engineering roles
- Proven experience running and managing Kubernetes clusters in production
- Experience deploying and operating containerized GPU workloads
- Experience with cloud providers (AWS, GCP, or Azure)
- Familiarity with observability tools (e.g., Prometheus, Grafana, Datadog)
- Solid scripting or programming skills — Python preferred
- Versatility and willingness to step outside your domain when needed (e.g., backend contributions)