
Senior Software Engineer, Compute Platform
AION
full-time
Location Type: Hybrid
Location: Bengaluru • 🇮🇳 India
Job Level
Senior
Tech Stack
AWS, Azure, Cloud, Distributed Systems, EC2, Go, Google Cloud Platform, Grafana, Kafka, Kubernetes, Postgres, Prometheus, Python, RabbitMQ, Redis, Rust, Terraform
About the role
- Design and architect AION's multi-cloud compute platform, building abstraction layers that unify diverse cloud providers (AWS, GCP, Azure, bare-metal data centers)
- Work directly with cloud providers to expand AION's compute pool—understanding pricing, availability zones, GPU types, and capacity planning
- Build and maintain AION's managed services
- Understand and abstract cloud provider differences in storage (block, object, file systems), networking (VPCs, subnets, security groups), and compute resources
- Design composable platform components that enable forward deployments and promote reusability across AION's infrastructure stack
- Own end-to-end development of managed services on the compute platform—from design and architecture through execution and production monitoring
- Build scalable orchestration systems for GPU workloads, container scheduling, and resource allocation
- Develop robust APIs and control planes for compute lifecycle management (provisioning, scaling, termination)
- Lead technical discussions on platform reliability, performance optimization, and cost efficiency
- Execute on peripheral platform services including billing systems, usage accounting, observability infrastructure, and compliance tooling
- Build monitoring and telemetry systems for compute utilization, cost tracking, and performance metrics
- Establish engineering standards for platform development including code reviews, quality gates, and testing practices
- Mentor engineers on infrastructure best practices and distributed systems design
Requirements
- 4+ years of experience building and scaling complex backend systems, cloud infrastructure, or distributed platforms
- Strong understanding of multi-cloud architectures and experience working with AWS, GCP, or Azure at scale
- Deep knowledge of cloud abstractions: compute (EC2, GCE, VMs), storage (S3, GCS, EBS), networking (VPCs, load balancers, security groups)
- Proficiency in Golang strongly preferred; Python, Rust, or other systems languages a plus
- Experience with Kubernetes, container orchestration, and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
- Solid understanding of distributed systems principles, consensus algorithms, and state management
- Experience building APIs, control planes, and platform services for infrastructure management
- Familiarity with databases (PostgreSQL, Redis, etcd), message queues (Kafka, RabbitMQ), and event-driven architectures
- Knowledge of GPU orchestration, AI/ML workloads, or HPC systems is highly desirable
- Experience with observability tools (Prometheus, Grafana, Datadog) and distributed tracing
- Understanding of cloud billing models, cost optimization strategies, and resource scheduling
Benefits
- Preferred Attributes:
  - High ownership, self-driven, and a bias for action.
  - Strong strategic thinking and ability to connect technical decisions to business impact.
  - Excellent communication and mentoring skills.
  - Thrives in ambiguity, fast-paced environments, and early-stage startup culture.
- Why Join AION?
  - Work directly with high-pedigree founders shaping technical and product strategy.
  - Build infrastructure powering the future of AI compute globally.
  - Significant ownership and impact with equity reflective of your contributions.
  - Competitive compensation, flexible work options, and wellness benefits.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Golang, Python, Rust, Kubernetes, Terraform, Pulumi, CloudFormation, APIs, distributed systems, GPU orchestration
Soft skills
mentoring, technical discussions, performance optimization, cost efficiency, engineering standards