
Principal Software Engineer, AI Cloud
Decision Foundry
Full-time
Location Type: Remote
Location: Washington • United States
Salary
💰 $232,000 - $319,000 per year
About the role
- Define and drive the long-term technical strategy for AI Cloud’s control and data plane services.
- Architect highly available, multi-region systems capable of operating seamlessly across multiple cloud providers.
- Design APIs and service abstractions that integrate Desktop, Hub, and enterprise cloud services.
- Establish standards for reliability, scalability, and observability across the AI Cloud platform.
- Lead cross-functional technical discussions and influence architectural decisions company-wide.
- Design and implement distributed systems for workload orchestration, service discovery, and lifecycle management.
- Build and operate control plane components that manage multi-tenant workloads and cloud networking.
- Develop infrastructure that delivers predictable performance, intelligent scaling, and automated failover.
- Ensure security, data integrity, and compliance across the global infrastructure footprint.
- Partner with platform and product teams to deliver developer-friendly APIs and cloud experiences.
- Align technical direction with business objectives for cloud growth and developer platform unification.
- Evaluate emerging technologies (e.g., service meshes, container orchestration, edge computing) and guide adoption.
- Drive initiatives that reduce latency, optimize cost, and improve cross-cloud performance.
- Define metrics and SLAs for AI Cloud’s reliability and scalability.
- Mentor senior, staff, and principal engineers, fostering technical excellence and growth across teams.
- Lead design reviews and guide critical production system decisions.
- Drive a culture of operational excellence, ownership, and innovation.
- Collaborate with engineering and product leadership to align priorities and resource planning.
- Take part in your team's on-call rotation: respond to incidents, debug production issues, and drive continuous improvement of system reliability.
Requirements
- 10+ years of software engineering experience, including 3+ years in technical leadership roles (Staff or Principal level)
- Proven experience designing and building highly scalable distributed systems in production environments
- Deep understanding of cloud infrastructure (AWS, Azure, GCP, or OCI), including compute, networking, and storage primitives
- Proficiency in Go, Rust, or Java
- Expertise in Kubernetes, microservices, and service mesh architectures
- Strong foundation in observability, CI/CD, and infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
- Experience operating high-availability (99.99%+) production systems
- Exceptional communication skills and ability to influence across technical and business domains
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Experience designing multi-cloud or cross-cloud abstractions and orchestration layers
- Knowledge of container lifecycle management, networking, and policy enforcement
- Prior experience in developer infrastructure, PaaS, or hyperscale SaaS environments
- Background contributing to open source or developer-focused platforms is a plus
Benefits
- Work Model – Remote
- Employment Type – Full-time
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Go, Rust, Java, Kubernetes, microservices, service mesh, Terraform, Pulumi, CloudFormation, distributed systems
Soft Skills
communication, influence, mentorship, operational excellence, ownership, innovation