Decision Foundry

Principal Software Engineer, AI Cloud

full-time

Location Type: Remote

Location: Washington, United States

Salary

💰 $232,000 - $319,000 per year

About the role

  • Define and drive the long-term technical strategy for AI Cloud’s control and data plane services.
  • Architect highly available, multi-region systems capable of operating seamlessly across multiple cloud providers.
  • Design APIs and service abstractions that integrate Desktop, Hub, and enterprise cloud services.
  • Establish standards for reliability, scalability, and observability across the AI Cloud platform.
  • Lead cross-functional technical discussions and influence architectural decisions company-wide.
  • Design and implement distributed systems for workload orchestration, service discovery, and lifecycle management.
  • Build and operate control plane components that manage multi-tenant workloads and cloud networking.
  • Develop infrastructure that delivers predictable performance, intelligent scaling, and automated failover.
  • Ensure security, data integrity, and compliance across the global infrastructure footprint.
  • Partner with platform and product teams to deliver developer-friendly APIs and cloud experiences.
  • Align technical direction with business objectives for cloud growth and developer platform unification.
  • Evaluate emerging technologies (e.g., service meshes, container orchestration, edge computing) and guide adoption.
  • Drive initiatives that reduce latency, optimize cost, and improve cross-cloud performance.
  • Define metrics and SLAs for AI Cloud’s reliability and scalability.
  • Mentor senior, staff, and principal engineers, fostering technical excellence and growth across teams.
  • Lead design reviews and guide critical production system decisions.
  • Drive a culture of operational excellence, ownership, and innovation.
  • Collaborate with engineering and product leadership to align priorities and resource planning.
  • Take part in on-call rotation for your team; respond to incidents, debug production issues, and drive continuous improvement of system reliability.

Requirements

  • 10+ years of software engineering experience, including 3+ years in technical leadership roles (Staff or Principal level)
  • Proven experience designing and building highly scalable distributed systems in production environments
  • Deep understanding of cloud infrastructure (AWS, Azure, GCP, or OCI), including compute, networking, and storage primitives
  • Proficiency in Go, Rust, or Java
  • Expertise in Kubernetes, microservices, and service mesh architectures
  • Strong foundation in observability, CI/CD, and infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
  • Experience operating high-availability (99.99%+) production systems
  • Exceptional communication skills and ability to influence across technical and business domains
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • Experience designing multi-cloud or cross-cloud abstractions and orchestration layers
  • Knowledge of container lifecycle management, networking, and policy enforcement
  • Prior experience in developer infrastructure, PaaS, or hyperscale SaaS environments
  • Background contributing to open source or developer-focused platforms (a plus)

Benefits

  • Work Model: Remote
  • Employment Type: Full-time