Crusoe

Senior Software Engineer, Managed Orchestration, Kubernetes

Full-time

Origin: 🇺🇸 United States • California

Salary

💰 $166,000 - $204,000 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Kubernetes, Linux, Node.js, Python, Rust, Terraform

About the role

  • Architect, build, and operate features for Crusoe’s Managed Kubernetes platform (control plane, autoscaling, cluster lifecycle, upgrades, multi-tenancy).
  • Integrate and optimize GPU workloads within Kubernetes clusters, including device plugins, GPU operators, scheduling, and monitoring.
  • Enhance container networking through advanced CNI integration (Cilium, Calico, Multus) and support for high-performance networking (InfiniBand, RoCE).
  • Improve reliability and resilience of Kubernetes clusters, including HA control planes, node lifecycle management, and self-healing mechanisms.
  • Contribute to open-source and internal tooling that enhances observability, automation, and cluster security.
  • Participate in design reviews, provide mentorship to engineers, and help set long-term technical direction.
  • Troubleshoot complex distributed systems problems spanning containers, GPUs, and networking.

Requirements

  • 5+ years of software engineering experience in distributed systems, cloud, or infrastructure.
  • Deep understanding of Kubernetes internals (control plane, scheduling, operators, controllers, API machinery).
  • Strong proficiency in Go (preferred) or similar languages (Rust, C++, Python for systems work).
  • Experience with container networking (CNI plugins, service mesh, load balancing) and Linux networking fundamentals.
  • Exposure to GPU workloads in Kubernetes (device plugins, GPU operators, scheduling, autoscaling).
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and infrastructure automation (Terraform, Helm, GitOps).
  • Strong debugging and performance optimization skills for distributed systems.
  • Passion for building reliable, developer-friendly platforms that abstract complexity for customers.
  • Familiarity with NVIDIA and AMD GPUs, device plugins, and operators for GPU lifecycle management.
  • Knowledge of network operators and CNI implementations (Cilium, Calico, Multus).
  • Experience with high-performance networking technologies (InfiniBand, RoCE).
  • Contributions to Kubernetes SIGs, CNCF projects, or related open-source communities.
  • Experience with Slurm, MPI, or HPC-style job schedulers.
  • Familiarity with service meshes (Istio, Linkerd) and multi-cluster networking.
  • Background in security for containers, GPUs, and Kubernetes (PodSecurity, RBAC, runtime scanning).