Bitdeer Group

Senior SRE Platform Architect

Bitdeer is hiring a Senior SRE Platform Architect in the United States to lead the architectural evolution of its cloud platform for AI applications. The role collaborates across teams to optimize deployments and deliver infrastructure solutions.

Posted 6/24/2026 • Full-time • San Jose • California, Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
  • Cloud
  • Kubernetes
  • Ray

About the role

Key responsibilities & impact
  • Own the end-to-end architecture of the NeoCloud SRE platform.
  • Write and maintain the platform architecture document.
  • Review every framework-level change.
  • Set design invariants.
  • Run the plugin framework.
  • Decide tier placement.
  • Coordinate with cloud-service teams and tenants.
  • Coordinate with Security.
  • Pre-flight roadmap items.
  • Defend the design under review.

Requirements

What you’ll need
  • 10+ years of production SRE / platform-engineering / infra-architecture experience, including ≥ 3 years at architect level.
  • Hands-on with GPU / AI-compute infrastructure — NVIDIA GPU ops (DCGM, MIG, vGPU, NVLink/NVSwitch, XID semantics, NCCL), InfiniBand or RoCE fabrics (subnet manager, fabric partitioning, optical health), HPC storage (Lustre, NetApp/Pure/DDN/VAST, NVMe-oF).
  • Multi-region observability at scale: metrics / logs / traces / profiles / analytics-lake substrate; recording rules, multi-window, multi-burn-rate (MWMBR) alerting, SLI/SLO discipline.
  • Cluster platforms — first-hand experience with Kubernetes (control plane + GPU Operator + topology-aware scheduling) AND at least one of Slurm / Volcano / Kueue / Ray / KubeRay.
  • Data-center operations — ZTP, BMC/IPMI/Redfish, BIOS/firmware lifecycle, RMA, multi-vendor OEM management (self-built + leased DC mix).
  • Strong DDD instincts — bounded contexts, public contracts, no shared databases, one-context-one-repo discipline.
  • Plugin framework design — you have built (or substantively contributed to) a real extension framework with a uniform manifest + lifecycle.
  • Writing fluency — you can author and maintain a multi-thousand-line architecture document under review without it drifting; you can also write a one-pager an executive will read.
  • Cross-team operating tempo: design reviews, runbook authorship, on-call shadowing, post-mortem facilitation.
  • Hyperscale or NeoCloud experience.
  • BS/MS in Computer Science or similar.

Benefits

Comp & perks
  • Health insurance
  • Flexible work arrangements
