Senior Platform Engineer

Sully.ai

full-time

Posted on: 9/20/2025

Origin: 🇺🇸 United States


Senior

AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, Python, Terraform, TypeScript

About the role

Own strategy, design, and execution of platform integrations connecting Sully.ai’s AI medical employees with major EHRs, partner APIs, and healthcare data standards.
Build and scale a high-performing platform team; collaborate closely with Product, Clinical, Security, and Customer Success.
Own multi‑cloud provisioning, upgrading, and reliability of the core platform (AWS, GCP, Azure).
Automate everything via infrastructure as code and GitOps (ArgoCD, Terraform/Pulumi).
Ship fully automated deployments across regions and standardize developer experience (tooling, CI/CD).
Optimize the stack for scalability and cost efficiency under AI workloads; enforce observability, SLOs/SLIs, and incident management, and drive down MTTR.
Ensure integrations are robust, secure, HIPAA‑compliant, and of high customer‑facing quality.
First month: define and deliver a platform solution; audit the platform, CI/CD, and developer tooling; establish baselines for lead time, deployment frequency, MTTR, and cost per token; and draft a Platform Playbook.
90‑day success: 100% of infrastructure changes made via IaC and GitOps, automated multi‑region deployments across at least two clouds, SLOs enforced, MTTR reduced by 50%, and significant gains in engineer efficiency along with cost reductions.

Requirements

10+ years in Platform, DevOps, or SRE roles, including running production Kubernetes across multiple clouds (AWS, GCP, Azure), not just a single cloud.
Demonstrated experience designing multi‑cloud architecture and portability patterns.
Proficiency with GitOps and Infrastructure as Code, including ArgoCD and either Terraform or Pulumi.
Strong coding skills in Python or TypeScript for platform tooling and automation.
Experience running CI/CD at scale, owning observability (metrics, logs, tracing), practicing SLOs and SLIs, managing incidents, and improving MTTR.
Experience operating in regulated environments (HIPAA) with strong security, networking, and IAM fundamentals.
Nice-to-Have: AI/LLM production experience, including OpenAI-compatible APIs.
Nice-to-Have: Cost optimization for AI and cloud workloads.
Nice-to-Have: Security depth (policy‑as‑code, software supply chain security).