Baseten

Forward Deployed SRE

full-time

Location Type: Hybrid

Location: San Francisco, California, United States

Salary

💰 $135,000 - $285,000 per year

About the role

  • Diagnose and resolve runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
  • Debug infrastructure issues across Kubernetes (pods, controllers), networking, observability, and alerting systems.
  • Lead incident response during outages or escalations, managing coordination between Product, FDE, Sales, and Engineering.
  • Serve as the technical owner for top enterprise accounts with strict SLAs and high responsiveness expectations.
  • Identify common failure modes and translate user feedback into roadmap signals, product improvements, and updates to internal runbooks, knowledge bases, and diagnostic best practices.
  • Own project coordination end-to-end, including scoping, execution, communication, and stakeholder alignment across technical and non-technical teams, for work ranging from feature requests and new deployments to operational debugging.

Requirements

  • Deep Kubernetes troubleshooting expertise, including advanced resource debugging, pod/runtime analysis, and log-based diagnostics using observability tooling such as Grafana, Loki, and Prometheus.
  • Strong infrastructure debugging ability across container orchestration, networking, and service dependencies, with hands-on experience supporting production-grade clusters.
  • Experience managing high-severity incidents with major customers, including SLAs, post-incident reviews, and clear communication throughout escalations.
  • Proven project management and organizational skills with an ownership mindset, able to manage multiple complex, multi-stakeholder initiatives in parallel, including issue resolution, root-cause analysis, and feature delivery.
  • Ability to translate recurring technical pain points into roadmap-level insights, documentation improvements, or product enhancements.
  • Strong communication skills and executive presence during high-visibility situations, ensuring technical clarity and customer confidence.
  • 3+ years of experience in a fast-paced, high-growth, or customer-facing engineering environment.

Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and their dependents.
  • Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, runtime analysis, resource debugging, log-based diagnostics, incident management, root-cause analysis, project management, observability tooling, container orchestration, service dependencies
Soft Skills
communication skills, organizational skills, ownership mindset, executive presence, incident response, stakeholder alignment, technical clarity, customer confidence, problem-solving, multi-stakeholder management