
Forward Deployed SRE
Baseten
full-time
Location Type: Hybrid
Location: San Francisco • California • United States
Salary
💰 $135,000 - $285,000 per year
About the role
- Diagnose and resolve runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
- Debug infrastructure issues across Kubernetes (pods, controllers), networking, observability, and alerting systems.
- Lead incident response during outages or escalations, coordinating across Product, FDE, Sales, and Engineering.
- Serve as the technical owner for top enterprise accounts with strict SLAs and high responsiveness expectations.
- Identify common failure modes and translate user feedback into roadmap signals and product improvements, as well as internal runbooks, knowledge bases, and diagnostic best practices.
- Own project coordination end-to-end (scoping, execution, communication, and stakeholder alignment across technical and non-technical teams) for work ranging from feature requests and new deployments to operational debugging issues.
Requirements
- Deep Kubernetes troubleshooting expertise, including advanced resource debugging, pod/runtime analysis, and log-based diagnostics using observability tooling such as Grafana, Loki, and Prometheus.
- Strong infrastructure debugging ability across container orchestration, networking, and service dependencies, with hands-on experience supporting production-grade clusters.
- Experience managing high-severity incidents with major customers, including SLA management, post-incident reviews, and clear communication throughout escalations.
- Proven project management and organizational skills with an ownership mindset, able to manage multiple complex, multi-stakeholder initiatives in parallel, including issue resolution, root-cause analysis, and feature delivery.
- Ability to translate recurring technical pain points into roadmap-level insights, documentation improvements, or product enhancements.
- Strong communication skills and executive presence during high-visibility situations, ensuring technical clarity and customer confidence.
- 3+ years of experience in a fast-paced, high-growth, or customer-facing engineering environment.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employee and dependents
- Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
Applicant Tracking System Keywords
Hard Skills & Tools
Kubernetes, runtime analysis, resource debugging, log-based diagnostics, incident management, root-cause analysis, project management, observability tooling, container orchestration, service dependencies
Soft Skills
communication skills, organizational skills, ownership mindset, executive presence, incident response, stakeholder alignment, technical clarity, customer confidence, problem-solving, multi-stakeholder management