
Site Reliability Engineer – SRE
Baseten
full-time
Location Type: Hybrid
Location: San Francisco • California • United States
Salary
💰 $165,000 - $330,000 per year
About the role
- Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
- Establish standards and best practices for reliability and performance across the infrastructure.
- Automate processes when relevant, particularly for managing CI/CD pipelines.
- Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
- Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.
- Mentor junior team members and contribute to knowledge sharing within the organization.
- Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
- Demonstrate pride, ownership, and accountability for your work, and expect the same from your teammates.
Requirements
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
- Extensive experience with Kubernetes.
- Experience in building and maintaining scalable infrastructure.
- Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
- Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
- Ability to own projects end-to-end, from project specification to execution.
- No prior machine learning experience is required, but you should be open to learning about it.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and their dependents.
- Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
- Paid parental leave.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, infrastructure-as-code, Terraform, CloudFormation, Pulumi, CI/CD, GitHub Actions, GitLab CI, CircleCI, Jenkins
Soft Skills
user empathy, project management, collaboration, mentoring, knowledge sharing, problem solving, judgment, accountability, ownership, navigating ambiguity