Lambda

Senior Site Reliability Engineer – Named Accounts

full-time

Location Type: Hybrid

Location: Seattle • Washington • 🇺🇸 United States

Salary

💰 $240,000 - $425,000 per year

Job Level

Senior

Tech Stack

Distributed Systems, Go, Grafana, Kubernetes, Linux, Prometheus, Python

About the role

  • Embed on-site with a named strategic customer, becoming an extension of their team
  • Act as the primary technical liaison between Lambda and the customer organization
  • Navigate ambiguous requirements to identify root problems and define clear technical solutions
  • Drive alignment across internal Lambda teams and customer stakeholders
  • Scope, sequence, and build full-stack solutions that deliver measurable business value
  • Design and implement infrastructure optimizations for AI/ML workloads at scale
  • Debug complex distributed systems issues across the infrastructure stack
  • Ship iteratively and learn fast, adjusting approach based on customer feedback and results
  • Identify reusable patterns from customer engagements that can scale across Lambda's customer base
  • Surface field intelligence that influences Lambda's product roadmap
  • Document and share learnings to elevate the capabilities of the broader team
  • Represent Lambda with executive presence in high-stakes customer interactions

Requirements

  • 6+ years of experience in an SRE, software engineering, or similar role, with deep knowledge of running Linux clusters and systems
  • Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators
  • Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)
  • Hands-on experience with AI/ML workload management tools (Volcano, Kubeflow, or similar)
  • Ability to work either independently with limited direction or as part of a team
  • Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, as well as CI/CD pipelines
  • Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar
  • Excellent communication skills with the ability to translate technical complexity for diverse audiences
  • Executive presence and ability to represent Lambda in customer-facing situations
  • Comfort operating in ambiguous environments with competing priorities
  • Strong bias for action and shipping iteratively

Benefits

  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401(k) plan with 2% company match (USA employees)
  • Flexible Paid Time Off Plan that we all actually use

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Linux, Go, Python, GitOps, Kubernetes, AI/ML workload management, Kubernetes provisioning, Debugging, Infrastructure optimization, Full-stack solutions
Soft skills
Communication, Executive presence, Problem-solving, Team collaboration, Adaptability, Customer engagement, Documentation, Learning agility, Influencing, Bias for action