LivePerson

Site Reliability Engineer II

full-time

Location Type: Remote

Location: Bulgaria

About the role

  • Maintain and support existing products within the Echo ecosystem.
  • Ensure high availability, performance, and reliability of platform services.
  • Define, monitor, and improve SLOs, SLIs, and error budgets.
  • Proactively identify system risks and implement reliability improvements.
  • Participate in incident response, troubleshooting, and post-incident reviews.
  • Deploy, manage, and optimize workloads on Google Kubernetes Engine (GKE).
  • Manage cluster capacity, scaling strategies, and resource allocation.
  • Optimize CPU, memory, and storage utilization to improve performance and reduce cost.
  • Ensure cluster security, upgrades, and best practices are followed.
  • Troubleshoot networking, service mesh (if applicable), ingress, and service-to-service communication issues.
  • Implement and manage GitOps-based deployment workflows.
  • Ensure infrastructure and application changes are version-controlled and automated.
  • Work closely with developers to safely release code to production using CI/CD best practices.
  • Support progressive delivery techniques (e.g., canary, blue/green deployments).
  • Reduce deployment risk through automation and validation mechanisms.
  • Implement and enhance observability practices across services.
  • Build and maintain dashboards, alerts, and health metrics.
  • Implement and manage OpenTelemetry (OTEL) for tracing and metrics collection.
  • Ensure proactive alerting aligned with SLOs.
  • Drive improvements in monitoring coverage and signal quality.
  • Apply a strong understanding of Kubernetes networking, services, ingress, load balancing, DNS, and service communication.
  • Diagnose latency, connectivity, and traffic routing issues.
  • Understand how distributed services interact across the ecosystem.

Requirements

  • 4–7 years of experience in SRE, DevOps, or Platform Engineering roles
  • Strong hands-on experience managing production workloads on GKE
  • Solid experience with GitOps practices (ArgoCD, Flux, or similar)
  • Strong understanding of Kubernetes networking and cloud networking fundamentals
  • Experience optimizing resource allocation and scaling in Kubernetes
  • Experience implementing observability solutions using OpenTelemetry (OTEL)
  • Experience defining and operating with SLOs and SLIs
  • Hands-on experience with CI/CD pipelines and automated deployments
  • Strong troubleshooting and incident management experience

Benefits

  • Health: medical, dental, and vision
  • Time away: vacation and holidays
  • Development: generous tuition reimbursement and access to internal professional development resources
  • Equal opportunity employer
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Google Kubernetes Engine (GKE), GitOps, OpenTelemetry (OTEL), CI/CD, SLOs, SLIs, networking, resource allocation, scaling, observability
Soft Skills
troubleshooting, incident management, proactive identification of risks, collaboration with developers, monitoring improvements