Site Reliability Engineer II

LivePerson

full-time

Posted on: 2/25/2026

Location Type: Remote

Location: Bulgaria

Job Level

Mid-Level, Senior

Tech Stack

Cloud DNS Flux Kubernetes

About the role

Maintain and support existing products within the Echo ecosystem.
Ensure high availability, performance, and reliability of platform services.
Define, monitor, and improve SLOs, SLIs, and error budgets.
Proactively identify system risks and implement reliability improvements.
Participate in incident response, troubleshooting, and post-incident reviews.
Deploy, manage, and optimize workloads on Google Kubernetes Engine (GKE).
Manage cluster capacity, scaling strategies, and resource allocation.
Optimize CPU, memory, and storage utilization to improve performance and reduce cost.
Ensure cluster security, upgrades, and best practices are followed.
Troubleshoot networking, service mesh (if applicable), ingress, and service-to-service communication issues.
Implement and manage GitOps-based deployment workflows.
Ensure infrastructure and application changes are version-controlled and automated.
Work closely with developers to safely release code to production using CI/CD best practices.
Support progressive delivery techniques (e.g., canary, blue/green deployments).
Reduce deployment risk through automation and validation mechanisms.
Implement and enhance observability practices across services.
Build and maintain dashboards, alerts, and health metrics.
Implement and manage OpenTelemetry (OTEL) for tracing and metrics collection.
Ensure proactive alerting aligned with SLOs.
Drive improvements in monitoring coverage and signal quality.
Apply a strong understanding of Kubernetes networking, services, ingress, load balancing, DNS, and service communication.
Diagnose latency, connectivity, and traffic routing issues.
Understand how distributed services interact across the ecosystem.

Requirements

4–7 years of experience in SRE, DevOps, or Platform Engineering roles
Strong hands-on experience managing production workloads on GKE
Solid experience with GitOps practices (ArgoCD, Flux, or similar)
Strong understanding of Kubernetes networking and cloud networking fundamentals
Experience optimizing resource allocation and scaling in Kubernetes
Experience implementing observability solutions using OpenTelemetry (OTEL)
Experience defining and operating with SLOs and SLIs
Hands-on experience with CI/CD pipelines and automated deployments
Strong troubleshooting and incident management experience

Benefits

Health: medical, dental, and vision
Time away: vacation and holidays
Development: generous tuition reimbursement and access to internal professional development resources
Equal opportunity employer

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Google Kubernetes Engine (GKE), GitOps, OpenTelemetry (OTEL), CI/CD, SLOs, SLIs, networking, resource allocation, scaling, observability

Soft Skills

troubleshooting, incident management, proactive identification of risks, collaboration with developers, monitoring improvements