Senior Site Reliability Engineer, AI Research

Algolia

full-time

Posted on: 1/20/2026

Location Type: Remote

Location: Australia

Job Level

Senior

Tech Stack

Cloud, Go, Google Cloud Platform, Kubernetes, Python, Terraform

About the role

Support and evolve the reliability of platforms used by the AI Research team
Ensure production services meet expectations for availability, latency, and operational readiness
Design infrastructure and operational patterns that prioritize iteration speed while maintaining appropriate safeguards for production systems
Work closely with researchers and engineers in a cross-functional setting
Participate directly in team planning and execution, from early exploration through production rollout
Help researchers self-serve infrastructure safely and effectively
Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps
Own and improve CI/CD pipelines for services written primarily in Go
Design and operate observability systems using tools such as Datadog
Participate in an on-call rotation (relatively light)

Requirements

Strong experience operating cloud-first infrastructure
Hands-on experience running production services on Kubernetes
Proficiency with infrastructure-as-code (Terraform) and CI/CD systems
Experience supporting production services written in Go (Python experience is a plus)
Solid grounding in service reliability, incident response, and operational best practices
Comfort working in ambiguous environments where problems are not always well defined up front

Benefits

Flexible workplace strategy

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, GCP, infrastructure-as-code, GitOps, CI/CD, Go, Terraform, Python, service reliability, incident response

Soft Skills

cross-functional collaboration, problem-solving, adaptability, communication, team planning, execution, self-service support, operational readiness, iteration speed, safeguards