Senior Site Reliability Engineer

Viz.ai

Site Reliability Engineer enhancing reliability, scalability, and performance for an AI-powered Care Pathways platform. Collaborates across teams and automates processes to support mission-critical systems.

Posted 4/27/2026 | full-time | Tel Aviv • 🇮🇱 Israel | Senior

Tech Stack

Tools & technologies

Ansible, AWS, Cloud, EC2, Elasticsearch, Grafana, Kubernetes, Logstash, Prometheus, Python, Terraform

About the role

Key responsibilities & impact

Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
Use observability tools to analyze runtime behavior and make data-driven decisions that improve system performance and reliability.
Partner with development teams to integrate reliability best practices into the software development lifecycle.
Manage infrastructure at scale on cloud services (AWS experience is an advantage) and on platforms such as Kubernetes.
Optimize resource utilization to reduce costs while maintaining service quality.
Contribute to the development and adoption of AI-driven tools and practices for engineering and observability.

Requirements

What you’ll need

At least 6 years of experience as an SRE or DevOps engineer.
Strong experience with observability tools such as OpenTelemetry, Grafana, Prometheus, and the ELK stack (Elasticsearch, Logstash, Kibana).
In-depth experience with Cloud Platforms: AWS services, including EC2, S3, RDS, and CloudFormation/Terraform for infrastructure-as-code.
Strong experience working in Kubernetes environments, with a focus on Helm for deployment and configuration management.
Experience working with AI and LLM tools such as Cursor, Claude Code, or similar.
Proficiency in scripting and/or development languages such as Bash or Python.
Thorough understanding of CI/CD pipelines and automation tools.
Strong experience with automation tools like Terraform and/or Ansible, and understanding of Infrastructure as Code.
Solid troubleshooting and debugging skills.
A team player with a strong can-do mentality.

Benefits

Comp & perks

medical
dental
vision
401(k)
generous vacation
performance-based bonuses
meals at the office

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

observability, automation, monitoring, capacity planning, scripting, CI/CD, infrastructure as code, troubleshooting, debugging, resource optimization

Soft Skills

team player, can-do mentality