Versana

SRE/DevOps Engineer

SRE/DevOps Engineer at Versana, improving cloud observability and efficiency in loan market technology. The role collaborates with teams to enhance system reliability and monitoring practices.

Posted 5/23/2026 • Full-time • New York City • New York • 🇺🇸 United States • Mid-Level • Senior

Tech Stack

Tools & technologies
AWS, Azure, Cloud, Docker, ElasticSearch, Google Cloud Platform, Grafana, Jenkins, Kafka, Kubernetes, Linux, Terraform

About the role

Key responsibilities & impact
  • Design, implement, and enhance system observability and monitoring tools.
  • Monitor system performance, create incident response plans, and implement observability practices to gain insights into system behavior.
  • Implement and monitor service-level objectives (SLOs) and indicators.
  • Improve system reliability and resiliency.
  • Conduct post-incident reviews and implement necessary changes to prevent system failures.
  • Assist teams in implementing observability tools and leveraging available telemetry data to troubleshoot and resolve incidents and problems.
  • Leverage observability and event management to improve key incident management metrics, such as mean time to detect and mean time to restore services.
  • Continually optimize systems and workflows by improving architecture, infrastructure, automation, CI/CD, and observability.
  • Collaborate with developers to ensure applications are designed with DevOps best practices in mind.
  • Participate in a rotating on-call schedule for weekend releases and be available to respond to production issues outside regular working hours, including weekends and holidays.

Requirements

What you’ll need
  • 5+ years of experience as a Site Reliability Engineer or in a similar role.
  • 3+ years of work experience with public cloud (Azure, AWS or GCP).
  • 3+ years of direct experience with observability tools such as Datadog, Elasticsearch, and Grafana Labs.
  • 3+ years of experience with containerization and orchestration technologies like Docker and Kubernetes.
  • 2+ years of experience developing and managing CI/CD pipelines (e.g., Azure DevOps, GitLab CI/CD, GitHub Actions, Jenkins).
  • 2+ years of experience with infrastructure-as-code tools such as Terraform, Azure Bicep, and CloudFormation.
  • 1+ years of experience with site reliability tools like Gremlin, Chaos Mesh, or similar.
  • Proven track record leveraging core observability concepts, end-user monitoring, and infrastructure monitoring with SaaS solutions.
  • Experience with messaging services like Kafka or Azure Event Hubs.
  • Good understanding of the Linux operating system.

Benefits

Comp & perks
  • Equal Opportunity Employer
  • Health insurance
  • Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
system observability, monitoring tools, service-level objectives, system reliability, incident response plans, CI/CD pipelines, Infrastructure-as-code, containerization, orchestration technologies, end-user monitoring
Soft Skills
collaboration, troubleshooting, incident management, optimization, communication