Veza

Senior Site Reliability Engineer, SRE

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Salary

💰 $154,000 - $210,000 per year

Job Level

Senior

Tech Stack

AWS, Cloud, Grafana, Kubernetes, Linux, Prometheus, Terraform

About the role

  • Deploy software for Cloud Prem and SaaS customers.
  • Respond to and diagnose system incidents in a timely and efficient manner, minimizing downtime and impact on users.
  • Collaborate with other engineers to establish root causes and implement effective resolutions.
  • Continuously improve incident response processes and documentation for future occurrences.
  • Proactively monitor and maintain the health and performance of our infrastructure and services.
  • Perform routine administrative tasks such as system configuration, user management, and data backups.
  • Identify and implement operational improvements to ensure ongoing system reliability and efficiency.
  • Develop and implement scripts and automated solutions to streamline operational tasks and reduce manual workload.
  • Participate in the on-call rotation to address critical incidents outside of regular business hours.
  • Ensure effective handoff between on-call engineers and document post-incident information for future reference.
  • Document processes for support, and create, maintain, and execute runbooks for identified situations.
  • Provide tier 2/3 technical support to customers experiencing platform issues or requiring advanced troubleshooting.
  • Work directly with customer technical teams to resolve complex deployment, configuration, and integration challenges.
  • Conduct technical onboarding sessions and provide guidance on best practices for customer implementations.
  • Collaborate with customer success teams to ensure smooth customer experiences and rapid issue resolution.
  • Create and maintain customer-facing technical documentation, troubleshooting guides, and knowledge base articles.
  • Escalate customer feedback and feature requests to product and engineering teams.
  • Participate in customer calls and technical discussions to provide expert-level platform guidance.
  • Track and analyze customer support metrics to identify trends and areas for improvement.

Requirements

  • BS degree in Computer Science or related field
  • 3+ years of experience in Site Reliability Engineering
  • 2+ years of experience working with cloud platforms and cloud automation tools, especially in AWS
  • Strong experience with Kubernetes, Linux, AWS networking (VPC), and Terraform
  • Experience with the GitOps model for deployment
  • Familiarity with distributed version control
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana)
  • Bazel and Helm experience a plus
  • Understanding of software configuration best practices
  • Ability to wear multiple hats in a fast-paced environment
  • Hands-on, “can do” attitude and a bias for action
  • Low ego and high intellectual curiosity
  • Comfortable working across time zones to support a global customer base
  • Excellent communication skills, with the ability to explain technical concepts to both technical and non-technical audiences
  • Strong customer service orientation with patience and empathy when working with frustrated customers

Benefits

  • Competitive salary
  • Equity
  • Health insurance
  • Paid time off
  • Flexible working hours

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Site Reliability Engineering, AWS, Kubernetes, Linux, Terraform, GitOps, Prometheus, Grafana, Bazel, Helm
Soft skills
communication skills, customer service orientation, patience, empathy, ability to work in fast-paced environment, intellectual curiosity, collaboration, problem-solving, adaptability, attention to detail
Certifications
BS degree in Computer Science