Restaurant365

Site Reliability Engineer II

full-time

Location Type: Remote

Location: United States

Salary

$98,583 – $138,016 per year

About the role

  • The Site Reliability Engineer II will be responsible for supporting, enhancing, and maintaining Restaurant365’s cloud infrastructure and applications.
  • Collaborate with DevOps, development, and infrastructure teams to resolve moderately complex issues, propose improvements, and strengthen the reliability, scalability, and security of our SaaS platform.
  • Respond to production incidents, perform triage and troubleshooting, and contribute to post-incident analysis.
  • Identify and automate manual processes to improve efficiency and reduce risk.
  • Enhance and evolve monitoring tools and platforms to improve observability.
  • Promote and apply best practices for reliability, scalability, and performance across engineering.
  • Implement and support cloud automation using Terraform, Ansible, or CloudFormation.
  • Work within change management protocols to maximize uptime for production systems.
  • Participate in an on-call rotation, providing 24/7 incident support and contributing to root cause analysis.
  • Partner with developers, architects, vendors, and IT teams to ensure reliable system operations.
  • Research and remediate vulnerabilities in coordination with security teams.
  • Maintain documentation of infrastructure, monitoring, runbooks, and incident response procedures.

Requirements

  • BS in Computer Science, Information Systems, or related field (or equivalent experience).
  • 2–4 years of experience in site reliability engineering, DevOps, or cloud operations.
  • Experience with cloud platforms (Azure or AWS), including services such as AKS, ECS/EKS, Azure Functions/AWS Lambda, Amazon S3, and Azure Blob Storage.
  • Proficiency with infrastructure-as-code and automation (Terraform, Ansible, YAML, Python, Bash, PowerShell).
  • Strong Linux engineering skills; working knowledge of Windows administration.
  • Experience supporting production environments and participating in on-call rotations.
  • Familiarity with web servers and middleware (Nginx, Apache Tomcat).
  • Experience with version control and CI/CD tools (Git, GitLab, or similar).
  • Strong written, oral, and interpersonal communication skills.

Preferred Qualifications

  • Experience with monitoring tools (Prometheus, Grafana, ELK, Site24x7, Nagios).
  • Knowledge of performance analysis and system vulnerability remediation.
  • AWS or Azure cloud certification.
  • Familiarity with restaurant industry SaaS platforms and customer-facing applications.