
Site Reliability Engineer II
Restaurant365
Full-time
Location Type: Remote
Location: United States
Salary
💰 $98,583 - $138,016 per year
About the role
- Support, enhance, and maintain Restaurant365’s cloud infrastructure and applications.
- Collaborate with DevOps, development, and infrastructure teams to resolve moderately complex issues, propose improvements, and strengthen the reliability, scalability, and security of our SaaS platform.
- Respond to production incidents, perform triage and troubleshooting, and contribute to post-incident analysis.
- Identify and automate manual processes to improve efficiency and reduce risk.
- Enhance and evolve monitoring tools and platforms to improve observability.
- Promote and apply best practices for reliability, scalability, and performance across engineering.
- Implement and support cloud automation using Terraform, Ansible, or CloudFormation.
- Work within change management protocols to maximize uptime for production systems.
- Participate in on-call rotation, providing 24x7 support for incidents and contributing to root cause analysis.
- Partner with developers, architects, vendors, and IT teams to ensure reliable system operations.
- Research and remediate vulnerabilities in coordination with security teams.
- Maintain documentation of infrastructure, monitoring, runbooks, and incident response procedures.
Requirements
- BS in Computer Science, Information Systems, or related field (or equivalent experience).
- 2–4 years of experience in site reliability engineering, DevOps, or cloud operations.
- Experience with cloud platforms (Azure or AWS), including services such as AKS, ECS/EKS, Functions/Lambda, S3, and Blob storage.
- Proficiency with infrastructure-as-code and automation (Terraform, Ansible, YAML, Python, Bash, PowerShell).
- Strong Linux engineering skills; working knowledge of Windows administration.
- Experience supporting production environments and participating in on-call rotations.
- Familiarity with web servers and middleware (Nginx, Apache Tomcat).
- Experience with CI/CD tools (GitLab, Git, or similar).
- Strong written, oral, and interpersonal communication skills.
Preferred Qualifications
- Experience with monitoring tools (Prometheus, Grafana, ELK, Site24x7, Nagios).
- Knowledge of performance analysis and system vulnerability remediation.
- Cloud certification (AWS or Azure).
- Familiarity with restaurant industry SaaS platforms and customer-facing applications.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
site reliability engineering, DevOps, cloud operations, infrastructure-as-code, automation, Linux engineering, Windows administration, performance analysis, vulnerability remediation, monitoring
Soft Skills
communication, interpersonal skills, troubleshooting, problem-solving, collaboration, documentation, incident response, root cause analysis, efficiency improvement, change management
Certifications
AWS certification, Azure certification