SupplyHouse.com

Site Reliability Engineer

full-time

Location Type: Remote

Location: Remote • 🇮🇳 India

Salary

💰 $29,000 - $36,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, Cloud, Docker, Go, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, SQL, Terraform, Unix

About the role

  • Ensure the scalability, reliability, and performance of our infrastructure and applications with a focus on automation, monitoring, and incident response
  • Design, build, and maintain scalable, reliable systems on GCP (Compute Engine, GKE, Cloud Storage, Cloud SQL)
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager
  • Build and maintain observability platforms (monitoring, logging, tracing) using tools such as Stackdriver (Cloud Monitoring), Prometheus, or Grafana
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurrence
  • Partner with DevOps and engineering teams to enhance CI/CD pipelines for resilient deployments
  • Define and monitor SLAs, SLOs, and SLIs to ensure application availability and performance
  • Implement disaster recovery (DR) and backup strategies across cloud services
  • Continuously optimize performance, capacity, and cost-efficiency of GCP resources

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 3+ years of hands-on experience as a Site Reliability Engineer, DevOps Engineer, Systems Engineer, or Cloud Infrastructure Engineer
  • Proven track record managing production-grade systems on Google Cloud Platform (GCP) or other cloud providers
  • Strong understanding of Linux/Unix system administration, networking, and troubleshooting
  • Experience implementing Infrastructure as Code (IaC) using tools like Terraform, Ansible, or Deployment Manager
  • Familiarity with containerization and orchestration technologies such as Docker and Kubernetes (GKE)
  • Experience with monitoring and observability tools (Google Cloud Operations Suite, Prometheus, Grafana, Datadog, ELK)
  • Experience defining and monitoring SLAs, SLOs, and SLIs to ensure application uptime and performance
  • Proven ability to handle incident response, conduct postmortems, and drive root cause analysis
  • Proficiency in at least one scripting language (Python, Bash, or Go) for automation and tooling
  • Hands-on experience building or managing CI/CD pipelines (Jenkins, GitLab CI, Cloud Build)
  • Strong background in configuration management and release automation
  • Knowledge of IAM (Identity and Access Management), network security, and cloud compliance controls
  • Familiarity with disaster recovery (DR), backups, and high-availability design

Benefits

  • Comprehensive and affordable medical, dental, vision, and life insurance options
  • Competitive Provident Fund contributions
  • Paid time off and holidays
  • Mental health support and wellbeing program
  • Company-provided equipment and a one-time $250 USD work-from-home stipend
  • $750 USD annual professional development budget
  • Company rewards and recognition program
  • And more!