
Site Reliability Engineer
PayNearMe
full-time
Location Type: Remote
Location: Remote • California • 🇺🇸 United States
Salary
💰 $175,000 - $195,000 per year
Job Level
Mid-Level, Senior
Tech Stack
AWS, Azure, Cloud, Docker, EC2, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Ruby, Ruby on Rails, Splunk, Terraform
About the role
- Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code
- Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker
- Develop and maintain comprehensive monitoring and observability solutions using Datadog
- Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs)
- Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence
- Ensure the reliability and stability of our production environments
- Develop automation scripts and tools to reduce manual intervention and improve system reliability using Python, Bash, or Go
- Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI
- Assist in capacity planning and ensure that systems are scalable to meet future demands
- Implement security best practices and ensure compliance with industry standards
- Work closely with development teams to ensure reliability and scalability of new features and services
- Participate in an on-call rotation to address production issues and collaborate in incident response efforts
Requirements
- 3+ years of experience in SRE, DevOps, or a related role
- Proficient with cloud platforms such as AWS, GCP, or Azure
- Experience with EC2, RDS, VPCs, and security groups is essential
- Strong experience with Kubernetes and Docker, including deployment, scaling, and management of containerized applications
- Expert in using Terraform for infrastructure as code
- Extensive experience with monitoring and observability tools like Datadog, Prometheus, Grafana, ELK stack, or Splunk
- Proven ability to define, monitor, and maintain SLOs and SLAs to ensure reliable service delivery
- Strong skills in scripting languages like Python, Bash, or Go
- Familiarity with GitLab CI or a similar tool for continuous integration and deployment
- Experience supporting production environments running Go or Ruby/Rails applications
- Deep understanding of DevOps principles, practices, and tools to drive continuous improvement in the software development lifecycle
- Excellent analytical and problem-solving skills to diagnose and resolve complex system issues quickly and effectively
- Strong organizational skills, attention to detail, and the ability to work collaboratively in a team environment
- Excellent documentation skills to ensure accurate and detailed records
Benefits
- 100% Remote (must be in US)
- Fast-paced and professional work culture
- Stock options with standard startup vesting: 1-year cliff, 4 years total
- $50 monthly communication expense stipend to go towards your phone/internet bill
- $250 stipend to enhance your WFH setup
- Reimbursement for peripheral equipment: monitor (up to $400), keyboard and mouse (up to $200)
- Premium medical benefits including vision and dental (100% coverage for employees)
- Company-sponsored life and disability insurance
- Paid parental bonding leave
- Paid sick leave, jury duty leave, and bereavement leave
- 401k plan
- Flexible Time Off (our team members typically take off ~3-4 weeks per year)
- Volunteer Time Off
- 13 scheduled holidays
- In-person team meet-ups twice a year (2-3 days, company paid)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Terraform, Kubernetes, Docker, Python, Bash, Go, GitLab CI, Datadog, Prometheus, Grafana
Soft skills
analytical skills, problem-solving skills, organizational skills, attention to detail, collaboration, documentation skills