
Senior Site Reliability Engineer
Stellar Development Foundation
Full-time
Location Type: Hybrid
Location: New York City • New York • United States
Salary
$165,000 - $225,000 per year
About the role
- Maintain, improve, scale and secure our AWS/GCP infrastructure and Linux systems.
- Assist our development teams in running, packaging, deploying and troubleshooting applications.
- Work with developers on streamlining deployment processes with Jenkins and other CI/CD tooling.
- Build, maintain, monitor and improve our Kubernetes clusters.
- Work with development teams on migrating applications to Kubernetes.
- Take responsibility for maintaining and improving multiple internal services, such as Kubernetes, Prometheus and ELK.
- Monitor, triage and respond to alerts in our high-availability environments.
- Participate in design and code reviews, and ensure that the foundation for our services is best-in-class.
- Evaluate new technologies, and design and implement them where appropriate.
- Identify automation opportunities and implement them, either by building custom solutions or by adopting off-the-shelf ones.
Requirements
- 5+ years of experience working in cloud-based systems operations as an SRE or DevOps engineer.
- First-hand experience with configuration management and infrastructure as code (Ansible, Puppet, Terraform).
- Proficient in applying SRE practices such as capacity planning and disaster recovery testing to ensure the scalability, resilience and availability of critical services.
- A strong understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
- Experienced in managing production workloads and skilled in using monitoring tools to detect issues early.
- Comfortable participating in on-call rotations and conducting thorough root cause analyses to keep systems running smoothly.
- Proficiency in at least one programming language.
- Committed to supporting teammates, especially during challenging times, and excited about working in a close-knit, growing team. Approachable, empathetic, and proactive in promoting collaboration and innovation.
- Excels at working independently and can accomplish tasks without close supervision.
- Production experience building and maintaining Kubernetes clusters.
Benefits
- Competitive health, dental & vision coverage with most plans covered at 100% for the employee + any dependents
- Flexible time off + 15 company holidays including a company-wide holiday break
- Up to 12 weeks of paid parental leave for both non-birthing and birthing parents, as well as up to 14 weeks of paid pregnancy leave for birthing parents
- Gym reimbursement ($80 per month)
- Life and AD&D insurance (up to $50K)
- Short- and long-term disability
- 401K with 4% match
- Health and Dependent Care FSAs
- Commuter benefits with $250/month employer contribution
- Health Savings Account (HSA) with monthly employer contribution
- Family building benefits through Kindbody
- Wellbeing benefits (One Medical, Rightway, Headspace)
- Learning and development (L&D) budget of $1,500/year
- Daily lunch and snacks in office
- Company retreats