
Site Reliability Engineer
Writer
full-time
Location Type: Hybrid
Location: New York City • New York • United States
Salary
💰 $157,700 - $277,800 per year
About the role
- Automate operational tasks and infrastructure management by developing robust tools and platforms using Python, Go, or similar languages, significantly reducing manual toil across our production environment
- Design and implement scalable, fault-tolerant infrastructure solutions on public cloud providers (AWS, GCP, Azure) to support WRITER's rapidly expanding, high-traffic AI platform
- Own the reliability, performance, and efficiency of WRITER’s core services, defining and upholding stringent Service Level Objectives (SLOs) and Error Budgets
- Own the observability stack (monitoring, logging, and alerting) to ensure rapid detection of issues across our complex distributed systems
- Lead incident response, post-mortems, and root cause analyses, applying learnings to proactively prevent future outages and build a more resilient system architecture
- Collaborate closely with product and engineering teams, providing expert guidance on system design for reliability, performance, and scalability from conception through launch
Requirements
- 7+ years of experience in site reliability engineering, DevOps, or a similar role focused on building and operating large-scale, high-availability production systems
- Deep expertise with cloud platforms (AWS strongly preferred), containerization technologies like Docker and Kubernetes, and Infrastructure-as-Code tools such as Terraform
- Strong proficiency in a programming language such as Python, Java, or Go for automation and monitoring
- Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
- Demonstrated ability to Challenge the status quo, proactively identify systemic weaknesses, and propose innovative solutions to complex reliability problems
- Excellent communication, collaboration, and problem-solving skills, with a talent for building strong relationships and Connecting with cross-functional teams
- A strong sense of ownership and accountability, eager to Own mission-critical systems and drive them toward peak performance and unparalleled reliability
Benefits
- Generous PTO, plus company holidays
- Medical, dental, and vision coverage for you and your family
- Paid parental leave for all parents (12 weeks)
- Fertility and family planning support
- Early-detection cancer testing through Galleri
- Flexible spending account and dependent FSA options
- Health savings account for eligible plans with company contribution
- Annual work-life stipends for:
  - Wellness stipend for gym, massage/chiropractor, personal training, etc.
  - Learning and development stipend
- Company-wide off-sites and team off-sites
- Competitive compensation, company stock options, and 401(k)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Go, Java, site reliability engineering, DevOps, cloud platforms, containerization, Infrastructure-as-Code, monitoring, automation
Soft Skills
communication, collaboration, problem-solving, ownership, accountability, innovation, relationship building, cross-functional teamwork, proactive identification of issues, resilience