Writer

Site Reliability Engineer

Full-time

Origin: 🇺🇸 United States • New York

Job Level

Senior, Lead

Tech Stack

AWS, Azure, Cloud, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Prometheus, Python, Scala, Terraform

About the role

  • Lead the design, implementation, and maintenance of WRITER, Inc.’s cloud infrastructure to ensure high availability and performance
  • Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers
  • Automate infrastructure provisioning and management using Terraform and Python
  • Collaborate with development teams to optimize cloud resources and enhance system reliability
  • Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions
  • Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures
  • Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency
  • Ensure the security and compliance of our systems, adhering to industry standards and regulations
  • Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement
  • Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

Requirements

  • Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience
  • Deep understanding of system architecture and infrastructure design to ensure high availability and performance
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field
  • Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring
  • Experience with cloud platforms like AWS, Azure, or GCP, and their respective services for scalable and resilient systems
  • Expertise in containerization technologies (e.g., Docker) and container orchestration tools (e.g., Kubernetes)
  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
  • Ability to lead and mentor junior engineers in best practices for reliability and system optimization
  • Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders
  • Proactive approach to identifying and mitigating potential system failures and performance bottlenecks
  • Software engineering expertise (preferred)
  • Terraform (preferred)
  • Python (preferred)
  • Kubernetes (preferred)
  • Scala (preferred)
  • AWS/GCP (preferred)
  • Applicants must answer work authorization questions on the application form: whether they are legally authorized to work in the country where the job is located, and whether they will require visa sponsorship
  • Applicants must confirm that they are at least 18 years of age and able to attend in-person collaborative sessions in the office 2-3 days per week