Character.AI

Staff Software Engineer, Site Reliability (SRE)

full-time

Location Type: Hybrid

Location: San Francisco • California • 🇺🇸 United States

Salary

💰 $150,000 - $300,000 per year

Job Level

Lead

Tech Stack

Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, Node.js, Prometheus, Python, SQL, Terraform

About the role

  • Maintain production services and keep them operational.
  • Develop tools, instrumentation and automation to monitor and optimize the performance and reliability of our service.
  • Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.
  • Collaborate with development teams to design and implement scalable, reliable systems and CI/CD processes for deployment.
  • Establish and support SLAs and SLOs for our site.
  • Provide system monitoring and incident alerting.
  • Participate in on-call rotations to provide support for critical incidents and outages.
  • Develop plans for site reliability and disaster recovery.

Requirements

  • 5+ years of experience in a development-focused DevOps/SRE role within a technology organization operating at significant scale.
  • Deep experience and proven success developing software tools and automation in Python and Golang.
  • Expertise with SQL, Linux, CI/CD, Kubernetes, and Terraform in supporting a site or application running on large multi-node infrastructure with a growing user base.
  • Experience working with multiple cloud computing platforms, such as GCP, is a must.
  • Demonstrated ability to troubleshoot technical issues successfully and reliably across a range of platforms and systems.
  • Experience with incident management and event postmortems.

Outstanding candidates will have one or more of the following:
  • Familiarity with GPU clusters and/or HPC environments.
  • Experience with monitoring and logging tools such as Prometheus and Grafana.
  • Hands-on experience scaling a consumer product from its early days into hypergrowth.

Benefits
  • 🩺 Top-notch health coverage for you & your family, with the majority of the premium covered
  • 💰 We invest in your future with a generous 401(k) contribution
  • 🍼 New parents, we've got you covered with incredible paid leave of up to 20 weeks
  • 🌴 4 weeks of PTO to explore, unwind & come back recharged
  • 🍽️ Daily in-office catering plus a monthly DoorDash stipend to help keep you fueled no matter where you are
  • ✨ Monthly wellness stipend to support you in your health journey