ThousandEyes (part of Cisco)

Senior Site Reliability Engineer I, Datastores

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $152,500 - $219,200 per year

Job Level

Senior

Tech Stack

Apache, AWS, Cloud, DynamoDB, ElasticSearch, Go, Kafka, Kubernetes, Linux, MongoDB, MySQL, Python, Terraform, Unix

About the role

  • Work closely with software engineers to ensure that the ThousandEyes platform datastores, infrastructure, and services are designed and optimized for availability, latency, and performance
  • Demonstrated experience building and supporting critical services, with a focus on automation, availability, and performance
  • Design, implement, and maintain elastic and resilient datastores that support our growing platform on a multi-region scale
  • Drive and build automation wherever possible, enabling our datastores to scale effortlessly (self-service)
  • Participate in and contribute to improving our 24/7 incident response and on-call rotation
  • Operate the company’s core datastore services, maintaining a constantly growing infrastructure capable of processing substantial volumes of incoming data daily
  • Responsible for availability, performance, change management, capacity planning, monitoring, and emergency response for platform datastores

Requirements

  • Ability to design and implement scalable and well-tested solutions, with a focus on datastores
  • Proficiency in writing high-quality code in Python or Go
  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes
  • Practical knowledge of cloud provider-managed services (ideally AWS), demonstrated in a context similar to ours
  • Solid grasp of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols
  • Good communication and documentation skills
  • Experience running highly performant and highly available MySQL, MongoDB, DynamoDB, or Apache Druid databases
  • Background in software or operations, with experience in designing, analyzing, and troubleshooting large-scale datastore systems