Design infrastructure and automated systems to support our distributed architecture
Build and manage CI/CD pipelines, continuously improving their reliability and speed and reducing lead time for changes.
Trace performance bottlenecks and identify optimizations and improvements at both the infrastructure and application level
Collaborate with our engineering team to meet high SLO and SLA requirements from customers
Maintain highly available web and backend systems that serve millions of users and thousands of requests per second
Collaborate closely with developers to plan, set up, and configure the AWS cloud services needed for new feature development
Secure our infrastructure at both the cloud layer (IAM) and the application layer (PKI)
Build and expand monitoring and alerting systems for both infrastructure and business operations, using internal tools and integrating with established third-party SaaS tools.
Establish comprehensive infrastructure-as-code coverage to support our entire platform
Develop tools to enhance and support developer productivity
Champion the automation of manual processes and the reduction of operational overhead
Requirements
7+ years of DevOps experience
5+ years of database administration experience (Postgres, MariaDB, MSSQL)
4+ years of experience orchestrating large-scale distributed microservice deployments on Kubernetes and EC2.
4+ years of experience building and managing EKS clusters, and strong knowledge of the K8s ecosystem.
Demonstrable experience with infrastructure-as-code frameworks, in particular Terraform, and with configuration management using Ansible.
Production experience creating and maintaining cloud environments in AWS that span multiple geographic regions.
Consistent track record of building tooling, automation, and/or services in one or more languages (e.g., Go, Python, Bash, or TypeScript)
Experience debugging and optimizing systems and code
Experience automating the deployment, scaling, and management of containerized applications with Docker or Kubernetes
Strong hands-on experience with the AWS ecosystem
Familiarity with service mesh concepts and experience implementing a service mesh framework such as Istio or Linkerd on Kubernetes clusters (circuit breaking, failure injection, retries and timeouts at the data-plane proxy level, canary deployments using weight-based routing for pods and services, etc.).
Experience with Prometheus/Grafana/CloudWatch for metrics monitoring, the ELK/OpenSearch stack for logging, and PagerDuty for alerting.
Experience building, maintaining, and scaling database technologies such as PostgreSQL, MySQL, Redis, and DynamoDB
Experience troubleshooting and autoscaling highly available message queueing systems such as RabbitMQ, SQS, or Kafka
A passion for automation and for integrating with build systems such as GitHub and GitLab CI.
A security-centric approach to design across infrastructure as code, Docker build pipelines, and microservice deployments.
Ability to quickly learn complex systems and environments, from the network through the application.
Benefits
Workday Bonus Plan
Role-specific commission/bonus
Annual refresh stock grants