Design, implement, and manage reliable, scalable systems to support Fellow’s AI Meeting Assistant and other platform features
Optimize and maintain our AWS infrastructure, including EC2, RDS, and other cloud services
Oversee and optimize Kubernetes clusters to ensure high availability and performance
Enhance and maintain CI/CD pipelines to support efficient, high-quality deployments
Set up and improve monitoring, logging, and alerting systems to detect and resolve issues proactively
Work closely with the engineering, product, and QA teams to support feature development and deployment
Use tools like Pulumi to automate infrastructure provisioning and management
Lead root cause analysis and implement changes to prevent future incidents
Experiment with and adopt new technologies to enhance system performance and scalability

Requirements

2+ years of experience in site reliability engineering or a related field
Proficiency with Kubernetes, AWS, and databases
Experience with monitoring and observability tools such as Prometheus, Grafana, or Datadog
Familiarity with CI/CD tools like GitHub Actions, Jenkins, or GitLab CI
Strong problem-solving skills and a proactive approach to reliability challenges
Excellent communication skills and the ability to collaborate effectively in a team environment

Benefits
