Automate and build tools to eliminate repetitive operational tasks and reduce toil
Maintain and scale reliable software applications using DevOps best practices
Build and enhance CI/CD pipelines for automated testing, builds, and deployments
Optimize and maintain Kubernetes-based orchestration systems for performance and reliability
Troubleshoot complex production issues across application, infrastructure, and distributed system layers
Participate in on-call rotations and support incident response
Mentor junior engineers in software development and operational best practices
Collaborate with stakeholders and product teams on infrastructure and deployment requirements
Ensure compliance with government cloud standards across applications and infrastructure
Requirements
Proven ability to maintain 99.99% uptime in production environments
10+ years of overall experience, including 6+ years in software development and 3+ years in DevOps practices
3+ years of experience with Kubernetes, Terraform, Python or Go, and AWS
4+ years of experience working with distributed systems
Familiarity with Redis, Kafka/PubSub, and relational databases
Experience in fast-paced or startup-like environments
Strong collaboration and communication skills across cross-functional teams and divisions
Ability to ramp up quickly and contribute in complex, large-scale environments
Demonstrated leadership in incident management and operational reliability
Benefits
At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications and other job-related reasons.
Applicant Tracking System Keywords
Hard skills
DevOps, CI/CD, Kubernetes, Terraform, Python, Go, AWS, Redis, Kafka, distributed systems