Tech Stack
Ansible, AWS, Chef, Cloud, Distributed Systems, EC2, Firewalls, Jenkins, Kafka, Kubernetes, Linux, Node.js, Puppet
About the role
- Provide technical thought leadership in collaboration with Architecture and Engineering
- Design, implement, and maintain infrastructure as code (IaC) for development, UAT, and production environments (AI coding assistance such as Copilot may be leveraged)
- Design, implement, and maintain observability and monitoring solutions; monitor system health and participate in the on-call rotation to maintain uptime and performance SLAs
- Design, implement, and maintain continuous integration (CI) and delivery (CD) systems and advise development teams to increase application resiliency and operability
- Manage containerized workloads; deploy application servers hosting APIs and Model Context Protocol (MCP) servers and clients
- Design, implement, and maintain Kubernetes clusters with advanced configuration and features
- Design, implement, and maintain Kafka clusters with advanced configuration and features
- Identify points of failure in infrastructure and recommend and implement solutions
- Report to the DevOps Manager and collaborate with cross-functional engineering teams
Requirements
- Bachelor’s degree (B.A. or B.S.) preferred
- 3+ years of relevant DevOps experience in a SaaS environment
- Expertise in distributed systems including load balancing, data storage, distributed messaging, and distributed databases
- Expertise in software configuration management (SCM) and tools such as Chef, Puppet, Ansible, or similar
- Expertise in cloud platforms, primarily AWS
- Familiarity with AWS services: ECS, EKS, EC2, VPC, ELBs, S3, WAF, Parameter Store, SNS, SQS, SES, IAM, Lambda, CloudWatch, CloudFront
- Expertise in automated deployment methodologies that support CI/CD/CM; Git with Jenkins or GitHub Actions preferred
- Experience managing infrastructure via IaC; use of AI agent assistance (e.g., Copilot) is a strong plus
- Experience with source control systems (e.g., GitHub, Bitbucket) and IaC collaboration tools
- Experience configuring and administering Kubernetes clusters using advanced features including non-standard networking, storage classes/claims, namespaces, and multiple node groups
- Experience configuring and administering Kafka clusters, including disaster recovery mechanisms, data loss prevention techniques, and determining topic configuration to right-size for demanding and variable loads
- Experience with Windows and Linux systems and tools
- Prior professional application development experience with a particular focus on SaaS applications and web development
- Knowledge of IP networking, including TCP, UDP, firewalls, and SSL/TLS
- Knowledge of software engineering and design principles such as design patterns, architectural patterns, the CAP theorem, and event-driven architecture
- Must be security-minded at all levels and adhere to the principle of least privilege
- Excellent communication and team collaboration skills, with the ability to approach and describe problems in a structured way and find reliable solutions