Tech Stack
AWS, Cloud, Kubernetes, Terraform
About the role
- Solve performance and scalability challenges for LLM-powered enterprise apps
- Own and evolve CI/CD pipelines and automation for smooth deployments
- Define and maintain monitoring and alerting strategies for production health
- Work cross-functionally on firefighting, GitHub workflows, and system debugging
- Configure infrastructure for both SaaS and on-prem enterprise deployments
- Collaborate with customer infrastructure teams and internal stakeholders
- Drive best practices for security, scalability, and system reliability
Requirements
- 5+ years managing large-scale production infrastructure
- Strong experience with service-oriented architecture design
- Proficient in AWS, Kubernetes, GitOps (ArgoCD), and CI/CD practices
- Hands-on experience with Infrastructure as Code (Terraform; Crossplane preferred)
- Experience supporting customer-hosted deployments (private cloud/on-prem)
- Solid troubleshooting skills, ownership mindset, and pragmatic approach
- Interest in machine learning and LLMs, or the motivation to learn quickly