
Lead Site Reliability Engineer
Coupa Software
Full-time
Location Type: Hybrid
Location: Bogota • 🇨🇴 Colombia
Job Level
Senior
Tech Stack
AWS, Azure, Chef, Cloud, DNS, Google Cloud Platform, Kubernetes, Linux, Microservices, MySQL, Python, Terraform
About the role
- Build, deploy, and troubleshoot microservices in Kubernetes and Amazon EKS, ensuring scalability and reliability.
- Design secure, highly available web applications with a focus on capacity planning and performance optimization.
- Deploy and manage the lifecycle of LLMs and embedding models, defining KPIs to measure and improve AI application performance.
- Evaluate and integrate emerging technologies such as RAG systems, MCP servers, AI Agents, and agentic workflows into our platform.
- Manage AWS core and GenAI services (S3, IAM, EKS, Bedrock, etc.) using infrastructure-as-code tools like Terraform and Chef, while maintaining observability and alerting through tools like New Relic and PagerDuty.
- Collaborate across product, platform, and engineering teams on architecture design, security patching, incident response, and release management to ensure the reliability of our ML and GenAI infrastructure.
Requirements
- Bachelor’s degree and 10+ years of experience managing large-scale cloud applications, with a strong background in Linux administration and troubleshooting.
- Excellent communication skills, a collaborative mindset, and the confidence to take ownership, drive solutions, and deliver results independently while thinking globally.
- Over 8 years of hands-on experience managing cloud infrastructure across AWS, GCP, and Azure environments.
- A solid understanding of today’s generative AI ecosystem, with practical experience using LLMs and embedding models (OpenAI, AWS Bedrock, SageMaker); familiarity with vector databases like LanceDB is a plus.
- Strong scripting skills in Bash or Python, and experience with container orchestration platforms like Amazon EKS or Azure AKS.
- Proficiency with DevOps and automation tools such as Chef, GitHub Actions, Rundeck, and IaC frameworks like Terraform, Spacelift, and Helm.
- Working knowledge of DNS, load balancers, and MySQL, along with a good grasp of source control and branching strategies in Git.
Benefits
- Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
- Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
- Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
microservices, Kubernetes, Amazon EKS, LLMs, embedding models, Terraform, Chef, Bash, Python, DevOps
Soft skills
communication, collaboration, ownership, problem-solving, independence, global thinking
Certifications
Bachelor's degree