Coupa Software

Lead Site Reliability Engineer

full-time

Location Type: Hybrid

Location: Bogota • 🇨🇴 Colombia

Job Level

Senior

Tech Stack

AWS, Azure, Chef, Cloud, DNS, Google Cloud Platform, Kubernetes, Linux, Microservices, MySQL, Python, Terraform

About the role

  • Build, deploy, and troubleshoot microservices in Kubernetes and Amazon EKS, ensuring scalability and reliability.
  • Design secure, highly available web applications with a focus on capacity planning and performance optimization.
  • Deploy and manage the lifecycle of LLMs and embedding models, defining KPIs to measure and improve AI application performance.
  • Evaluate and integrate emerging technologies such as RAG systems, MCP servers, AI Agents, and agentic workflows into our platform.
  • Manage AWS core and GenAI services (S3, IAM, EKS, Bedrock, etc.) using infrastructure-as-code tools like Terraform and Chef, while maintaining observability and alerting through tools like New Relic or PagerDuty.
  • Collaborate across product, platform, and engineering teams on architecture design, security patching, incident response, and release management to ensure the reliability of our ML and GenAI infrastructure.

Requirements

  • Bachelor’s degree and 10+ years of experience managing large-scale cloud applications, with a strong background in Linux administration and troubleshooting.
  • Excellent communication skills, a collaborative mindset, and the confidence to take ownership, drive solutions, and deliver results independently while thinking globally.
  • Over 8 years of hands-on experience managing cloud infrastructure across AWS, GCP, and Azure environments.
  • A solid understanding of today’s generative AI ecosystem, with practical experience using LLMs and embedding models (OpenAI, AWS Bedrock, SageMaker); familiarity with vector databases like LanceDB is a plus.
  • Strong scripting skills in Bash or Python, and experience with container orchestration platforms like Amazon EKS or Azure AKS.
  • Proficiency with DevOps and automation tools such as Chef, GitHub Actions, Rundeck, and IaC frameworks like Terraform, Spacelift, and Helm.
  • Working knowledge of DNS, load balancers, and MySQL, along with a good grasp of source control and branching strategies in Git.

Benefits

  • Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
  • Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
  • Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.

Applicant Tracking System Keywords

Hard skills
microservices, Kubernetes, Amazon EKS, LLMs, embedding models, Terraform, Chef, Bash, Python, DevOps
Soft skills
communication, collaboration, ownership, problem-solving, independence, global thinking
Certifications
Bachelor's degree