EXL

AWS Cloud DevOps

full-time

Location Type: Remote

Location: United States

Salary

💰 $125,000 per year

About the role

  • Manage, scale, and optimize cloud environments used for data science workloads (primarily AWS, Databricks, dbt).
  • Provision, maintain, and optimize compute clusters for ML workloads (e.g., Kubernetes, ECS/EKS, Databricks, SageMaker).
  • Implement and maintain high-availability solutions for mission-critical analytics platforms.
  • Apply deep expertise in AWS resource management and provisioning, including IAM roles and permissions.
  • Develop CI/CD pipelines for model deployment, infrastructure-as-code (IaC), and automated testing using industry-standard toolchains.
  • Build monitoring, alerting, and logging systems for cloud and ML infrastructure (e.g., Datadog, CloudWatch, Prometheus, Grafana, ELK).
  • Automate provisioning, configuration, and deployments using tools such as Terraform, CloudFormation, and GitHub Actions.
  • Enable and improve data ingestion, transformation, and model execution workflows through platform capabilities and automation.
  • Develop and maintain self-service capabilities for data scientists to provision and manage reliable, reproducible environments for research and development.
  • Collaborate with Data Engineering to maintain integrations between data pipelines and cloud systems.
  • Share responsibility for provisioning and operating application networking capabilities that support data platforms, including API gateways, CDNs, application load balancers, TLS, and WAFs.
  • Implement and operationalize security and compliance controls for data science platforms in alignment with enterprise cloud standards.
  • Conduct periodic risk assessments, best-practice reviews, and remediation efforts to strengthen security and resiliency.
  • Support secure handling of sensitive financial data.
  • Partner with data scientists, machine learning engineers, and data engineers to deeply understand and support their needs and workflows within data-driven initiatives.
  • Serve as a technical advisor on cloud architecture, performance optimization, and production readiness for data and ML platforms.
  • Adopt and champion Agile, DevOps, and Platform Engineering practices (Kanban, Scrum, continuous improvement, automation, Everything-as-a-Service).
  • Demonstrate a strong, proactive focus on serving internal customers: prioritize user experience and identify opportunities to use automation and self-service to reduce toil and cognitive load for developers and researchers.

Requirements

  • Bachelor’s degree or higher in a STEM field (required).
  • 5+ years of experience in cloud operations, DevOps, platform engineering, SRE, systems administration, or related roles.
  • Strong proficiency with at least one major cloud provider (AWS preferred).
  • Hands-on experience with IaC tools (Terraform, CloudFormation, or similar).
  • Strong scripting skills (Python, Bash, or PowerShell).
  • Strong understanding of modern authentication and authorization technologies and secrets management (IAM, OIDC, OAuth2, RBAC, ABAC, privileged access management, JIT authorization, PKI).
  • Experience with common CI/CD systems (GitHub Actions, Jenkins, GitLab CI, ArgoCD, or similar).
  • Familiarity with container orchestration (Docker Compose, EKS/Kubernetes, ECS).
  • Experience supporting data-intensive or ML workloads.

Benefits

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options