
AWS Cloud DevOps
EXL
full-time
Location Type: Remote
Location: United States
Salary
💰 $125,000 per year
About the role
- Manage, scale, and optimize cloud environments used for data science workloads (primarily AWS, Databricks, dbt).
- Provision, maintain, and optimize compute clusters for ML workloads (e.g., Kubernetes, ECS/EKS, Databricks, SageMaker).
- Implement and maintain high-availability solutions for mission-critical analytics platforms.
- Apply deep expertise in AWS resource management and provisioning, including IAM roles and permissions.
- Develop CI/CD pipelines for model deployment, infrastructure-as-code (IaC), and automated testing using industry-standard toolchains.
- Build monitoring, alerting, and logging systems for cloud and ML infrastructure (e.g., Datadog, CloudWatch, Prometheus, Grafana, ELK).
- Automate provisioning, configuration, and deployments using tools such as Terraform, CloudFormation, and GitHub Actions.
- Enable and improve data ingestion, transformation, and model execution workflows through platform capabilities and automation.
- Develop and maintain self-service capabilities for data scientists to provision and manage reliable, reproducible environments for research and development.
- Collaborate with Data Engineering to maintain integrations between data pipelines and cloud systems.
- Share responsibility for provisioning and operating application networking capabilities that support data platforms, including API gateways, CDNs, application load balancers, TLS, and WAFs.
- Implement and operationalize security and compliance controls for data science platforms in alignment with enterprise cloud standards.
- Conduct periodic risk assessments, best-practice reviews, and remediation efforts to strengthen security and resiliency.
- Support secure handling of sensitive financial data.
- Partner with data scientists, machine learning engineers, and data engineers to deeply understand and support their needs and workflows within data-driven initiatives.
- Serve as a technical advisor on cloud architecture, performance optimization, and production readiness for data and ML platforms.
- Adopt and champion Agile, DevOps, and Platform Engineering practices (Kanban, Scrum, continuous improvement, automation, Everything-as-a-Service).
- Demonstrate a strong, proactive focus on serving internal customers: prioritize user experience and identify opportunities to use automation and self-service to reduce toil and cognitive load for developers and researchers.
Requirements
- Bachelor’s degree or higher in a STEM field (required).
- 5+ years of experience in cloud operations, DevOps, platform engineering, SRE, sysadmin or related roles.
- Strong proficiency with at least one major cloud provider (AWS preferred).
- Hands-on experience with IaC tools (Terraform, CloudFormation, or similar).
- Strong scripting skills (Python, Bash, or PowerShell).
- Strong understanding of modern authentication and authorization technologies and secrets management (IAM, OIDC, OAuth2, RBAC, ABAC, privileged access management, JIT authorization, PKI).
- Experience with common CI/CD systems (GitHub Actions, Jenkins, GitLab CI, ArgoCD, or similar).
- Familiarity with container orchestration (Docker Compose, EKS/Kubernetes, ECS).
- Experience supporting data-intensive or ML workloads.
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Databricks, dbt, Kubernetes, ECS, EKS, SageMaker, Terraform, CloudFormation, Python
Soft Skills
collaboration, customer focus, proactive, communication, problem-solving, prioritization, automation, continuous improvement, user experience, teamwork