
Senior DevOps Engineer
iCapital
full-time
Location Type: Hybrid
Location: New York City • New York • United States
Salary
💰 $180,000 - $230,000 per year
About the role
- Design, build, and operate MLOps pipelines supporting the full ML lifecycle (training, validation, deployment, monitoring)
- Enable production workloads for AI/ML and Generative AI systems, including LLM‑based services
- Develop and maintain CI/CD pipelines for AI/ML services and supporting infrastructure
- Build and manage cloud‑native infrastructure on AWS, with heavy use of Kubernetes and containerized workloads
- Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform)
- Implement model versioning, experiment tracking, and artifact management across environments
- Ensure reliability, scalability, observability, and cost efficiency of AI platforms
- Partner with AI/ML engineers to operationalize models and standardize deployment patterns
- Implement monitoring and alerting for system health, model performance, and drift
- Enforce security, compliance, and governance requirements for AI workloads
- Participate in incident response, root cause analysis, and continuous improvement initiatives
- Document standards, best practices, and reference architectures for MLOps and AI infrastructure
Requirements
- 15+ years of experience in DevOps, SRE, or Platform Engineering, with AWS as a primary cloud
- Experience supporting machine learning systems in production, including deployment and monitoring concerns
- Hands-on experience with machine learning platforms, particularly AWS SageMaker (required)
- Strong hands-on experience with Kubernetes, containerized workloads, and cloud networking
- Proven experience building and operating CI/CD pipelines (e.g., GitLab CI, ArgoCD)
- Strong proficiency with Terraform and scripting/programming in Python or similar languages
- Solid Linux, systems, and troubleshooting fundamentals
- Excellent communication skills and ability to work across teams
- Direct experience with MLOps platforms and tooling (model registries, experiment tracking, feature stores)
- Exposure to Generative AI / LLM workloads in production environments
- Familiarity with data stores commonly used in ML systems (e.g., Postgres, DynamoDB, object storage)
- Experience operating in regulated or fintech environments
- Background in cost optimization for compute‑intensive workloads
- Strong written and verbal communication skills
- AWS certifications are a plus.
Benefits
- Employer-matched retirement plan
- Generously subsidized healthcare with 100% employer-paid dental
- Vision coverage
- Telemedicine
- Virtual mental health counseling
- Parental leave
- Unlimited paid time off (PTO)
Applicant Tracking System Keywords
Hard Skills & Tools
MLOps, CI/CD pipelines, AWS, Kubernetes, Terraform, Python, Linux, machine learning, Generative AI, model versioning
Soft Skills
communication, collaboration, problem-solving, incident response, root cause analysis, continuous improvement, documentation, teamwork, organizational skills, adaptability
Certifications
AWS certifications