iCapital

Senior DevOps Engineer

full-time

Location Type: Hybrid

Location: New York City, New York, United States

Salary

💰 $180,000 - $230,000 per year

About the role

  • Design, build, and operate MLOps pipelines supporting the full ML lifecycle (training, validation, deployment, monitoring)
  • Enable production workloads for AI/ML and Generative AI systems, including LLM‑based services
  • Develop and maintain CI/CD pipelines for AI/ML services and supporting infrastructure
  • Build and manage cloud‑native infrastructure on AWS, with heavy use of Kubernetes and containerized workloads
  • Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform)
  • Implement model versioning, experiment tracking, and artifact management across environments
  • Ensure reliability, scalability, observability, and cost efficiency of AI platforms
  • Partner with AI/ML engineers to operationalize models and standardize deployment patterns
  • Implement monitoring and alerting for system health, model performance, and drift
  • Enforce security, compliance, and governance requirements for AI workloads
  • Participate in incident response, root cause analysis, and continuous improvement initiatives
  • Document standards, best practices, and reference architectures for MLOps and AI infrastructure

Requirements

  • 15+ years of experience in DevOps, SRE, or Platform Engineering, with AWS as a primary cloud
  • Experience supporting machine learning systems in production, including deployment and monitoring concerns
  • Hands-on experience with machine learning platforms, particularly AWS SageMaker (required)
  • Strong hands-on experience with Kubernetes, containerized workloads, and cloud networking
  • Proven experience building and operating CI/CD pipelines (e.g., GitLab CI, ArgoCD)
  • Strong proficiency with Terraform and scripting/programming in Python or similar languages
  • Solid Linux, systems, and troubleshooting fundamentals
  • Excellent communication skills and ability to work across teams
  • Direct experience with MLOps platforms and tooling (model registries, experiment tracking, feature stores)
  • Exposure to Generative AI / LLM workloads in production environments
  • Familiarity with data stores commonly used in ML systems (e.g., Postgres, DynamoDB, object storage)
  • Experience operating in regulated or fintech environments
  • Background in cost optimization for compute‑intensive workloads
  • Strong written and verbal communication skills
  • AWS certifications are a plus

Benefits

  • Employer matched retirement plan
  • Generously subsidized healthcare with 100% employer paid dental
  • Vision coverage
  • Telemedicine
  • Virtual mental health counseling
  • Parental leave
  • Unlimited paid time off (PTO)

Applicant Tracking System Keywords

Hard Skills & Tools
MLOps, CI/CD pipelines, AWS, Kubernetes, Terraform, Python, Linux, machine learning, Generative AI, model versioning
Soft Skills
communication, collaboration, problem-solving, incident response, root cause analysis, continuous improvement, documentation, teamwork, organizational skills, adaptability
Certifications
AWS certifications