Red Cell Partners

Principal MLOps Engineer

full-time

Location Type: Remote

Location: Remote • Virginia, Washington • 🇺🇸 United States

Salary

💰 $200,000 - $250,000 per year

Job Level

Lead

Tech Stack

AWS, Azure, Cloud, Docker, Google Cloud Platform, Jenkins, Kubernetes, Python, Terraform

About the role

  • Own the technical vision, strategy, and end-to-end architecture for Trase’s MLOps platform, ensuring scalability, reliability, security, and cost-efficiency.
  • Architect and build a sophisticated CI/CD/CT ecosystem to automate the entire ML lifecycle, from data validation to production monitoring.
  • Lead the design of scalable and resilient ML infrastructure using IaC (Terraform) and container orchestration (Kubernetes) on a major cloud platform.
  • Establish MLOps best practices, including frameworks for version control, experiment tracking, model governance, and responsible AI.
  • Implement a robust monitoring and alerting framework to track model performance, detect drift, and ensure the reliability of production ML services.
  • Serve as the organization's thought leader on MLOps, mentoring engineers and driving cross-functional alignment on platform strategy and best practices.
  • Define the multi-year roadmap for Trase’s MLOps ecosystem in alignment with business and product strategy.
  • Anticipate emerging trends (LLMOps, AutoML, multi-cloud, federated learning) and guide the org to adopt them proactively.
  • Define patterns for operating large-scale LLMs and multi-modal AI in production with efficiency and compliance.
  • Solve highly ambiguous, large-scale ML deployment challenges where no precedent exists, defining best practices for the org.
  • Focus on model training, pipeline development, and fine-tuning of large language models (LLMs) to ensure peak performance.
  • Some travel is required.

Requirements

  • 10+ years in software/infrastructure engineering, with 5+ years in a senior/lead MLOps, ML Infrastructure, or Platform role.
  • Expertise in designing and operating scalable, production-grade ML systems on AWS, GCP, or Azure.
  • Mastery of Docker and Kubernetes for managing production ML workloads.
  • Proven experience managing complex infrastructure as code (IaC) with tools like Terraform.
  • Deep experience architecting CI/CD/CT pipelines for complex ML workflows, using tools such as GitHub Actions or Jenkins.
  • Strong Python programming skills for infrastructure automation, tooling, and services.
  • Experience architecting solutions across the full ML lifecycle, from experiment tracking to advanced deployment patterns and monitoring.
  • Exceptional communication skills to articulate complex architectural strategy to stakeholders at all levels.
  • Familiarity with modern MLOps tools like MLflow, Kubeflow, SageMaker, or Vertex AI.
  • Experience with the operational challenges of LLMs, including fine-tuning pipelines, RAG systems, and vector databases.

Benefits

  • 100% employer-paid, comprehensive health care including medical, dental, and vision for you and your family.
  • Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
  • Unlimited PTO, with management approval.
  • Opportunities for professional development and continued learning with educational reimbursements.
  • Optional 401(k), FSA, and equity incentives available.
  • Mental health benefits through TARA Mind.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
MLOps, ML infrastructure, CI/CD, IaC, Terraform, Kubernetes, Python, Docker, ML lifecycle, LLMs
Soft skills
communication, mentoring, leadership, cross-functional alignment, strategic thinking, problem-solving, adaptability, collaboration, stakeholder engagement, thought leadership