Best Egg

Senior Software Engineer, ML Operations

full-time

Location Type: Remote

Location: United States

Salary

💰 $145,000 - $165,000 per year

About the role

  • Take ownership of an ML deployment system spanning multiple production environments, and continue to research more efficient and effective deployment strategies.
  • Improve, expand, and streamline our existing deployment pipelines to support faster deployments and automated model retraining.
  • Collaborate with Data Scientists to understand model requirements and provide guidance to ensure seamless integration with production environments.
  • Develop automations that empower data scientists to self-serve, remove manual steps from our processes, and streamline their training workflows.
  • Build and maintain production-level inference environments, including low-latency real-time APIs and batch predictions, and monitor these environments to ensure uptime, resiliency, and latency SLAs are met.
  • Work with modern CI/CD tools to deploy ML/AI models at scale in a production setting.
  • Drive the deployment and optimization of custom AI and LLM models, supporting data scientists and AI engineers in fine-tuning, evaluating, and serving large language models for real-world use cases.
  • Contribute to the infrastructure, pipelines, and monitoring needed for generative AI systems, including vector databases, prompt orchestration frameworks, and scalable inference services.
  • Enjoy a great company culture built on collaboration, teamwork, and learning, with frequent wins and no politics.

Requirements

  • At least five (5) years of professional engineering experience or work program equivalents in a relevant field.
  • Experience in operationalization of Data Science projects (MLOps) on AWS; specific experience with EKS, Lambda, Step Functions, and SageMaker.
  • Experience designing, building, and operating container-based cloud infrastructure with Terraform and other infrastructure-as-code tools in a production setting.
  • Experience in CI/CD pipeline implementation; experience with ArgoCD, Argo Workflows, and GitHub Actions a plus.
  • Proficiency in Python for both ML and general software engineering tasks; good knowledge of Bash and Unix command line tools.
  • Extensive knowledge of the machine learning development lifecycle and associated tooling; demonstrated experience with tools such as Metaflow, Flyte, or Kubeflow.
  • Demonstrated experience building production-grade, RESTful APIs for ML products; experience building data scientist tooling a plus.
  • Hands-on experience with AI model development, fine-tuning, or deployment, particularly with large language models (e.g., models from OpenAI, Anthropic, or Hugging Face, or custom transformer-based models).
  • Knowledge of modern AI infrastructure tools such as vector databases (e.g., Pinecone, FAISS, or Weaviate), model-serving platforms, and prompt management frameworks.
  • Ability to work in a fast-paced environment and strong technical communication skills.
  • Enjoy a culture rich in direct communication and continual learning, free of politics, where we celebrate success and have fun too.

Benefits

  • Pre-tax and post-tax retirement savings plans with a competitive company matching program
  • Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
  • Multiple health care plans to choose from, including dental and vision options
  • Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
  • Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
  • Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
MLOps, Python, Bash, Unix command line, CI/CD, RESTful APIs, AI model development, large language models, machine learning development lifecycle, infrastructure-as-code
Soft skills
collaboration, teamwork, technical communication, fast-paced environment, learning, problem-solving, ownership, guidance, streamlining processes, empowerment