
Senior Software Engineer, ML Operations
Best Egg
full-time
Location Type: Remote
Location: United States
Salary
💰 $145,000 - $165,000 per year
About the role
- Take ownership of an ML deployment system spanning multiple production environments and continue to research efficient and effective deployment strategies.
- Improve, expand, and streamline our existing deployment pipelines to support faster deployments and automated model retraining.
- Collaborate with Data Scientists to understand model requirements and provide guidance to ensure seamless integration with production environments.
- Develop automations that empower data scientists to self-serve, remove manual steps from our processes, and streamline their training workflows.
- Build and maintain production-level inference environments, including low-latency real-time APIs and batch predictions, and monitor these environments to ensure uptime, resiliency, and latency SLAs are met.
- Work with modern CI/CD tools to deploy ML/AI models at scale in a production setting.
- Drive the deployment and optimization of custom AI and LLM models, supporting data scientists and AI engineers in fine-tuning, evaluating, and serving large language models for real-world use cases.
- Contribute to the infrastructure, pipelines, and monitoring needed for generative AI systems, including vector databases, prompt orchestration frameworks, and scalable inference services.
- Enjoy a great company culture rich in collaboration, teamwork, no politics, learning, and frequent wins.
Requirements
- At least five (5) years of professional engineering experience or work program equivalents in a relevant field.
- Experience in operationalization of Data Science projects (MLOps) on AWS; specific experience with EKS, Lambda, Step Functions, and SageMaker.
- Experience designing, building, and operating container-based cloud infrastructure with Terraform and other infrastructure-as-code tools in a production setting.
- Experience in CI/CD pipeline implementation; experience with ArgoCD, Argo Workflows, and GitHub Actions a plus.
- Proficiency in Python for both ML and general software engineering tasks; good knowledge of Bash and Unix command line tools.
- Extensive knowledge of the machine learning development lifecycle and associated tooling; demonstrated experience with Metaflow, Flyte, Kubeflow, etc.
- Demonstrated experience building production-grade, RESTful APIs for ML products; experience building data scientist tooling a plus.
- Hands-on experience with AI model development, fine-tuning, or deployment—particularly with large language models (e.g., OpenAI, Anthropic, Hugging Face, or custom transformer-based models).
- Knowledge of modern AI infrastructure tools such as vector databases (e.g., Pinecone, FAISS, or Weaviate), model-serving platforms, and prompt management frameworks.
- Ability to work in a fast-paced environment and strong technical communication skills.
- Enjoy a culture rich in direct communication, no politics, and continual learning—where we celebrate success and have fun too.
Benefits
- Pre-tax and post-tax retirement savings plans with a competitive company matching program
- Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
- Multiple health care plans to choose from, including dental and vision options
- Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
- Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
- Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
MLOps, Python, Bash, Unix command line, CI/CD, RESTful APIs, AI model development, large language models, machine learning development lifecycle, infrastructure-as-code
Soft skills
collaboration, teamwork, technical communication, fast-paced environment, learning, problem-solving, ownership, guidance, streamlining processes, empowerment