Best Egg

Lead Software Engineer II, AI Operations

full-time

Location Type: Remote

Location: United States

Salary

💰 $150,000 - $170,000 per year

About the role

  • Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one.
  • Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics.
  • Design and implement scalable AWS architectures, including AWS AI services such as Bedrock and its knowledge bases, with IAM, secure secrets management, policy enforcement, automated provisioning, and resource-usage governance as core platform capabilities.
  • Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests.
  • Enforce PII redaction, safety filters, role-based access, audit logs, and human-in-the-loop review paths to control quality and risk.
  • Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys with automatic rollback triggers.
  • Cut run-rate spend through caching, truncation, batching, autoscaling, and model routing; establish clear unit economics per workflow.
  • Provide templates, SDKs, and high-quality abstractions that let product teams ship safely without bespoke plumbing; improve developer experience.
  • Build primarily in Python and Metaflow (Outerbounds); deploy on AWS (Bedrock + core services) and OpenAI; use Cursor in daily workflows; help evaluate and, when appropriate, run on Databricks.
  • Participate in on-call, author runbooks, and remove single-thread risk for AI services; drive reliability and resilience in the manner of MLOps.

Requirements

  • 5–10 years of professional software engineering experience (or equivalent), including 2+ years building AI/LLM applications; portfolio of shipped AI projects (links to code, demos, or case studies).
  • Demonstrated habit of exploring the latest AI models, frameworks, and tooling, and of adopting state-of-the-art advances in day-to-day workflows.
  • Hands-on experience with some or all of OpenAI, Bedrock, and Hugging Face/Ollama/vLLM; MCP servers, function/tool calling, multi-turn orchestration, streaming, and prompt/version management.
  • Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking), integrating with vector databases, and measuring retrieval quality.
  • Comfortable building APIs/services and simple UIs where needed; strong fundamentals in Python and modern packaging/testing.
  • CI/CD, containers, cloud fundamentals (AWS), and runtime performance tuning; experience operating services in production.
  • Metaflow (Outerbounds) preferred; Databricks familiarity is a plus; ability to integrate data/feature pipelines and schedule/operate flows.
  • Expertise in tracing and logging with tools like Datadog, Dynatrace, or Grafana, where relevant for AI monitoring, is essential.
  • Comfortable optimizing latency/throughput/cost, and implementing guardrails for PII/safety/compliance.
  • Partner effectively with data scientists, analysts, and engineers; promote best practices and high-leverage abstractions.
  • Fine-tuning or distillation experience; Kubernetes or FastAPI exposure; familiarity with Snowflake or similar warehousing for retrieval sources.

Benefits

  • Pre-tax and post-tax retirement savings plans with a competitive company matching program
  • Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
  • Multiple health care plans to choose from, including dental and vision options
  • Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
  • Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
  • Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Metaflow, AWS, OpenAI, Hugging Face, retrieval systems, CI/CD, containers, Kubernetes, FastAPI
Soft Skills
collaboration, problem-solving, communication, reliability, resilience, exploration, best practices, developer experience, partnering, optimizing