Albert Invent

Staff ML Ops Engineer

Full-time

Location Type: Remote

Location: California, United States

About the role

  • Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
  • Manage containerized services, autoscaling, networking, and resource optimization
  • Design and build high-performance Python APIs and services using FastAPI or similar frameworks
  • Architect backend systems for scalability, reliability, and low latency
  • Build integrations between AI/ML systems and the broader Albert platform
  • Build and operate distributed systems that handle compute-intensive and high-throughput workloads
  • Design for fault tolerance, graceful degradation, and horizontal scalability
  • Implement async workflows, job queues, and task orchestration as needed
  • Architect and maintain data pipelines and storage systems supporting AI/ML workflows
  • Implement observability, including logging, metrics, tracing, and alerting
  • Own system reliability: troubleshoot issues, conduct post-mortems, and drive continuous improvement
  • Design CI/CD pipelines and promote automation best practices
  • Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
  • Translate ML prototypes and research code into scalable, maintainable systems

Requirements

  • A Bachelor's degree in Computer Science or a related field with 7+ years of industry experience in software engineering, or a Master's or PhD with 5+ years
  • Experience supporting AI/ML teams or deploying ML systems in production
  • Experience with GPU workloads and scheduling
  • Advanced proficiency in Python, including async programming and performance optimization
  • Deep experience with Kubernetes, including cluster management, networking, autoscaling, and troubleshooting
  • Strong background in distributed systems and microservices architecture
  • Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
  • Proficiency in REST API development using FastAPI, Flask, or similar frameworks
  • Experience with containerization and CI/CD pipelines
  • Track record of operating production systems at scale

Benefits

  • Health insurance
  • Flexible working hours
  • Professional development opportunities