Design and implement AI platforms enabling scalable, secure access to LLMs from multiple model providers.
Design and implement agentic workflows, tool ecosystems, and LLM prompt management solutions.
Design, build, and optimize scalable model training, fine-tuning, and inference pipelines integrated with production systems.
Influence technical strategy for embedding stores, vector databases, and reusable assets.
Lead initiatives to streamline ML/AI workflows, improve operational efficiency, and establish standardized procedures.
Design and develop backend services and RESTful APIs using Python and FastAPI integrated with ML pipelines.
Take operational responsibility for team-owned services including monitoring, optimization, troubleshooting, and on-call rotation.
Collaborate with data and applied scientists, software engineers, product managers, and stakeholders to deliver ML-driven products.
Coach and mentor ML engineers, promoting collaboration and engineering excellence.
Build, deploy, and maintain ML infrastructure using AWS, Databricks, Docker, Kubernetes, Terraform, Snowflake, Coralogix, and GitHub.
Requirements
9+ years of hands-on experience in machine learning engineering, AI development, software engineering, or related fields, with an emphasis on secure, large-scale distributed system design and on AI/ML pipeline development and implementation.
Extensive experience designing, developing, and operating scalable backend systems using domain-driven design, event-driven architectures, and microservices.
Deep expertise in agentic workflows, AI evaluation solutions, prompt management, and secure AI development and testing practices.
Strong knowledge of relational and document-based databases, data storage paradigms, and efficient RESTful API design.
Experience establishing robust CI/CD pipelines, automated testing (unit and integration), and deployment practices.
Strong leadership skills, including planning and management of complex projects and mentoring team members.
Excellent communication skills for technical and non-technical stakeholders.
Bachelor's degree in Computer Science, Software Engineering, or related field (preferred).
Proficiency in Python and in provisioning cloud infrastructure as code (AWS preferred).
Experience with Python, FastAPI, AWS, Databricks, Docker, Kubernetes, Terraform, Snowflake, Coralogix, and GitHub.
Willingness to participate in on-call rotation and operational responsibility for production services.
Candidates must be located in the San Francisco Bay Area and work in the office three days a week.
Benefits
Competitive pay
Comprehensive healthcare benefits
Equity
Financial assistance for hybrid work
Family planning assistance
Generous parental leave
Flexible time-off policies
Mental health and wellness resources
Learning, development, and recognition programs
Diversity, equity, and inclusion initiatives and employee resource groups
Volunteering and Gives Back programs
Reasonable accommodation for hiring process