
ML Infrastructure Engineer
BlueSpace.ai
full-time
Location Type: Hybrid
Location: Oakland • California • United States
Salary
💰 $150,000 - $200,000 per year
About the role
- Take ownership of the infrastructure in support of developing and deploying Machine Learning models for Autonomous Vehicles
- Architect and deploy cloud and on-prem ML training and evaluation infrastructure
- Own the data management pipelines, from ingestion and storage to model training and evaluation, that span vehicle compute, cloud, and on-prem
- Change model training code to take advantage of the better data storage techniques and formats you propose
- Evaluate and implement methods, software, and hardware for model deployment onto the test and production vehicles
- Develop systems and processes to improve the transition of models from research to production while balancing cost
- Participate in model design and research, and set model design requirements that ensure successful deployment
- Own and deliver projects end-to-end
- Optional: be able to hire, manage, or at least mentor other engineers who join this project when growth is needed
Requirements
- Experience in architecting and implementing data engineering solutions for a small engineering team / product (1-20 people)
- 2+ years of software engineering experience in any of the following: ML Infrastructure, Data Engineering, Platform Engineering, Distributed Systems
- Either existing experience with ML Infrastructure as described below, or strong expertise in non-ML Data infrastructure combined with a strong desire to learn ML Infra specifics
- Production ML experience with at least one of the following: (1) model conversion and optimization for production (ONNX, TensorRT), (2) model deployment on specialized hardware (e.g. Jetson), or (3) model monitoring and MLOps
- Ability to programmatically access cloud services using Python, NodeJS, or equivalent
- Knowledge of or experience with data management solutions, such as (1) workflow orchestration pipelines (e.g. Argo, Airflow, Kubernetes) or (2) managed large-scale data processing systems (e.g. Spark, Dataproc, Databricks)
- Experience with end-to-end ML pipelines (e.g. SageMaker, Vertex)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Machine Learning, Data Engineering, ML Infrastructure, Model conversion, Model optimization, Model deployment, MLOps, Cloud services, Python, NodeJS
Soft Skills
Ownership, Project management, Mentoring, Collaboration, Problem-solving