Tech Stack
Apache, AWS, Azure, Cloud, Distributed Systems, Docker, Kubernetes, Python, PyTorch, Ray, Spark
About the role
- Support AI researchers by building scalable ML training pipelines and infrastructure for foundation model development
- Design efficient data processing workflows for large-scale design datasets and industry-specific file formats
- Optimize distributed training systems and develop solutions for model parallelism, checkpointing, and efficient resource management
- Analyze performance bottlenecks and provide solutions to scaling problems
- Implement and maintain robust, testable, well-documented code
- Collaborate with researchers and engineers on projects at the intersection of research and product
- Present results to collaborators and leadership
- Contribute to infrastructure that enables ML-powered product features for AEC (architecture, engineering, construction)
Requirements
- BSc or MSc in Computer Science or related field, or equivalent industry experience
- Experience with distributed systems for machine learning and deep learning at scale
- Strong knowledge of ML infrastructure and model parallelism techniques
- Experience with frameworks such as PyTorch, Lightning, Megatron, DeepSpeed, and FSDP
- Proficiency in Python and strong software engineering practices
- Experience with cloud services and architectures (AWS, Azure, etc.)
- Familiarity with version control, CI/CD, and deployment pipelines
- Excellent written documentation skills
- Preferred: Experience with AEC data formats (BIM models, IFC files, CAD files, Drawing Sets)
- Preferred: Knowledge of the AEC industry and its specific data processing challenges
- Preferred: Experience scaling ML training and data pipelines for large datasets
- Preferred: Experience with distributed data processing and ML infrastructure (Apache Spark, Ray, Docker, Kubernetes)
- Preferred: Experience with performance optimization, monitoring, and efficiency in large-scale ML systems
- Preferred: Experience with Autodesk or similar products (Revit, SketchUp, Forma)
- Ability to work effectively on a global, remote-first team; self-starter and adaptable in ambiguous environments