Salary
💰 $146,900 - $237,600 per year
Tech Stack
Cloud · Python · PyTorch · Ray · TensorFlow
About the role
- Provide technical leadership at the intersection of applied research and large-scale engineering within Autodesk’s AEC organization
- Collaborate with Research Scientists to translate novel model architectures and experimental ideas into robust, scalable implementations
- Develop, optimize, and deploy new ML models and AI techniques at scale
- Own the end-to-end training workflow: distributed training, debugging, and performance optimization
- Identify and apply best practices in large model training (e.g., parallelization, mixed precision, gradient checkpointing)
- Drive engineering efforts within a global team of scientists and engineers, ensuring reproducibility and efficiency of experiments
- Improve training throughput by identifying bottlenecks in training pipelines and implementing fixes
- Partner with infrastructure and platform teams to leverage large-scale compute clusters and cloud services
- Take ownership of advancing foundation model research in the AEC domain, including experiment design, distributed training, optimization, and large-scale deployment
- Report to the Machine Learning Manager within the AEC Solutions organization
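Two of the large-model training practices named above, mixed precision and gradient checkpointing, can be sketched in PyTorch. This is an illustrative toy (the model, sizes, and training step are invented for the example and are not Autodesk's stack); it runs on CPU with bfloat16 autocast:

```python
# Sketch only: mixed-precision autocast + gradient checkpointing in PyTorch.
# The model and dimensions are made up for illustration.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A generic feed-forward block standing in for a transformer layer."""

    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


class Model(nn.Module):
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        for blk in self.blocks:
            # Gradient checkpointing: drop intermediate activations and
            # recompute them during backward, trading compute for memory.
            x = checkpoint(blk, x, use_reentrant=False)
        return self.head(x)


model = Model()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(8, 64)
y = torch.randn(8, 1)

# Mixed precision: autocast runs eligible ops (e.g. matmuls) in bfloat16
# while master weights stay in fp32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```

In a real large-scale run the same two techniques are typically combined with a distributed wrapper such as FSDP or DeepSpeed, which the role's requirements list separately.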
Requirements
- Master’s or PhD in a field related to AI/ML such as: Computer Science, Mathematics, Statistics, Physics, Computational Linguistics, Mechanical Engineering, or related disciplines
- Strong background in deep learning, including: Implementing custom architectures; Optimizing model performance; Developing novel loss functions; Deploying production-ready solutions
- Familiarity with transformer-based models across various data modalities
- Strong expertise in PyTorch (experience with TensorFlow or JAX is also valuable)
- Strong coding abilities in Python, with emphasis on debugging and performance profiling
- Hands-on experience with distributed training frameworks (e.g., PyTorch Distributed, DeepSpeed, Megatron-LM, FSDP, Horovod)
- Experience training foundation models on 2D, 3D, or multimodal data at scale (preferred)
- Expert-level knowledge of transformers, scaling laws, and distributed training (preferred)
- Demonstrated success in optimizing training or deployment pipelines for large models; familiarity with Ray, DeepSpeed, Megatron-LM, Triton, CUDA, Metaflow/MLflow (preferred)
- Familiarity using compute clusters and cloud services for large-scale ML pipelines
- Experience with multi-GPU and large-scale training in HPC or cloud environments
- Contributions to PyTorch or large-scale ML frameworks (preferred)
- Significant post-graduate research experience, or 5+ years of industry experience (preferred)
- Knowledge of AEC-related data modalities (3D geometry, CAD/BIM models, construction text corpora) is a strong plus
- Builder mindset, attention to detail, sharp debugging instincts, and the ability to thrive in fast-paced, collaborative research environments