
AI Research Engineer – Pre Training
Tether.to
full-time
Location Type: Remote
Location: California • New York • United States
Salary
💰 $100,000 - $500,000 per year
About the role
- Conduct pre-training of AI models on large, distributed servers equipped with thousands of NVIDIA GPUs.
- Design, prototype, and scale innovative architectures to enhance model intelligence.
- Independently and collaboratively execute experiments, analyze results, and refine methodologies for optimal performance.
- Investigate, debug, and improve both model efficiency and computational performance.
- Contribute to the advancement of training systems to ensure seamless scalability and efficiency on target platforms.
Requirements
- A degree in Computer Science or related field.
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (including publications at A* conferences).
- Hands-on experience contributing to large-scale LLM training runs on large, distributed servers equipped with thousands of NVIDIA GPUs, with a focus on scalability and measurable improvements in model performance.
- Familiarity and practical experience with large-scale distributed training frameworks, libraries, and tools.
- Deep knowledge of state-of-the-art transformer and non-transformer modifications aimed at enhancing intelligence, efficiency, and scalability.
- Strong expertise in PyTorch and the Hugging Face libraries, with practical experience in model development, continual pretraining, and deployment.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI models, large-scale LLM training, model efficiency, computational performance, transformer modifications, non-transformer modifications, model development, continual pretraining, deployment, NLP
Soft Skills
collaborative execution, independent execution, analytical skills, problem-solving, methodology refinement
Certifications
PhD in NLP, PhD in Machine Learning