Prime Intellect

Research Engineer – Distributed Training

full-time

Location Type: Hybrid

Location: San Francisco, California, United States

About the role

  • Lead and participate in novel research to build a massive-scale, highly reliable, and secure decentralized training orchestration solution.
  • Optimize the performance, cost, and resource utilization of AI workloads by leveraging the most recent advances in compute and memory optimization techniques.
  • Contribute to the development of our open-source libraries and frameworks for distributed model training.
  • Publish research in top-tier AI conferences such as ICML & NeurIPS.
  • Distill highly technical project outcomes into approachable technical blog posts for our customers and developers.
  • Stay up to date with the latest advancements in AI/ML infrastructure and tools and in decentralized training research, and proactively identify opportunities to enhance our platform's capabilities and user experience.

Requirements

  • Strong background in AI/ML engineering, with extensive experience in designing and implementing end-to-end pipelines for training and deploying large-scale AI models.
  • Deep expertise in distributed training techniques, frameworks (e.g., PyTorch Distributed, DeepSpeed, MosaicML’s LLM Foundry), and tools (e.g. Ray) for optimizing the performance and scalability of AI workloads.
  • Experience in large-scale model training, including distributed training techniques such as data, tensor, and pipeline parallelism.
  • Solid understanding of MLOps best practices, including model versioning, experiment tracking, and continuous integration/deployment (CI/CD) pipelines.
  • Passion for advancing the state-of-the-art in decentralized AI model training and democratizing access to AI capabilities for researchers, developers, and businesses worldwide.
  • If you're not familiar with these but feel that you can contribute to our mission, and you're a high-energy person, get familiar with these resources (here, here and here) and please reach out!

Benefits

  • Competitive compensation, including equity incentives, aligning your success with the growth and impact of Prime Intellect.
  • Flexible work arrangements, with the option to work remotely or in person at our offices in San Francisco.
  • Visa sponsorship and relocation assistance for international candidates.
  • Quarterly team off-sites, hackathons, conferences and learning opportunities.
  • Opportunity to work with a talented, hard-working and mission-driven team, united by a shared passion for leveraging technology to accelerate science and AI.