Flower Labs

Founding ML Engineer, Flower Frontier Model Team

full-time

Location Type: Remote

Location: Remote • 🇩🇪 Germany

Job Level

Senior

Tech Stack

Distributed Systems, Docker, Linux, Node.js, Python, PyTorch

About the role

  • Join as a founding member of the Flower Frontier Model Team
  • Build category-defining models that blend existing practices with decentralized learning methods
  • Help build a reliable, maintainable and scalable software stack
  • Produce world-leading open-source models integrated into new Flower Labs products
  • Design, implement and optimize core components across data curation, evaluation, pre-training and post-training
  • Diagnose and resolve GPU/kernel issues, memory/storage bottlenecks, and multi-node failures at scale
  • Collaborate on debugging training instabilities and related issues
  • Devise the surrounding infrastructure, tooling, monitoring and observability

Requirements

  • Exceptional software engineering skills (Python, deep learning frameworks, testing, profiling, refactoring, reproducibility)
  • Expertise in modern ML training stacks such as PyTorch, JAX or equivalent
  • Experience implementing model architectures from scratch and working within libraries like DeepSpeed, Megatron or equivalent
  • Ability to tune, debug, and profile large-scale training runs
  • Hands-on experience working with large GPU clusters, including job orchestration, scheduling, multi-node runs, NCCL/RDMA issues, and GPU performance optimization
  • Ability to collaborate effectively with both research-oriented and engineering-oriented colleagues
  • Good engineering hygiene: modular design, code reviews, documentation, reproducibility, versioning of data/models/configurations
  • Familiarity with common tools (Linux command line, git, Docker)
  • Openness to adopting new tooling
  • Solid understanding of distributed systems and networking
  • Strong written English
  • Open, honest and transparent communication skills

Benefits

  • Opportunity to work on frontier AI models
  • Potential for technical leadership
  • Collaborative start-up environment
