
Founding ML Engineer, Flower Frontier Model Team
Flower Labs
full-time
Location Type: Remote
Location: Remote • 🇩🇪 Germany
Job Level
Senior
Tech Stack
Distributed Systems, Docker, Linux, Node.js, Python, PyTorch
About the role
- Join as a founding member of the Flower Frontier Model Team
- Build category-defining models that blend existing practices with decentralized learning methods
- Help build a reliable, maintainable and scalable software stack
- Produce world-leading open-source models integrated into new Flower Labs products
- Design, implement, and optimize core components across data curation, evals, pre-training, and post-training
- Diagnose and resolve GPU/kernel issues, memory/storage bottlenecks, and multi-node failures at scale
- Collaborate on the debugging of training instabilities and related issues
- Devise surrounding infrastructure, tooling, monitoring, and observability
Requirements
- Exceptional software engineering skills (Python, deep learning frameworks, testing, profiling, refactoring, reproducibility)
- Expertise with modern ML training stacks: PyTorch, JAX or equivalent
- Experience implementing model architectures from scratch and working within libraries like DeepSpeed, Megatron or equivalent
- Ability to tune, debug, and profile large-scale training runs
- Hands-on experience working with large GPU clusters, including job orchestration, scheduling, multi-node runs, NCCL/RDMA issues, and GPU performance optimization
- Ability to collaborate effectively with both research-oriented and engineering-oriented colleagues
- Good engineering hygiene: modular design, code reviews, documentation, reproducibility, versioning of data/models/configurations
- Familiarity with common tools (Linux command line, git, Docker)
- Openness to adopting new tooling
- Solid understanding of distributed systems and networking
- Strong written English
- Open, honest and transparent communication skills
Benefits
- Opportunity to work on frontier AI models
- Potential for technical leadership
- Collaborative start-up environment
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, deep learning frameworks, testing, profiling, refactoring, reproducibility, PyTorch, JAX, DeepSpeed, Megatron
Soft skills
collaboration, engineering hygiene, communication, openness to adopting new tooling, problem-solving