Tech Stack
AWS, Cloud, Google Cloud Platform, Keras, NumPy, Pandas, Python, PyTorch, Scikit-Learn, TensorFlow
About the role
- Drive research and development of novel deep learning architectures, training paradigms (e.g., supervised, self-supervised, generative, multi-modal), and algorithms tailored for large-scale biological sequence data and related modalities.
- Partner with computational biologists, data scientists, and data engineers to integrate domain expertise, define scientifically meaningful tasks, and apply cutting-edge machine learning research towards ambitious biological challenges.
- Design, implement, and maintain robust ML Operations (MLOps) pipelines for model training, evaluation, versioning, and deployment using cloud-based infrastructure and tools such as MLflow.
- Design and execute statistically rigorous experiments using design of experiments (DOE), A/B testing, and Bayesian approaches to optimize RNA editing strategies, validate model predictions, and advance our mechanistic understanding of RNA editing through testable biological hypotheses.
- Identify and prototype novel machine learning applications across diverse organizational functions, including manufacturing optimization, supply chain analytics, regulatory strategy, and clinical trial design.
- Mentor early career scientists and engineers, fostering a culture of technical excellence and scientific curiosity through leadership and code review.
- Contribute to long-term strategic planning for ML/AI platform capabilities, identifying emerging technologies and research directions that could transform genetic medicine development timelines and outcomes.
- Share research findings through internal presentations and contribute to the scientific community via publications or presentations.
Requirements
- PhD (or equivalent expertise) with a distinguished research record in Machine Learning, Computer Science, Statistics, Physics, or a related quantitative field, and 4+ years of post-graduate experience in leading industrial R&D or highly competitive academic environments.
- Deep understanding of modern deep learning theory and practice, including Transformers, sequence models (e.g., state-space models), and LLMs, and a proven ability to implement, train, and debug high-performance models using PyTorch, JAX, TensorFlow, or R-based frameworks, with experience in associated libraries such as Flax, Equinox, PyTorch Geometric, or tidymodels.
- Proficiency in scientific computing and data analysis using R (tidyverse, Bioconductor, caret) and/or Python (pandas, numpy, scipy, scikit-learn) ecosystems.
- Experience working with large datasets and an understanding of the challenges that come with scale, including data preprocessing, feature engineering, distributed training, and cloud platforms (AWS/SageMaker, GCP, Databricks).
- Experience with graph neural networks or molecular representation learning, or a willingness to rapidly acquire computational biology expertise.
- A track record of impactful research, demonstrated through publications in high-impact scientific journals, and experience leading technical projects and mentoring junior researchers.
- Excellent communication skills, with the ability to discuss complex ideas with both domain experts and audiences from diverse backgrounds.
- Experience with ensemble methods, cross-validation, and model evaluation in production environments.