Contribute to a robust omics data ecosystem: build single-cell data ingestion pipelines; select data formats, standards, and database schemas; and develop validation tools, QC approaches, and analysis pipelines.
Collaborate with ML engineers, AI researchers, and data engineers to iteratively evaluate, refine, and grow datasets to maximize model performance.
Discover and define new data generation opportunities, and manage the delivery of those data products to our AI team.
Collaborate with engineers, product managers, UX designers, and other data scientists to publish valuable datasets as part of CZI's open data ecosystem.
Requirements
5+ years of experience with large-scale genomic and/or epigenomic datasets.
Demonstrated delivery of large biological data products.
Experience with big data: extraction, transport, loading, databases, standardization, validation, QC, and analysis.
Experience with processing and orchestration pipelines, such as Argo Workflows or Databricks.
Strong fundamentals in statistical reasoning and machine learning.
Experience with biological data analysis and QC best practices.
Excellent written and verbal communication skills.
Enthusiasm for ramping up on new technologies and learning new domains.
Experience working in a multidisciplinary environment (engineering, product, AI research).
Benefits
CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
Paid time off to volunteer at an organization of your choice.
Funding for select family-forming benefits.
Relocation support for employees who need assistance moving to the Bay Area.
And more!