
Forward Deployed Data Engineer
Ellison Institute of Technology Oxford
full-time
Location Type: Hybrid
Location: Oxford • United Kingdom
About the role
- Partner with scientists and engineers to deliver robust, reproducible data pipelines that meet research needs across disciplines.
- Own ingestion, storage, curation, and transformation of multimodal data, including text, images, structured/tabular data, and high I/O formats such as Arrow/NetCDF/HDF5.
- Package and deploy code in research environments using containers (e.g. Docker).
- Scale processing across distributed cloud warehouses/storage via container orchestration (e.g. Kubernetes), distributed compute frameworks (e.g. Spark, Ray), or high-performance computing (HPC) clusters (e.g. Slurm).
- Work with sensitive data in line with security and compliance requirements (audit trails, encryption, GDPR, RBAC/ABAC).
- Contribute to an engineering culture that values maintainability, testing, robust system design, and deep collaboration, but allows flexibility for rapid prototyping and responsiveness to changing landscapes.
- Disseminate data engineering best practices into the research and applied teams, ensuring datasets used for AI models are well-structured and ready for training.
Requirements
- **The Role:**
- Forward Deployed Data Engineers are a key interface between our core data systems and our research and product engineering projects. As an early member of our rapidly growing team, you’ll work side-by-side with scientists and engineers in bioinformatics, healthcare, robotics, agriculture, and frontier AI. You will disseminate data engineering best practices into the research and applied teams, ensuring that the datasets used for our AI models are accurate, reproducible, version-controlled, well-structured, and ready for training. You will help scientists turn raw information from various sources into high-quality resources, ensuring our technical foundations support the next generation of discovery.
- This is a hands-on role for engineers who thrive on collaborating directly with researchers, solving problems quickly, and turning complex business logic into scalable, reliable data pipelines while contributing to the broader EIT platform. Successful candidates will be clear, respectful communicators who are comfortable bringing their own expertise into diverse groups.
- **What Makes You a Great Fit:**
  - You have strong programming experience in Python and SQL, and value code quality, reliability (including testing and CI/CD), and observability as much as performance.
  - You have experience designing, deploying, and optimising distributed data systems or data-intensive backend services.
  - You think in terms of systems and longevity, not just one-off ETL scripts, and embrace end-to-end ownership from low-level performance to user interfaces.
  - You’re a collaborative partner to Infrastructure/Ops teams and researchers, and a clear, respectful communicator.
  - You have a low-ego, team-first mindset and help grow our engineering culture by mentoring, sharing, and elevating the work of those around you.
- Our Forward Deployed Engineers will have the opportunity to work on diverse projects, and to expand their skillset, but will add particular value when they can match their data engineering expertise with domain knowledge relevant to a project. **Does one of these domains fit you?**
- **Health/Clinical Data Engineering:**
  - In-depth knowledge of and expertise in human healthcare data, clinical data, patient journeys, and/or biocuration.
  - Feature engineering for predictive models utilising health data.
  - Transformation of clinical data into common data models, and design of new models.
  - Clinical data quality control and analytics.
  - Differential privacy systems, PII handling, and anonymisation/pseudonymisation.
- OR
- **Bioinformatics/Genomics Data Engineering:**
  - Experience in bioinformatics and use of industry-standard tooling such as Nextflow or similar.
  - Genomic/metagenomic data processing, e.g. genome assembly and annotation.
  - Comfortable with relevant data formats such as FASTQ/FASTA and VCF.
  - Conversion of genomic data into ML-ready formats.
Benefits
- **We offer the following salary and benefits:**
- Enhanced holiday pay
- Pension
- Life Assurance
- Income Protection
- Private Medical Insurance
- Hospital Cash Plan
- Therapy Services
- Perk Box
- Electric Car Scheme
- **Why work for EIT:**
- At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact!