Data Engineer

LUKA GLOBAL

Data Engineer developing scalable data solutions for a digital health start-up, focusing on data management, processing, and integration of ML models to improve patient care and health management.

Posted 6/9/2026 • Full-time • Remote • 🇩🇪 Germany • Mid-Level, Senior

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

data management, data processing architectures, data acquisition, data aggregation, data governance, data lifecycle, ETL, ML Ops, data cleaning, data curation

Soft Skills

effective communication, interdisciplinary collaboration, agile development, problem-solving, teamwork, organizational skills, leadership, adaptability, critical thinking, attention to detail

Tools & Technologies

MySQL, PostgreSQL, MongoDB, Redis, AWS, Docker, Apache Airflow, Terraform, Keras, TensorFlow

Certifications & Qualifications

B.Sc., B.Eng.

Industry Keywords

data engineering, structured data, semi-structured data, unstructured data, OLTP, OLAP, time-series data, data pipelines, cloud deployments, machine learning

Tech Stack

Tools & technologies

Airflow, Apache, AWS, Cloud, Docker, EC2, ETL, Keras, MongoDB, MySQL, NumPy, Pandas, Postgres, Python, PyTorch, Redis, TensorFlow, Terraform

About the role

Key responsibilities & impact

Develop scalable data management and data processing architectures.
Manage data acquisition from API, batch, event or streaming sources.
Develop processes for data aggregation.
Design and develop data pre- and post-processing stages.
Plan and design for data governance, security, provenance and the overall data lifecycle.
Leverage best-in-class cloud technologies to cater for OLTP and OLAP business needs.
Integrate ML models and analytics components into the workflows (including MLOps).
Work closely with Data Science and Application Development teams in an agile development process.

Requirements

What you’ll need

B.Sc., B.Eng. or higher in Computer Science, Computer / Electronic / Systems Engineering, or similar disciplines.
Proven experience as a Data Engineer.
Experienced with structured, semi-structured and unstructured data (e.g., Relational, JSON, Schema-less).
Experience with creating, cleaning and curating datasets and databases such as: MySQL, PostgreSQL, MongoDB, Redis, Bigtable, time-series databases or similar.
Serverless/distributed processing experience, e.g., Multiprocessing, containers, lambda or similar.
Know-how for scheduling workflows, e.g., DAGs with Apache Airflow.
Well versed in various ETL approaches.
Exposure to classical and deep learning-based ML methods (e.g., CNNs, DL auto-encoders, etc.).
Knowledge and experience of relevant data, analytics, visualization and ML languages and libraries is important (e.g., Julia/Python, Boto3/Apache Airflow, Parquet, SciPy/NumPy, Pandas/Matplotlib, Keras/TensorFlow, PyTorch, etc.).
Experience with Model Deployment / ML Ops is desirable.
Edge-based inference is also of interest.
Experience with AWS (Fargate, RDS, EC2, SageMaker, Timestream, EMR, Kinesis, MWAA, etc.), Docker, IaC (Terraform), CI/CD, monitoring and related tooling.
Experience with Time-Series Data is a bonus.
Communicating effectively in an interdisciplinary environment (AI/ML, product management, regulatory, clinical).
Practical experience with ETL, data pipelines and cloud deployments.
Experience in designing and building data solutions while ensuring confidentiality, integrity, and availability.
A strong engineering interest in ML and data science.
Business-level proficiency in English (spoken and written).

Benefits

Comp & perks

A competitive salary.
The chance to be a central player in the future of healthcare.