Malaria No More

Data Engineer

contract

Origin: 🇮🇳 India

Job Level

Senior, Lead

Tech Stack

Airflow, Ansible, AWS, Azure, Cloud, Cyber Security, Docker, ETL, Google Cloud Platform, GraphQL, Kafka, Kubernetes, Microservices, Python, PyTorch, Spark, SQL, TensorFlow, Terraform

About the role

  • The Institute for Health Modeling and Climate Solutions (IMACS) is a global center of excellence, hosted by Malaria No More, with the mission to empower the world’s most climate-vulnerable countries with the tools, data, and expertise needed to predict, prevent, and respond to climate-sensitive health threats.
  • IMACS is redefining how climate intelligence is operationalized in public health by building and scaling AI-powered digital public goods that integrate and model climate and health data. Through the application of machine learning, interoperable platforms, and next-generation early warning systems, IMACS enables real-time risk detection and proactive responses at scale.
  • IMACS supports countries through co-designed implementation pathways: orchestrating data cooperation, strengthening national health and climate information systems with tailored innovations, training frontline actors and policymakers, and institutionalizing their use through clear SOPs and sustainability guidelines.
  • By unlocking the value of climate and health data, IMACS helps transform fragmented information into strategic, actionable knowledge, enabling smarter decisions, better preparedness, and more resilient health systems in the era of climate disruption.
  • Backed by the Patrick J. McGovern Foundation, we are building a Central Data & Analytics Hub (CDAH) to advance IMACS’ climate health AI foundation model and related digital public goods, as well as a training program to equip public health professionals with the knowledge and tools required to make data-informed decisions at the intersection of climate and health.

Requirements

  • Lead design of a multi-tenant data lake and feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data.
  • Identify, evaluate, and onboard public climate, environmental, epidemiological, and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.
  • Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls.
  • Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS.
  • Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards. Manage the data-service API contract; the AI/ML Engineer manages model APIs.
  • Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.
  • Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback.
  • Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses.
  • 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines.
  • Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.
  • Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures.
  • Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.
  • Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.