Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries).
Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
Architect pipelines across cloud object storage (S3, GCS, Azure Blob), data lakes, and metadata catalogs.
Optimize large-scale processing with distributed frameworks (Spark, Dask, Ray, Flink, or equivalents).
Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction.
Maintain data lineage, reproducibility, and governance for datasets used in AI/ML pipelines.
Requirements
5+ years of experience in data engineering, distributed systems, or a related field.
Strong programming skills in Python; Scala, Java, or C++ is a plus.
Solid skills in SQL for analytics, transformations, and warehouse/lakehouse integration.
Proficiency with distributed frameworks (Spark, Dask, Ray, Flink).
Familiarity with cloud platforms (AWS/GCP/Azure) and with storage and table formats (S3, Parquet, Delta Lake, etc.).
Experience with workflow orchestration tools (Airflow, Prefect, Dagster).
Benefits
Competitive salary, benefits, and stock options.
401(k) plan for employees.
Comprehensive health, dental, and vision insurance.
The latest and best office equipment.