MeshyAI

Senior Research Infrastructure Engineer

Full-time

Location Type: Hybrid

Location: Sunnyvale, California, United States

About the role

  • Architect pipelines across cloud object storage (S3, GCS, Azure Blob), data lakes, and metadata catalogs.
  • Optimize large-scale processing with distributed frameworks (Spark, Dask, Ray, Flink, or equivalents).
  • Implement partitioning, sharding, caching strategies, and observability (monitoring, logging, alerting) for reliable pipelines.
  • Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries).
  • Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
  • Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction.
  • Implement validation and quality checks to ensure datasets meet ML training requirements.
  • Collaborate with ML researchers to quickly adapt pipelines to evolving pretraining and evaluation needs.
  • Use infrastructure-as-code (Terraform, Kubernetes, etc.) to manage scalable and reproducible environments.
  • Integrate CI/CD best practices for data workflows.
  • Maintain data lineage, reproducibility, and governance for datasets used in AI/ML pipelines.
  • Work cross-functionally with ML researchers, graphics/vision engineers, and platform teams.
  • Embrace versatility: switch between infrastructure-level challenges and asset/data-level problem solving.
  • Contribute to a culture of fast iteration, pragmatic trade-offs, and collaborative ownership.
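The bullets on partitioning/sharding and on validation and quality checks can be sketched in miniature. This is an illustrative example only, not MeshyAI's actual pipeline code; the record fields (`asset_id`, `width`, `height`, `format`) and the accepted-format set are hypothetical assumptions for demonstration:

```python
import hashlib

def shard_for(asset_id: str, num_shards: int = 16) -> int:
    """Deterministically map an asset ID to a shard via a stable hash.

    A stable hash (rather than Python's salted built-in hash) keeps shard
    assignment reproducible across processes and runs.
    """
    digest = hashlib.sha256(asset_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def validate_record(record: dict) -> list:
    """Return a list of quality issues; an empty list means the record passes.

    The checked fields and the allowed formats below are hypothetical.
    """
    issues = []
    if not record.get("asset_id"):
        issues.append("missing asset_id")
    if record.get("width", 0) <= 0 or record.get("height", 0) <= 0:
        issues.append("non-positive dimensions")
    if record.get("format") not in {"png", "jpg", "glb", "obj"}:
        issues.append("unsupported format: %r" % record.get("format"))
    return issues
```

In a real pipeline, records failing `validate_record` would typically be routed to a quarantine location with their issue list attached, rather than silently dropped.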

Requirements

  • 5+ years of experience in data engineering, distributed systems, or similar.
  • Strong programming skills in Python; Scala, Java, or C++ a plus.
  • Solid skills in SQL for analytics, transformations, and warehouse/lakehouse integration.
  • Proficiency with distributed frameworks (Spark, Dask, Ray, Flink).
  • Familiarity with cloud platforms (AWS/GCP/Azure) and storage systems (S3, Parquet, Delta Lake, etc.).
  • Experience with workflow orchestration tools (Airflow, Prefect, Dagster).
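As a small, self-contained illustration of the SQL transformation and analytics work the requirements describe, the sketch below uses Python's standard-library `sqlite3` as a stand-in for a real warehouse or lakehouse; the table name and columns are hypothetical:

```python
import sqlite3

# In-memory database standing in for a warehouse table of asset metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (asset_id TEXT, kind TEXT, size_bytes INTEGER)")
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?)",
    [("a1", "image", 1024), ("a2", "mesh", 4096), ("a3", "image", 2048)],
)

# Aggregate transformation: per-kind asset counts and total storage footprint.
rows = conn.execute(
    """
    SELECT kind, COUNT(*) AS n, SUM(size_bytes) AS total_bytes
    FROM assets
    GROUP BY kind
    ORDER BY kind
    """
).fetchall()
# rows == [("image", 2, 3072), ("mesh", 1, 4096)]
```

The same GROUP BY pattern carries over directly to Spark SQL or a lakehouse engine; only the connection layer changes.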

Benefits

  • Stock options available for core team members.
  • Comprehensive health, dental, and vision insurance.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Scala, Java, C++, SQL, Spark, Dask, Ray, Flink, ETL
Soft Skills
collaboration, adaptability, problem solving, iteration, ownership