
Junior Research Infrastructure Engineer
MeshyAI
Full-time
Location Type: Hybrid
Location: Sunnyvale • California • United States
Job Level: Junior
Tech Stack: Python, Temporal, Celery, Terraform, GitHub Actions, React, Next.js
About the role
- Participate in the design and implementation of distributed task orchestration systems using Temporal or Celery (see the first sketch after this list).
- Architect pipelines across cloud object storage (S3, GCS), data lakes, and metadata catalogs.
- Implement partitioning, sharding, and caching strategies to ensure data processing pipelines are resilient, highly available, and consistent.
- Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries).
- Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
- Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction (see the second sketch after this list).
- Implement validation and quality checks to ensure datasets meet ML training requirements.
- Collaborate with ML researchers to quickly adapt pipelines to evolving pretraining and evaluation needs.
- Use infrastructure-as-code (Terraform, Kubernetes, etc.) to manage scalable and reproducible environments.
- Manage data assets using Databricks Asset Bundles (DABs) and build rigorous CI/CD pipelines (GitHub Actions).
- Focus on maximizing cluster utilization (CPU/Memory) and optimizing EC2 instance allocation to aggressively reduce compute costs.
- Take ownership of the platform’s "Interface" by building Data Explorers and management consoles using React or Next.js.
- Actively listen to researchers and data scientists to iterate on UI/UX based on their feedback.
- Simplify complex CLI operations into intuitive GUI interactions to boost overall developer experience (DevEx).
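For a concrete flavor of the orchestration work, here is a minimal Celery sketch of an ingest → validate → enrich chain; the broker URLs, task logic, and asset paths are hypothetical placeholders, not the production setup.

```python
# Minimal Celery sketch: chaining ingest -> validate -> enrich tasks.
# Broker/backend URLs, task logic, and asset paths are illustrative only.
from celery import Celery, chain

app = Celery(
    "pipeline",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@app.task(autoretry_for=(IOError,), retry_backoff=True)
def ingest(uri: str) -> str:
    # Pull the raw asset into a staging area; return its staged path.
    return "/staging/" + uri.rsplit("/", 1)[-1]

@app.task
def validate(path: str) -> str:
    # Fail fast on assets that cannot feed the training pipeline.
    if not path.endswith((".png", ".glb")):
        raise ValueError(f"unsupported asset type: {path}")
    return path

@app.task
def enrich(path: str) -> dict:
    # Attach metadata consumed by downstream training jobs.
    return {"path": path, "status": "ready"}

# Each step runs as its own task, so retries and scaling are per-step.
result = chain(ingest.s("s3://bucket/assets/chair.glb"), validate.s(), enrich.s())()
```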
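And a sketch of the asset-preprocessing side using Pillow; the target resolution and PNG output are assumptions for illustration, not a fixed spec.

```python
# Sketch: normalize an image asset for training -- convert color mode,
# resize, re-encode as PNG, and extract basic metadata.
# TARGET_SIZE and the output layout are hypothetical.
from pathlib import Path
from PIL import Image

TARGET_SIZE = (512, 512)  # assumed training resolution

def preprocess_image(src: Path, dst_dir: Path) -> dict:
    with Image.open(src) as im:
        meta = {"source": str(src), "orig_mode": im.mode, "orig_size": im.size}
        rgb = im.convert("RGB")        # normalize color mode
        rgb = rgb.resize(TARGET_SIZE)  # normalize resolution (bicubic by default)
        out = dst_dir / (src.stem + ".png")
        rgb.save(out)                  # format conversion to PNG
    meta["output"] = str(out)
    return meta
```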
Requirements
- 2+ years of experience in software engineering, backend development, or distributed systems.
- Strong programming skills in Python; Scala, Java, or C++ is a plus.
- Familiarity with distributed frameworks (Spark, Dask, Ray) and cloud platforms (AWS/GCP/Azure).
- Experience with workflow orchestration tools (Temporal, Celery, or Airflow).
- Proficiency with Infrastructure as Code (Terraform) and CI/CD tools (GitHub Actions).
- Experience building web applications or internal tools using React or Next.js.
- A "product-first" mindset: an interest in how users interact with infrastructure and a desire to build clean, functional interfaces.
- Experience handling large-scale unstructured datasets (images, video, binaries, or 3D/2D assets).
- Familiarity with AI/ML training data pipelines, including dataset versioning, augmentation, and sharding (a sharding sketch follows this list).
- Exposure to computer graphics or 3D/2D data processing.
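On dataset sharding, a minimal illustration of deterministic hash-based shard assignment; the shard count and asset URIs are made up for the example.

```python
# Sketch: stable hash-based sharding so dataset splits are reproducible
# across runs and machines (Python's built-in hash() is salted per process).
import hashlib

NUM_SHARDS = 64  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

assets = ["s3://bucket/a.png", "s3://bucket/b.glb", "s3://bucket/c.mp4"]
by_shard: dict[int, list[str]] = {}
for uri in assets:
    by_shard.setdefault(shard_for(uri), []).append(uri)
```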
Benefits
- Stock options available for core team members.
- 401(k) plan for employees.
- Comprehensive health, dental, and vision insurance.
- The latest and best office equipment.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Scala, Java, C++, Temporal, Celery, Terraform, GitHub Actions, React, Next.js
Soft Skills
collaboration, active listening, product-first mindset, user interaction focus, iterative design