Machine Learning Engineer

Adobe

Adobe is hiring a Machine Learning Data Engineer to build distributed data frameworks, collaborate with ML teams to optimize data pipelines, and contribute to large-scale model training.

Posted 5/12/2026 | Full-time | San Jose • California, Washington • 🇺🇸 United States | Mid-Level, Senior | 💰 $151,800 - $265,350 per year

Tech Stack

Tools & technologies
Apache, AWS, Azure, Cloud, Distributed Systems, Docker, Python, PyTorch, Ray, Spark, SQL, TensorFlow

About the role

Key responsibilities & impact
  • Contribute to building and maintaining distributed training data loaders that handle multi-source data ingestion, temporal sampling, and real-time transformations for large-scale model training workflows.
  • Help implement and maintain feature enrichment pipelines and dataset registry systems that support multimodal model training across images, video, documents, and text.
  • Build and maintain batch inference pipelines for large-scale feature extraction, processing assets through distributed GPU clusters with queue management and fault tolerance.
  • Develop data processing systems using frameworks like Apache Ray, Spark, DuckDB, or similar distributed computing tools for SQL-based data ingestion and Apache Arrow-based storage formats.
  • Support semantic search capabilities and vector database infrastructure (e.g., OpenSearch, LanceDB) for dataset discovery and embedding-based retrieval.
  • Contribute to CI/CD infrastructure for ML systems including self-hosted runner management, Docker image builds, automated testing pipelines, and deployment automation.
  • Collaborate with ML research teams to translate training requirements into reliable, scalable data loading and preprocessing infrastructure.
  • Write reusable framework components, SDKs, and documentation to help accelerate platform adoption across modeling teams.
  • Optimize data pipeline performance across dimensions like startup latency, throughput, memory footprint, and GPU utilization.
  • Contribute to observability and reliability standards for production data systems supporting 24/7 training workloads.

Requirements

What you’ll need
  • 3–4 years of professional experience building and operating distributed systems or data infrastructure in production environments.
  • Solid understanding of distributed computing concepts and experience with frameworks like Apache Spark, Ray, Dask, or equivalent.
  • Familiarity with cloud platforms (AWS or Azure) and data platforms such as Databricks or Spark.
  • Proficiency in Python and strong software engineering fundamentals — system design, data structures, algorithms.
  • Familiarity with ML frameworks such as PyTorch or TensorFlow; hands-on ML experience is a plus but not required.
  • Basic familiarity with MLOps practices including CI/CD pipelines, containerization (Docker), and deployment automation.
  • Bachelor’s degree in Computer Science, Engineering, or a related field; MS is a plus.
  • Strong communication skills and ability to collaborate across engineering and research teams.

Benefits

Comp & perks
  • Comprehensive benefits programs

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
distributed systems, data infrastructure, distributed computing, Apache Spark, Apache Ray, Python, MLOps, CI/CD, Docker, feature extraction
Soft Skills
communication, collaboration, problem-solving, documentation
Certifications
Bachelor's degree in Computer Science, Bachelor's degree in Engineering, Master's degree in Computer Science, Master's degree in Engineering