Tech Stack
Airflow, Amazon Redshift, Apache, AWS, Azure, BigQuery, Cassandra, Cloud, Docker, Google Cloud Platform, Hadoop, HDFS, Java, Kafka, Kubernetes, PySpark, Python, Scala, Spark, SQL, YARN
About the role
- Design, build, maintain, and optimize large-scale data processing systems using Apache Spark (batch and streaming)
- Collaborate with data scientists, analysts, and engineers to deliver scalable, reliable, and efficient data solutions
- Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources
- Optimize Spark jobs for performance and scalability across large datasets
- Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.)
- Implement data quality checks, monitoring, and alerting for Spark-based workflows (see the sketch after this list)
- Ensure security and compliance of data processing systems
- Troubleshoot and resolve data pipeline and Spark job issues in production environments
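To make the day-to-day concrete, here is a minimal PySpark sketch of the kind of pipeline described above: read raw events from object storage, apply a simple data quality gate, and write partitioned Parquet. The bucket, paths, column names, and the 95% threshold are hypothetical placeholders, not details from this posting.

```python
# Minimal batch pipeline sketch: ingest, cleanse, quality-gate, write.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("events-batch-pipeline")  # hypothetical job name
    .getOrCreate()
)

# Read semi-structured JSON events from a (hypothetical) S3 prefix.
raw = spark.read.json("s3a://example-data-lake/raw/events/")

# Basic cleansing: drop rows missing required keys, derive a partition column.
events = (
    raw
    .dropna(subset=["event_id", "event_ts"])
    .withColumn("event_date", F.to_date(F.col("event_ts")))
)

# Simple data quality gate: fail the job if too many rows were dropped,
# so downstream consumers never see a silently truncated dataset.
raw_count, clean_count = raw.count(), events.count()
if raw_count > 0 and clean_count / raw_count < 0.95:  # threshold is illustrative
    raise RuntimeError(f"Data quality gate failed: kept {clean_count}/{raw_count} rows")

# Write partitioned Parquet back to the lake for downstream consumers.
(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-data-lake/curated/events/")
)

spark.stop()
```

Failing fast at the quality gate, rather than logging and continuing, is what makes the monitoring-and-alerting responsibility above actionable in production.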
Requirements
- Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred)
- 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming)
- Strong programming skills in Scala, Java, or Python (PySpark)
- Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS)
- Experience with data serialization formats (Parquet, ORC, Avro)
- Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP DataProc, or Azure Synapse)
- Knowledge of SQL required; experience with data warehouses (Snowflake, Redshift, BigQuery) is a plus
- Strong background in performance tuning and Spark job optimization (an illustrative sketch follows this list)
- Experience with CI/CD pipelines and version control (Git)
- Familiarity with containerization (Docker, Kubernetes) is an advantage
- Preferred: Experience with stream processing frameworks (Kafka, Flink)
- Preferred: Exposure to machine learning workflows with Spark MLlib
- Preferred: Knowledge of workflow orchestration tools (Airflow, Luigi)
- Ability to safely and successfully perform the essential job functions (sedentary work)
- Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor
- Remote work
- Reasonable accommodation for applicants (application.accommodations@cai.io)
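Since the requirements call out performance tuning and Spark job optimization, here is an illustrative sketch of two common tuning moves: broadcasting a small dimension table so the join avoids shuffling the large side, and coalescing before the write to avoid producing many tiny files. Table paths, the join column, and the partition count are hypothetical, not from this posting.

```python
# Tuning sketch: broadcast join plus output-partition control.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-example").getOrCreate()

facts = spark.read.parquet("s3a://example-data-lake/curated/events/")    # large table
dims = spark.read.parquet("s3a://example-data-lake/curated/countries/")  # small table

# Broadcast the small table so the join runs map-side on each executor,
# avoiding a full shuffle of the large fact table.
joined = facts.join(broadcast(dims), on="country_code", how="left")

# Coalesce before writing so the output is a manageable number of files
# instead of one small file per shuffle partition.
joined.coalesce(64).write.mode("overwrite").parquet(
    "s3a://example-data-lake/curated/events_enriched/"
)

spark.stop()
```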
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Apache Spark, Scala, Java, Python, Hadoop, YARN, HDFS, Parquet, ORC, Avro
Soft skills
collaboration, troubleshooting, problem-solving, performance tuning, data quality checks
Certifications
Bachelor’s degree in Computer Science, Master’s degree in Computer Science