Spark Engineer

CAI

full-time

Location: Remote · 🇵🇭 Philippines


Job Level

Mid-Level · Senior

Tech Stack

Airflow, Amazon Redshift, Apache, AWS, Azure, BigQuery, Cassandra, Cloud, Docker, Google Cloud Platform, Hadoop, HDFS, Java, Kafka, Kubernetes, PySpark, Python, Scala, Spark, SQL, YARN

About the role

  • Design, build, and optimize large-scale data processing systems using Apache Spark (batch and streaming)
  • Collaborate with data scientists, analysts, and engineers to deliver scalable, reliable, and efficient data solutions
  • Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources
  • Optimize Spark jobs for performance and scalability across large datasets
  • Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.)
  • Implement data quality checks, monitoring, and alerting for Spark-based workflows
  • Ensure security and compliance of data processing systems
  • Troubleshoot and resolve data pipeline and Spark job issues in production environments
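The Spark responsibilities above (batch pipelines over columnar data, data-quality gates, writing partitioned output) could be sketched in PySpark roughly as follows. The paths, column names, and 1% null threshold are illustrative assumptions, not details from the posting.

```python
# Hypothetical sketch of a Spark batch job of the kind this role describes:
# read raw Parquet, enforce a simple data-quality gate, write clean partitioned output.

def null_ratio(null_count: int, total: int) -> float:
    """Fraction of rows with a null key; used as a simple data-quality gate."""
    return 0.0 if total == 0 else null_count / total


def run_job(input_path: str, output_path: str, max_null_ratio: float = 0.01) -> None:
    # PySpark is imported lazily so the helper above stays usable without a Spark install.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-clean").getOrCreate()
    df = spark.read.parquet(input_path)

    # Data-quality check: fail fast if too many rows are missing the join key.
    total = df.count()
    nulls = df.filter(F.col("order_id").isNull()).count()
    if null_ratio(nulls, total) > max_null_ratio:
        raise ValueError("data-quality gate failed: too many null order_ids")

    # Drop the bad rows and write Parquet partitioned by date for downstream readers.
    (df.dropna(subset=["order_id"])
       .repartition(200, "order_date")   # size shuffle partitions for the sink
       .write.mode("overwrite")
       .partitionBy("order_date")
       .parquet(output_path))
    spark.stop()


if __name__ == "__main__":
    # Bucket and prefixes are made up for illustration.
    run_job("s3://example-bucket/raw/orders/", "s3://example-bucket/clean/orders/")
```

In practice the threshold, key column, and partitioning scheme would come from the pipeline's own contracts; this only shows the shape of the work.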

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred)
  • 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming)
  • Strong programming skills in Scala, Java, or Python (PySpark)
  • Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS)
  • Experience with data serialization formats (Parquet, ORC, Avro)
  • Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP DataProc, or Azure Synapse)
  • Knowledge of SQL; experience with data warehouses (Snowflake, Redshift, or BigQuery) is a plus
  • Strong background in performance tuning and Spark job optimization
  • Experience with CI/CD pipelines and version control (Git)
  • Familiarity with containerization (Docker, Kubernetes) is an advantage
  • Preferred: Experience with stream processing frameworks (Kafka, Flink)
  • Preferred: Exposure to machine learning workflows with Spark MLlib
  • Preferred: Knowledge of workflow orchestration tools (Airflow, Luigi)
  • Ability to safely and successfully perform the essential job functions (sedentary work)
  • Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor

Benefits

  • Remote work
  • Reasonable accommodation for applicants (application.accommodations@cai.io)
