Tech Stack
Apache, AWS, Azure, Cloud, ETL, Google Cloud Platform, Java, Kafka, NoSQL, Python, Scala, Spark, SQL, Unity
About the role
- Design and implement scalable data pipelines
- Optimize existing data workflows for performance and reliability
- Collaborate with data scientists and analysts to support their data needs
- Implement data quality checks and monitoring
- Maintain documentation for data processes and architectures
Requirements
- Bachelor's degree in Computer Science, Engineering, or related technical field
- 5+ years of software development experience
- Strong proficiency in Python, Java, or Scala
- Extensive hands-on experience with Apache Spark, including Spark SQL, Spark Streaming and its data sources (Kafka, CDC feeds, etc.)
- Performance optimization and tuning
- Data transformation and processing
- Hands-on experience with cloud-based Spark platforms (Databricks, AWS EMR, AWS Glue)
- Strong understanding of Hive, Unity Catalog, and Glue Data Catalog
- Hands-on experience with data quality frameworks and tools
- Hands-on experience with observability and monitoring for data processing pipelines
- Strong understanding of distributed computing concepts
- Proficiency in version control systems (GitHub/GitLab/Bitbucket)
- Experience building and maintaining high-volume data processing pipelines
- Knowledge of data modeling and ETL / ELT best practices
- Familiarity with SQL and NoSQL databases
- Understanding of data warehouse concepts and dimensional modeling
- Experience with real-time data processing and streaming architectures (preferred)
- Knowledge of Delta Lake or similar data lakehouse technologies (preferred)
- Experience with CI/CD pipelines (preferred)
- Cloud platform expertise (AWS/Azure/GCP) (preferred)
- Contributions to open-source projects (preferred)
- This position requires the ability to obtain a Public Trust Clearance; to obtain this clearance, you must be a US Citizen or Green Card Holder
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, Java, Scala, Apache Spark, Spark SQL, Spark Streaming, Kafka, data quality frameworks, data transformation, ETL