Tech Stack
AWS, Cloud, ETL, Hadoop, Java, NoSQL, Python, Scala, Spark
About the role
- Design and implement processes to ingest data from various sources into the Databricks Lakehouse platform on AWS
- Develop, maintain, and optimize data models and ETL pipelines supporting the Medallion Architecture (Bronze, Silver, Gold; see the sketch after this list)
- Utilize Databricks and Delta Lake to integrate, consolidate, and cleanse data for analysis
- Ensure the availability, reliability, and scalability of data systems, along with versioned data management
- Collaborate with data scientists, analysts, business stakeholders, and clients to understand requirements and deliver solutions
- Engage with clients to support data engineering requirements and provide guidance on best practices
- Stay updated on industry trends and emerging technologies in data engineering
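To give a flavor of the Medallion work described above, here is a minimal PySpark sketch of a Bronze-to-Silver promotion on Delta Lake; the S3 path, table names, and the `order_id` column are hypothetical illustrations, not part of this role's actual codebase:

```python
# A minimal sketch of a Bronze -> Silver step in the Medallion Architecture,
# assuming a Databricks runtime with Delta Lake available. Paths, table
# names, and the `order_id` column are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: ingest raw JSON as-is, preserving the source payload.
bronze = (
    spark.read.json("s3://example-bucket/raw/orders/")  # hypothetical path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleanse and deduplicate for downstream analysis.
silver = (
    spark.read.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")
```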
Requirements
- 90%+ proficiency in written and spoken English (at least B2 level), with excellent communication skills
- Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field
- 5+ years of experience in data engineering or a related data role
- Proven experience in designing and implementing production-grade Spark-based solutions
- Proficient in query tuning, performance tuning, troubleshooting, and debugging of Spark and other big data solutions (a brief illustration follows this list)
- Familiarity with big data technologies such as Spark/Delta, Hadoop, NoSQL, MPP, and OLAP
- Experience with cloud architecture, systems, and principles, particularly AWS
- Proficient in programming languages such as Python, R, Scala, or Java
- Expertise in scaling ETL pipelines for performance and cost-effectiveness
- Cloud certification is highly desirable
- Strong problem-solving and troubleshooting skills
- Strong communicator, both verbally and in writing
- Neutral toward technology, vendor, and product choices
- Unflappable in the face of opposition to ideas
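As a brief, hedged illustration of the Spark tuning skills referenced above, the sketch below shows a few routine levers: enabling adaptive query execution, inspecting a physical plan, and repartitioning by an aggregation key. The table and column names are hypothetical:

```python
# Routine Spark query-tuning levers, assuming a running SparkSession.
# The `silver.orders` table and its columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Let adaptive query execution coalesce small shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

df = spark.read.table("silver.orders")  # hypothetical table

# Inspect the physical plan before optimizing: broadcast joins, shuffle
# boundaries, and partition counts all show up here.
df.groupBy("customer_id").agg(F.sum("amount").alias("total")).explain()

# Repartition by the aggregation key to reduce shuffle skew.
df = df.repartition(200, "customer_id")
```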