Tech Stack
Amazon Redshift, AWS, Azure, Cassandra, Cloud, Postgres, Python, Spark, SQL
About the role
- Build the infrastructure and tooling required for optimal extraction, transformation, and loading of data from a wide variety of data sources, using cloud-native big data services from AWS/Azure (see the ETL sketch below this list)
- Deploy analytics tools that use the data pipeline to provide actionable insights into customer usage, operational efficiency, and other analytical and business performance metrics
- Work with stakeholders, including business analysts and data scientists, to assist with data-related technical issues and support their data infrastructure needs
- Prepare data for modeling and analytics and enable effective data-driven decision making across stakeholders
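For a flavor of the day-to-day pipeline work described above, here is a minimal PySpark batch ETL sketch. It is illustrative only: the bucket, paths, and column names (event_id, event_ts) are hypothetical placeholders, not details from this posting.

```python
# Illustrative batch ETL sketch: extract raw JSON events from object
# storage, apply a light transform, and load partitioned Parquet for
# downstream analytics. All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Extract: read raw JSON events (hypothetical S3 location).
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: keep well-formed rows and derive a daily partition column.
events = (
    raw.where(F.col("event_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write Parquet partitioned by day for analytics/BI consumers.
(events.write
       .mode("overwrite")
       .partitionBy("event_date")
       .parquet("s3://example-bucket/curated/events/"))
```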
Requirements
- Proven experience building and optimizing ‘big data’ pipelines, architectures, and data sets, including for AI/ML applications
- A successful history of loading, manipulating, processing, and extracting value from large, disconnected datasets
- Knowledge and experience in building big data consumption patterns such as data APIs, visualization tools, and platforms
- Working knowledge of stream and batch processing into and out of highly scalable data and metrics stores (see the streaming sketch at the end of this section)
- Advanced knowledge of query languages like SQL and working familiarity with a variety of databases
- Hands-on experience with relational and non-relational databases such as Postgres, Amazon Redshift, Cassandra, and Azure Data Explorer
- Hands-on experience with programming languages; Python is a must
- Experience with Spark
- Experience with cloud platforms (AWS/Azure) is a plus
- Experience processing and managing large data sets (tens to hundreds of TB scale)
- Degree in Computer Science/Data Science/Data Engineering; Master’s is a plus
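As a hedged illustration of the stream-processing requirement above, the sketch below uses Spark Structured Streaming's built-in rate source as a stand-in for a real feed (e.g., Kafka) and the console sink as a stand-in for a scalable metrics store; it is a sketch under those assumptions, not a prescribed implementation.

```python
# Illustrative streaming sketch: aggregate a synthetic event stream into
# windowed metrics. The "rate" source emits (timestamp, value) rows and
# substitutes for a real feed; the console sink substitutes for a store.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-metrics").getOrCreate()

# Synthetic input stream: 10 rows per second of (timestamp, value).
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Roll events up into 1-minute windows, the typical metrics shape;
# the watermark bounds state for late-arriving data.
metrics = (
    stream.withWatermark("timestamp", "2 minutes")
          .groupBy(F.window("timestamp", "1 minute"))
          .agg(F.count("*").alias("events"),
               F.avg("value").alias("avg_value"))
)

# Emit incremental updates; blocks until interrupted (Ctrl-C to stop).
query = (metrics.writeStream
                .outputMode("update")
                .format("console")
                .start())
query.awaitTermination()
```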