Build the infrastructure and tooling required for optimal extraction, transformation and loading of data from a wide variety of data sources, using cloud-native big data services from AWS or Azure
Deploy analytics tools that utilize the data pipeline to provide actionable insights into customer usage, operational efficiency and other analytical and business performance metrics
Work with stakeholders, including business analysts and data scientists, to resolve data-related technical issues, support their data infrastructure needs, and prepare data for modeling and analytics
Promote effective, data-driven decision making across stakeholders
Requirements
Proven experience building and optimizing 'big data' pipelines, architectures and data sets, including for AI/ML applications
A successful history of loading, manipulating, processing and extracting value from large, disconnected datasets
Working knowledge of stream & batch processing in/out to highly scalable data/metrics stores
Advanced knowledge of query languages like SQL and working familiarity with a variety of databases
Hands-on experience with relational and non-relational databases such as Postgres, Amazon Redshift, Cassandra and Azure Data Explorer
Hands-on programming experience: Python is a must; Spark is strongly preferred
Experience with processing and managing large data sets (tens to hundreds of TB scale)
Knowledge and experience building big data consumption patterns such as data APIs, visualization tools and platforms is a plus
Experience with a cloud platform such as AWS or Azure is a plus
Degree in Computer Science, Data Science or Data Engineering; a Master's degree is a plus
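To gauge whether your experience matches the ETL and SQL skills listed above, it helps to picture the smallest version of the work: extract raw records, transform and clean them, and load them into a queryable store. The sketch below is a minimal, hypothetical example in Python (the posting's required language), with sqlite3 standing in for a warehouse like Redshift; all table and column names are illustrative, not from this posting.

```python
import csv
import io
import sqlite3

# Minimal batch ETL sketch. Hypothetical raw event data; in a real
# pipeline this would come from S3, Kafka, a database export, etc.
RAW_CSV = """user_id,event,duration_ms
1,login,120
1,play,34000
2,login,95
2,play,archived
"""

def extract(text):
    """Extract: parse CSV text into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop malformed records, convert durations to seconds."""
    out = []
    for r in rows:
        try:
            dur = int(r["duration_ms"]) / 1000.0
        except ValueError:
            continue  # skip rows whose duration is not numeric
        out.append((int(r["user_id"]), r["event"], dur))
    return out

def load(records):
    """Load into SQLite and answer an analytics query with SQL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_id INT, event TEXT, duration_s REAL)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    return dict(con.execute(
        "SELECT user_id, SUM(duration_s) FROM events GROUP BY user_id"
    ).fetchall())

totals = load(transform(extract(RAW_CSV)))
```

At the scale this role describes (tens to hundreds of TB), the same extract/transform/load shape would be expressed with distributed tooling such as Spark DataFrames rather than in-process Python, but the reasoning is identical.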
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
big data, data pipelines, data architectures, data sets, AI/ML applications, stream processing, batch processing, SQL, Python, Spark