Tech Stack
Airflow, Apache, BigQuery, Cloud, ElasticSearch, Linux, MongoDB, MySQL, NoSQL, Postgres, Python, RDBMS, SQL
About the role
- Implementing ingestion pipelines, using Airflow as the orchestration platform, for consuming data from a wide variety of sources (API, SFTP, Cloud Storage Bucket, etc.).
- Implementing transformation pipelines using software engineering best practices and tools (DBT)
- Working closely with Software Engineering and DevOps to maintain reproducible infrastructure and data that serves both API-only customers and in-house SaaS products
- Defining and implementing data ingestion/transformation quality control processes using established frameworks (Pytest, DBT)
- Building pipelines that use multiple technologies and cloud environments (for example, an Airflow pipeline pulling a file from an S3 bucket and loading the data into BigQuery)
- Creating data automation and ensuring its stability with associated monitoring tools
- Reviewing existing and proposed infrastructure for architectural enhancements that follow both software engineering and data analytics best practices
- Working closely with Data Science and facilitating advanced data analysis (such as machine learning)
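To make the quality-control responsibility above concrete, here is a minimal sketch of the kind of pytest-style checks one might run against a batch of ingested rows. The row shape, field names, and rules are hypothetical, not taken from this role's actual pipelines.

```python
def check_required_fields(rows, required=("id", "ingested_at")):
    """Return indices of rows missing any required field (hypothetical rule)."""
    bad = []
    for i, row in enumerate(rows):
        if any(row.get(field) in (None, "") for field in required):
            bad.append(i)
    return bad

def check_unique_ids(rows, key="id"):
    """Return True if the key column is unique across the batch."""
    ids = [row[key] for row in rows if key in row]
    return len(ids) == len(set(ids))

# In a pytest suite these become plain assertions, e.g.:
# def test_no_missing_fields():
#     assert check_required_fields(batch) == []
```

dbt covers the same ground declaratively (e.g. `not_null` and `unique` tests in a model's YAML), so checks like these would typically guard the ingestion side while dbt tests guard the transformed models.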
Requirements
- Must reside in the United States and be eligible to work within the US
- Strong working knowledge of Apache Airflow
- Experience supporting a SaaS or DaaS product, bonus points if you were creating new data products/features
- Strong Linux skills and expert-level experience with scripting languages, particularly Python
- Strong understanding of software engineering best practices and associated tools
- Experience with any major RDBMS (MySQL, Postgres, SQL Server, etc.)
- Strong SQL skills; bonus points for having used both T-SQL and Standard SQL
- Experience with NoSQL (Elasticsearch, MongoDB, etc.)
- Multi-cloud and/or hybrid-cloud experience
- Strong interpersonal skills
- Comfortable working directly with data providers, including non-technical individuals
- Experience with the following (or transitioning from equivalent platform services):
  - Cloud Storage
  - Cloud Pub/Sub
  - BigQuery
  - Apache Airflow
  - dbt
  - Dataflow
- Bonus knowledge/experience:
  - Experience implementing cloud architecture changes
  - Working knowledge of how to build and maintain APIs using Python/FastAPI
  - Transforming similar data from disparate sources to create canonical data structures
  - Surfacing data to BI platforms such as Looker Studio
  - Data migration experience, especially from one cloud platform to another
  - Certification: Google Cloud Professional Data Engineer
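The "canonical data structures" bonus item above can be sketched as a small normalization step: two hypothetical source feeds with different field names are mapped onto one shared schema. All feed names and fields here are illustrative, not real mappings from this role.

```python
# Hypothetical field mappings from two source feeds to one canonical schema.
FEED_A_MAP = {"CustomerID": "customer_id", "FullName": "name", "SignupDate": "signed_up"}
FEED_B_MAP = {"cust_id": "customer_id", "customer_name": "name", "created": "signed_up"}

def to_canonical(record, field_map):
    """Rename source fields to canonical names, dropping unmapped fields."""
    return {canon: record[src] for src, canon in field_map.items() if src in record}

# Records from either feed now share one shape downstream:
a = to_canonical({"CustomerID": 7, "FullName": "Ada", "SignupDate": "2024-05-01"}, FEED_A_MAP)
b = to_canonical({"cust_id": 8, "customer_name": "Grace", "created": "2024-06-02"}, FEED_B_MAP)
```

In practice this mapping layer would sit in a staging model (dbt) or an ingestion task (Airflow), so everything downstream queries one canonical shape regardless of source.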