Tech Stack
Cassandra, Cloud, DynamoDB, ETL, Hadoop, HBase, IoT, Java, Kafka, MongoDB, Open Source, Scala, SDLC, Spark, SQL
About the role
- Support, develop and maintain a data and analytics platform.
- Effectively and efficiently process, store and make data available to analysts and other consumers.
- Work with business and IT teams to understand requirements and apply appropriate technologies for agile data delivery at scale.
- Implement and automate deployment of distributed systems for ingesting and transforming data from relational, event-based, and unstructured sources.
- Implement continuous monitoring and troubleshooting of data quality and data integrity issues.
- Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
- Develop reliable, efficient, scalable and quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages.
- Develop physical data models and implement data storage architectures per design guidelines.
- Analyze complex data elements and systems to contribute to conceptual, physical and logical data models.
- Participate in testing and troubleshooting of data pipelines.
- Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (data lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, etc.).
- Use agile development practices including DevOps, Scrum, Kanban and continuous improvement cycles for data-driven applications.
Requirements
- College, university, or equivalent degree in relevant technical discipline, or relevant equivalent experience required.
- This position may require licensing for compliance with export controls or sanctions regulations.
- Relevant experience preferred, such as temporary student employment, internships, co-ops, or other extracurricular team activities.
- Exposure to Big Data open source technologies.
- Experience with or exposure to Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka (or equivalent college coursework).
- Proficiency with the SQL query language.
- Experience implementing clustered compute solutions in cloud environments.
- Familiarity with developing applications that require large-scale file movement in a cloud-based environment.
- Exposure to Agile software development (DevOps, Scrum, Kanban).
- Exposure to building analytical solutions.
- Exposure to IoT technology.
- Skills in ETL/ELT, data extraction, data quality, data governance, metadata management, retention, and access controls.
- Programming skills: creating, writing, and testing code, test scripts, and build scripts; version control; build and test automation.
- Ability to analyze complex data elements, data flow, dependencies, and relationships.
- Problem-solving skills and the ability to apply quality assurance metrics.