
Staff Data Engineer
Walmart
full-time
Location Type: Office
Location: Sunnyvale • California • United States
Salary
💰 $153,301 - $286,000 per year
About the role
- Design, build, and maintain scalable data pipelines and infrastructure for large-scale data processing and analytics using technologies such as Hadoop, Spark, distributed event stores and stream-processing platforms, in-memory databases, and other big data tools.
- Build large-scale distributed event streaming platforms such as Apache Kafka and Google Cloud Pub/Sub.
- Develop and deploy real-time data processing pipelines for near-real-time (NRT) streaming data.
- Design and implement data storage solutions using Data Lakes and NoSQL databases to support high-volume and high-velocity data processing.
- Develop and implement machine learning models to support predictive analytics and automation.
- Develop and deploy natural language processing (NLP) models using ChatGPT and other generative AI tools and platforms.
- Work with data scientists, analysts, and other stakeholders to understand data requirements and develop solutions that meet their needs.
- Develop and maintain data quality and governance processes to ensure data accuracy, completeness, and consistency across different systems and sources.
- Design and implement job scheduling and automation using scheduling tools.
- Optimize data processing workflows using managed services provided by cloud platforms.
- Identify and resolve performance bottlenecks, data quality issues, and other technical challenges that arise in large-scale data processing environments.
- Create and maintain documentation and best practices for data engineering processes and systems.
- Stay up-to-date with the latest trends and innovations in big data, cloud computing, and related technologies, and adapt these technologies to improve data processing and analytics capabilities.
- Build data models to support data visualization and analysis.
- Develop and maintain data pipelines to extract, transform, and load data from various sources into the data visualization tool.
- Build and maintain dashboards and reports to provide insights into business performance and trends.
- Develop and maintain data validation and testing procedures to ensure data accuracy.
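Several of the responsibilities above revolve around extract-transform-load (ETL) pipelines with data normalization and validation. As a minimal sketch of that pattern, the record fields, values, and aggregation below are illustrative assumptions, not details from this posting:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical raw records, as an ETL batch job might pull them from a source
# system. Field names and values are made up for illustration.
RAW_ROWS = [
    {"sku": " 1001 ", "store": "Sunnyvale", "units": "3", "price_usd": "19.99"},
    {"sku": "1002", "store": "sunnyvale", "units": "0", "price_usd": "5.49"},
    {"sku": "1001", "store": "SUNNYVALE", "units": "2", "price_usd": "19.99"},
]

@dataclass(frozen=True)
class Sale:
    sku: str
    store: str
    units: int
    revenue_cents: int

def extract(rows: Iterable[dict]) -> Iterator[dict]:
    """Extract: yield raw rows, dropping obviously invalid ones (validation)."""
    for row in rows:
        if row.get("sku"):
            yield row

def transform(rows: Iterable[dict]) -> Iterator[Sale]:
    """Transform: trim and normalize fields, and convert string types."""
    for row in rows:
        yield Sale(
            sku=row["sku"].strip(),
            store=row["store"].strip().title(),
            units=int(row["units"]),
            # Store money as integer cents to avoid float rounding drift.
            revenue_cents=round(float(row["price_usd"]) * 100) * int(row["units"]),
        )

def load(sales: Iterable[Sale]) -> dict[str, int]:
    """Load: aggregate revenue per SKU into an in-memory 'sink'."""
    sink: dict[str, int] = {}
    for sale in sales:
        sink[sale.sku] = sink.get(sale.sku, 0) + sale.revenue_cents
    return sink

totals = load(transform(extract(RAW_ROWS)))
print(totals)
```

In production such stages would read from and write to real systems (Kafka topics, a data lake, a warehouse table), but the extract/transform/load separation shown here is the same.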
Requirements
- Bachelor’s degree or equivalent in Computer Science, Engineering (Any), or related field and 4 years of experience in software engineering, data engineering, database engineering, business intelligence, business analytics or related field; OR Master's degree or equivalent in Computer Science, Engineering (Any), or related field and 2 years of experience in software engineering, data engineering, database engineering, business intelligence, business analytics or related field.
- Experience with software development in object-oriented languages such as Scala, Go, and Python.
- Experience designing REST API web services using Nginx, Scala, Python, and Go.
- Experience working with relational databases like PostgreSQL, MariaDB and NoSQL Databases like Elasticsearch and Redis.
- Experience with software design and architectural patterns such as microservices, client-server, model-view-controller, sharding, publish-subscribe (pub/sub), and event-driven architecture.
- Experience working with distributed queue systems such as Kafka and NATS.
- Experience with Microsoft Azure Storage and Google BigQuery for data storage and querying.
- Experience with data transformation techniques like ETL batch processing, stream ingestion, API integration and data normalization.
- Experience developing data processing pipelines using technology stacks such as Apache Kafka, NATS, Elasticsearch, and Akka.
- Experience deploying applications and pipelines using scheduling, CI/CD, and orchestration frameworks like Kubernetes, Jenkins, Ansible, and Docker.
- Experience developing data visualizations and dashboards using tools like Grafana, Kibana, and Power BI.
- Experience writing complex queries in SQL and Elasticsearch to analyze data, including the analysis of time-series data.
- Experience managing Linux-based systems and deploying applications on Linux.
- High-level understanding of network devices, terminology, protocols, and concepts such as routers, switches, bandwidth, BGP, and SNMP.
- Experience working with log management and analysis platforms such as Splunk, Graylog, and Elasticsearch.
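The SQL requirement above calls out time-series analysis specifically. A common pattern there is bucketing timestamps into intervals and aggregating per bucket. As a minimal, self-contained sketch using Python's built-in SQLite driver (the table, hosts, and latency values are invented for illustration):

```python
import sqlite3

# In-memory database; in practice this query would run against a warehouse
# such as BigQuery, but the bucketing pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (ts TEXT NOT NULL, host TEXT NOT NULL, latency_ms REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        ("2024-01-01T10:00:00", "web-1", 120.0),
        ("2024-01-01T10:20:00", "web-1", 80.0),
        ("2024-01-01T11:05:00", "web-1", 200.0),
        ("2024-01-01T11:45:00", "web-2", 100.0),
    ],
)

# Bucket readings into hourly windows and compute the average latency per
# host per hour -- a typical time-series aggregation.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m-%dT%H:00', ts) AS hour,
           host,
           AVG(latency_ms) AS avg_ms
    FROM metrics
    GROUP BY hour, host
    ORDER BY hour, host
    """
).fetchall()
for hour, host, avg_ms in rows:
    print(hour, host, avg_ms)
```

Warehouse dialects offer dedicated helpers for the same idea (e.g. truncating timestamps to an interval before grouping), but the group-by-bucket structure carries over directly.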
Benefits
- Health benefits include medical, vision and dental coverage.
- Financial benefits include 401(k), stock purchase and company-paid life insurance.
- Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty and voting.
- Other benefits include short-term and long-term disability, education assistance with 100% company-paid college degrees, company discounts, military service pay, adoption expense reimbursement, and more.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Hadoop, Spark, Apache Kafka, Google Cloud Pub/Sub, NoSQL databases, machine learning, natural language processing, ETL, SQL, data visualization
Soft Skills
collaboration, problem-solving, communication, documentation, adaptability, data governance, data quality management, analytical thinking, stakeholder engagement, process optimization