Design, build, and maintain robust, scalable data pipelines.
Perform data research to identify data sources within the ecosystem and apply enrichments to derive meaningful data points.
Implement, optimize, and maintain scheduled jobs, batch processors, and real-time data ingestion pipelines.
Implement event-driven architectures that respond to events as they occur.
Optimize and fine-tune database performance to support large data volumes with optimal response times.
Design data schemas that can evolve over time and align with strategic goals.
Design, implement, and optimize microservices that expose data to consuming applications.
Design caching and data management practices to improve performance.
Ensure the data architecture supports the business requirements.
Explore new opportunities for data acquisition and enhance data collection procedures.
Explore and identify appropriate segmentation strategies to support retrieval-augmented generation (RAG) implementations.
Demonstrate a commitment to learning and adopting emerging technologies, with a particular focus on agentic AI development.
Requirements
Over 10 years of experience as a Data Engineer or Software Engineer, with expertise in software engineering, data engineering, data warehousing, data research, and requirements gathering.
Demonstrated expertise in programming with Python and PySpark for data engineering tasks.
Exceptional analytical and problem-solving skills, particularly in handling unstructured raw data and synthesizing meaningful patterns.
Hands-on experience in developing complete ETL pipelines, from source to destination, including data cleansing, transformation, and enrichment.
Proficiency in PySpark for engineering data pipelines using Databricks on AWS or Azure.
Technical prowess in data modeling, data mining, data architectures, and data warehousing.
Proficiency in event-driven architectures in the cloud, preferably Azure (AWS or GCP experience is also acceptable).
Proficiency in real-time data processing using Kafka.
Expertise in a range of database and data warehouse technologies, such as relational SQL databases (MySQL, PostgreSQL), NoSQL databases (MongoDB, Azure Cosmos DB, Bigtable), and data warehouse/data lake technologies (Snowflake, BigQuery).
Proficiency with OAuth providers and cloud auditing and logging tools for monitoring and troubleshooting.
Expertise in building microservices that expose data to consuming applications.
Familiarity with Node.js, TypeScript, and GraphQL is desired; a willingness to learn these technologies is strongly preferred.
Familiarity with Linux and Docker is a nice-to-have.
Familiarity with Agentic AI application development using frameworks such as ADK with Python is a plus.
Experience with Power BI and other visualization tools such as Kibana, Grafana, or Tableau is a plus.
Knowledge of cloud services (AWS, Google Cloud, or Azure) and understanding of distributed data processing frameworks.
Ability to manage and delegate work across delivery teams to meet priorities.
Skilled in client engagements, deciphering client business needs, and providing data solution recommendations.
Excellent communication skills, with experience in designing, developing, and delivering presentations.
Benefits
Medical/Dental/Vision coverage
401(k) plan
Tuition reimbursement program
Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
Paid Parental Leave
Paid Caregiver Leave
Additional sick leave beyond what state and local law require may be available but is unprotected