Tech Stack
Airflow, AWS, Azure, Cloud, Cybersecurity, Google Cloud Platform, Kafka, PySpark, Python, Spark, SQL, Unity Catalog
About the role
- Build and optimize data ingestion pipelines on Databricks (batch and streaming) to process structured, semi-structured, and unstructured data (see the ingestion sketch after this list).
- Implement scalable data models and transformations leveraging Delta Lake and open data formats (Parquet, Delta).
- Design and manage workflows with Databricks Workflows, Airflow, or equivalent orchestration tools (see the Airflow sketch after this list).
- Implement automated testing, lineage, and monitoring frameworks using tools such as Great Expectations and Unity Catalog (a simple quality-gate sketch follows this list).
- Build integrations with enterprise and third-party systems via cloud APIs, Kafka/Kinesis, and connectors into Databricks.
- Partner with AI/ML teams to provision feature stores, integrate vector databases (Pinecone, Milvus, Weaviate), and support retrieval-augmented generation (RAG) architectures.
- Optimize Spark and SQL workloads for speed and cost efficiency across multi-cloud environments (AWS, Azure, GCP).
- Apply secure-by-design data engineering practices aligned with Point Wild’s cybersecurity standards and post-quantum cryptographic frameworks.
- Collaborate closely with data architects, AI engineers, and product leaders to deliver a scalable, resilient, and secure foundation for analytics and ML.
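To make the ingestion bullet concrete: below is a minimal PySpark sketch of a streaming pipeline that reads JSON events from Kafka and appends them to a Delta table. The broker address, topic, event schema, checkpoint path, and target table are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical schema for the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

# Read the raw Kafka stream; broker and topic are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append to a Delta table, checkpointing so the stream can recover after failures.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .outputMode("append")
    .toTable("bronze.events")
)
query.awaitTermination()
```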
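Since Airflow is named as one orchestration option, here is a minimal sketch of an Airflow DAG that submits a Databricks notebook run through the Databricks provider's DatabricksSubmitRunOperator. The cluster spec, schedule, connection ID, and notebook path are assumptions for illustration; the `schedule` argument is Airflow 2.4+ syntax.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Hypothetical job cluster; runtime version and sizing are placeholders.
new_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

with DAG(
    dag_id="daily_event_ingest",   # placeholder DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",             # older Airflow versions use schedule_interval
    catchup=False,
) as dag:
    # Submit a one-off Databricks run that executes an ingestion notebook.
    ingest = DatabricksSubmitRunOperator(
        task_id="ingest_events",
        databricks_conn_id="databricks_default",
        new_cluster=new_cluster,
        notebook_task={"notebook_path": "/Repos/pipelines/ingest_events"},
    )
```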
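For the quality-gate bullet: Great Expectations' API shifts between major versions, so rather than pin one, here is a plain-PySpark stand-in that expresses the same idea: declare expectations over a table and fail the run when they are violated. The table and column names are hypothetical, reusing the bronze table from the ingestion sketch above.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("quality-gate").getOrCreate()

def expect_no_nulls(df: DataFrame, column: str) -> None:
    """Fail the run if any row has a NULL in the given column."""
    bad = df.filter(col(column).isNull()).count()
    if bad > 0:
        raise ValueError(f"{bad} rows have a NULL {column}")

def expect_unique(df: DataFrame, column: str) -> None:
    """Fail the run if the given column contains duplicate values."""
    total = df.count()
    distinct = df.select(column).distinct().count()
    if distinct != total:
        raise ValueError(f"column {column} has {total - distinct} duplicate values")

# Hypothetical bronze table, as produced by the ingestion sketch above.
events = spark.table("bronze.events")
expect_no_nulls(events, "event_id")
expect_unique(events, "event_id")
```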
Requirements
- At least 5 years in Data Engineering with strong experience building production data systems on Databricks.
- Expertise in PySpark, SQL, and Python.
- Strong expertise across a range of AWS services, particularly those used in data platforms (e.g., IAM, Kinesis).
- Strong knowledge of Delta Lake, Parquet, and lakehouse architectures.
- Experience with streaming frameworks (Structured Streaming, Kafka, Kinesis, or Pub/Sub).
- Familiarity with dbt for transformation and analytics workflows.
- Strong understanding of data governance and security controls (Unity Catalog, IAM).
- Exposure to AI/ML data workflows (feature stores, embeddings, vector databases such as Pinecone, Milvus, Weaviate).
- Detail-oriented, collaborative, and comfortable working in a fast-paced, innovation-driven environment.
- Bonus: Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- Bonus: Data Engineering experience in a B2B SaaS organization.