Tech Stack
Apache, Azure, Cloud, Docker, Kafka, Microservices, Python, Spark, SQL
About the role
- Build our next-generation, cloud-native Data Platform in Azure.
- Design and implement real-time and batch data pipelines that integrate with predictive models and AI applications.
- Develop and deploy microservices and data APIs in the cloud.
- Build enterprise-grade data assets that will be used across reporting, analytics, and data science functions.
- Partner with data scientists and ML engineers to productionize AI/ML models.
- Build the data infrastructure required to support RAG (Retrieval-Augmented Generation) applications using LLMs.
- Design vectorized data pipelines and integrate with embeddings and model APIs to support AI use cases.
- Mentor other data engineers and analysts in data engineering best practices and emerging technologies in AI/ML.
Requirements
- 5+ years' experience with Apache Spark or Databricks
- 5+ years' experience working with relational and/or non-relational databases
- 2+ years' experience developing data services/APIs with object-oriented programming practices
- Experience building near real-time data streaming solutions (Kafka or similar)
- Exposure to building or supporting AI/ML applications, especially those using LLMs (e.g., GPT, Claude)
- Experience working with vector databases, embeddings, and semantic search is a strong plus
- Key skills: Apache Spark/Databricks, Apache Kafka, Docker
- Proficiencies: Python, Spark, and SQL programming
- Degree in Computer Science, IT, or a similar field