
Senior Data Engineer – AI Solutions
Quisitive
Full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Job Level
Senior
Tech Stack
Azure, ETL, Neo4j, Oracle, Postgres, Python, Spark, Vault
About the role
- Building and maintaining backend data ingestion and embedding pipelines
- Setting up environments, cloning repositories, and running pipelines in JupyterHub
- Working on large-scale ETL processes, including converting Iceberg tables to Parquet and exporting data to S3 buckets
- Designing and optimizing schemas for Neo4j-based graph solutions
- Integrating knowledge workflows and knowledge base (KB) articles into graph structures for advanced retrieval
- Troubleshooting data quality issues and optimizing Spark jobs for efficiency
- Implementing retry mechanisms and debugging full-stack issues related to large file operations
- Managing secure access using JWT and Kerberos authentication
- Handling credentials for Oracle DB and API clients via HashiCorp Vault
- Working with GitLab for source control and Jira for project tracking
- Supporting migration efforts from Azure DevOps to GitLab/Jira environments
Requirements
- Strong proficiency in Python for data processing and pipeline development
- Hands-on experience with Spark, Iceberg, and large-scale data frameworks
- Familiarity with Neo4j, LangChain, and LLM integration for AI-driven solutions
- Experience with Oracle DB, PostgreSQL, and pgvector for storing and querying embeddings
- Comfortable working with S3 buckets, Parquet, and CSV formats
- Exposure to embedding models such as BGE-M3 and Nomic
- Understanding of AI-powered retrieval and recommendation systems
- Experience using JupyterHub for testing and debugging
- Experience with Power BI for dashboard development and reporting
- Knowledge of Kerberos authentication and secrets management with HashiCorp Vault
Benefits
- Passionate team members
- Challenging projects
- A great place to work