Tech Stack
Apache, AWS, Azure, Cloud, ERP, ETL, IoT, Java, Kafka, NumPy, Pandas, PySpark, Python, Scala, Spark, SQL, Tableau
About the role
- Lead and execute large-scale data engineering projects using the Palantir Foundry platform.
- Develop robust, scalable ETL/ELT workflows end to end, leveraging Spark-based Transforms, the Pipeline Canvas, and Foundry’s low-code/no-code tools.
- Design, implement, and optimize Foundry Transforms (Code Workbooks) using Apache Spark (Python/Scala) to cleanse, validate, and join data, and expose the results as Datasets or Views.
- Use Foundry’s Pipeline Canvas to wire together data sources, transformations, and outputs in a visual, low-code environment.
- Ingest and normalize large, disparate datasets from sources such as ERP, CRM, and IoT streams, operating on daily or near real-time cadences.
- Define and manage Ontologies (business-friendly data models) so analytics teams can self-serve data reliably and intuitively.
- Leverage Dataset Builder and Object Library to manage schemas, support full or incremental loads, and standardize dataset governance.
- Integrate data through standard and custom APIs and connectors (e.g., JDBC, S3, Kafka, Snowflake, Foundry’s Dataset Writer/Reader).
- Implement and maintain robust data quality checks and automated alerts to detect anomalies and threshold breaches early.
- Monitor and tune pipeline performance and resource usage (partitioning strategies, caching, and load balancing) for production environments.
- Automate deployment pipelines using CI/CD practices to promote Foundry Transforms into production safely and efficiently.
- Collaborate with cross-functional teams to ensure alignment between data engineering solutions and key business objectives.
- Provide technical mentorship, ensuring best practices in code quality, version control, testing, and documentation.
- Bring domain expertise to aviation-related data challenges where applicable, while remaining open to broader industry applications.
Requirements
- Strong hands-on experience developing data pipelines and workflows in Palantir Foundry, including Transforms, Ontology modeling, Workspaces, Actions, and Pipeline Canvas.
- Deep understanding of Apache Spark APIs, including batch and streaming data processing.
- Advanced programming proficiency in Python; experience in Scala or Java is a plus.
- Strong command of SQL and experience working with structured, semi-structured, and unstructured data.
- Familiarity with key Python libraries and tools: PySpark, Pandas, NumPy, Great Expectations, pytest/unittest.
- Proven track record with CI/CD practices, preferably deploying to Foundry or cloud-based platforms (AWS, Azure).
- Understanding of data architecture, performance optimization, and modern ELT principles.
- Experience with data integration tools and connectors (e.g., JDBC, Kafka, S3, Snowflake).
- Applied experience with ontology management, dataset structuring, and self-serve enablement.
- Familiarity with agile methodologies (Scrum, Kanban) and managing operational tickets.
- Strong documentation, version control, and testing habits for data workflows.
- Industry experience in aviation is good to have.
- Familiarity with data visualization tools like Tableau or Power BI is good to have.
- Certifications in AWS, Azure, or other cloud platforms are good to have.
- Knowledge of machine learning pipelines or data science workflows is good to have.
- Experience with data governance, compliance standards, and metadata management is good to have.
- Background in DevOps or infrastructure-as-code for pipeline orchestration is good to have.
- Strong analytical and problem-solving mindset with attention to detail and data accuracy.
- Ability to explain technical concepts clearly to both technical and non-technical stakeholders.
- Proactive collaborator and effective communicator across multidisciplinary teams.
- Leadership and mentoring skills to guide junior engineers and contribute to a culture of learning.
- High adaptability to new technologies, including low-code/no-code tooling environments.
- Excellent time management and organizational skills, with the ability to juggle multiple priorities effectively.