Tech Stack
Apache, Distributed Systems, HDFS, Java, Kafka, Python, Scala, Spark, SQL
About the role
- Lead design and development of the Infinia Data Engine powering high-performance, AI-native data workflows.
- Design autonomous logic to optimize SQL and non-SQL analytic queries for distributed infrastructure.
- Implement high-performance indexing for structured and unstructured data (B-epsilon trees, full-text indexing, vectorization).
- Develop internal systems for high-throughput data access and transformation using Parquet, ORC, and Avro.
- Engineer integration layers supporting Trino, Apache Spark, Apache Iceberg, Delta Lake, HDFS, and Hive Metastore.
- Build and tune execution plans to leverage Infinia’s high-throughput I/O and compute capabilities for large-scale AI and analytics workloads.
- Analyze and optimize performance of distributed query execution, data storage, caching, and memory usage.
- Write automated tests to validate correctness and performance across varied cluster topologies.
- Contribute to relevant open-source ecosystems through collaboration, feature integration, or direct code contributions.
- Partner with Data Scientists, Platform Engineers, and Product Managers to deliver integrated, end-to-end solutions.
- Provide technical leadership, mentorship, and design direction to other engineers on the team.
- Participate in an on-call rotation to provide after-hours support as needed.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 12+ years of experience in software development, with 5+ years in distributed systems, data platforms, or big data technologies.
- Expert-level knowledge of SQL, Python, and Java or Scala.
- Experience working with Apache Spark, distributed query engines, or distributed databases.
- Strong familiarity with HDFS, Hive Metastore, and data partitioning strategies.
- Hands-on experience with Apache Iceberg and/or Delta Lake (preferred).
- Deep understanding of the Parquet, ORC, and Avro file formats and their performance characteristics (preferred).
- Background in real-time data streaming using tools such as Apache Kafka (preferred).
- Prior experience with C++ (preferred).
- Prior contributions to open-source projects; committer status is a plus (preferred).
- Proven ability to lead complex technical initiatives and mentor junior engineers.
- Willingness to participate in an on-call rotation to provide after-hours support as needed.