Salary
💰 $205,000 - $250,000 per year
Tech Stack
Airflow, AWS, Cloud, Distributed Systems, Hadoop, Kafka, PySpark, Python, Spark, SQL, Terraform
About the role
- Design and build robust, highly scalable data pipelines and lakehouse infrastructure with PySpark, Databricks, and Airflow on AWS (a short illustrative sketch follows this list)
- Improve the data platform development experience for Engineering, Data Science, and Product by creating intuitive abstractions, self-service tooling, and clear documentation
- Own and maintain core data pipelines and models that power internal dashboards, ML models, and customer-facing products
- Own the Data & ML platform infrastructure using Terraform, including end-to-end administration of Databricks workspaces: manage user access, monitor performance, optimize configurations (e.g., clusters, lakehouse settings), and ensure high availability of data pipelines
- Lead projects to improve data quality, testing, observability, and cost efficiency across existing pipelines and backend systems (e.g., migrating Databricks SQL pipelines to dbt, scaling data ingestion, improving data-lineage tracking, and enhancing monitoring)
- Act as the primary engineering partner for the Data Science team: work embedded with them to gather requirements, design scalable solutions, and support the engineering side of their work end to end
- Work closely with backend engineers and data scientists to design performant data models and support new product development initiatives
- Share best practices and mentor other engineers working on data-centric systems
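For a sense of the day-to-day work, here is a minimal sketch of an Airflow DAG that submits a PySpark job to Databricks, as mentioned in the first bullet above. Every name, path, schedule, and cluster setting in it is a hypothetical placeholder, not Parafin's actual configuration:

```python
# Illustrative only: a daily Airflow DAG that submits a PySpark job to
# Databricks. All identifiers below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="daily_orders_pipeline",   # hypothetical DAG name
    schedule="@daily",                # Airflow 2.4+ keyword; older versions use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    run_pyspark_job = DatabricksSubmitRunOperator(
        task_id="transform_orders",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "13.3.x-scala2.12",  # example Databricks runtime
            "node_type_id": "i3.xlarge",          # example AWS instance type
            "num_workers": 2,
        },
        spark_python_task={
            # Hypothetical path to the PySpark script to run
            "python_file": "dbfs:/pipelines/transform_orders.py",
        },
    )
```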
Requirements
- 4+ years of experience in software engineering with a strong background in data infrastructure, pipelines, and distributed systems
- Advanced proficiency in Python and SQL
- Hands-on Spark development experience (see the illustrative sketch after this list)
- Expertise with modern cloud data stacks, including AWS (S3, RDS), Databricks, and Airflow, as well as lakehouse architectures
- Hands-on experience with foundational data-infrastructure technologies such as Hadoop, Hive, Kafka (or similar streaming platforms), Delta Lake/Iceberg, and distributed query engines like Trino/Presto
- Familiarity with ingestion frameworks, developer-experience tooling, and best practices for data versioning, lineage, partitioning, and clustering
- Strong problem-solving skills and a proactive attitude toward ownership and platform health
- Excellent communication and collaboration skills, especially in cross-functional settings
- Legally authorized to work in the United States without restrictions (required to apply)
- Comfortable with Parafin’s hybrid in-office work policy (on-site Tuesday through Thursday)
- Bonus: Experience managing AWS infrastructure with Terraform
- Bonus: Familiarity with observability tools (e.g., Datadog) and cost tracking in cloud environments
- Bonus: Experience with financial systems or building platforms in a fintech setting
- Bonus: Prior work on ML infrastructure (feature stores, model lifecycle, real-time inference)
- Bonus: Contributions to internal tooling or open-source projects in the data ecosystem
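And for the hands-on Spark requirement above, a similarly hypothetical PySpark sketch: read raw data from S3, deduplicate it, and write a date-partitioned Delta table. The bucket, paths, and column names are illustrative only, and the Delta format assumes the Delta Lake libraries are available (as they are on Databricks clusters):

```python
# Illustrative only: a small PySpark job that cleans raw S3 data and writes
# a partitioned Delta table. All paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

orders = (
    spark.read.parquet("s3://example-bucket/raw/orders/")  # hypothetical source
    .dropDuplicates(["order_id"])                          # hypothetical key column
    .withColumn("order_date", F.to_date("created_at"))
)

(
    orders.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")                             # partitioned, per the best-practices bullet above
    .save("s3://example-bucket/lakehouse/orders/")         # hypothetical lakehouse path
)
```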