Data Engineer

Fervo Energy

Design and operate data pipelines within the Data & AI team for geothermal power plants, collaborating with engineers, data scientists, and business stakeholders to support data-driven decisions.

Posted 6/23/2026 • Full-time • Houston • Texas • 🇺🇸 United States • Junior • Mid-Level

Tech Stack

Tools & technologies

Amazon Redshift, Apache, Azure, BigQuery, Cloud, IoT, Kafka, PySpark, Python, Spark, SQL, Unity

About the role

Key responsibilities & impact

Design, build, and operate scalable batch and real-time/streaming data pipelines on Databricks and Azure Data Factory, landing data in Azure Data Lake Storage (ADLS) and Snowflake
Implement the medallion (bronze/silver/gold) architecture using Delta Lake and Delta Live Tables, with reliable incremental processing, schema evolution, and change data capture
Build and tune Apache Spark jobs (PySpark/Spark SQL) for large-scale, parallel data processing — partitioning, shuffles, caching, broadcast joins, and cost/performance optimization
Ingest and process high-volume IoT and historian data (sensor, SCADA, time-series) via streaming frameworks (Structured Streaming, Event Hubs/Kafka) and micro-batch patterns
Model curated, analytics-ready datasets and serving layers that are well-documented, performant, and easy for downstream consumers to use
Implement automated data quality frameworks — validation, profiling, anomaly detection, freshness and completeness checks — with clear alerting and remediation paths
Build entity resolution and record linkage logic to unify wells, pads, assets, equipment, and events across heterogeneous source systems
Establish and enforce data governance using Unity Catalog — access controls, lineage, data classification, and a shared semantic/metadata layer that makes business concepts queryable and trustworthy.
Apply software engineering discipline to data: version control, code review, automated testing, and CI/CD pipelines (Azure DevOps or GitHub Actions) for data and infrastructure
Implement monitoring, logging, and observability across pipelines to support debugging, SLA tracking, cost monitoring, and continuous improvement
Support production incidents and platform-level issues impacting data pipelines and downstream consumers; develop runbooks and reduce toil through automation
Partner with analysts and stakeholders to deliver datasets and semantic models that power dashboards in Power BI and Spotfire
Collaborate with Data Science and AI Engineering to provision clean, governed, feature-ready data for ML and agentic workflows
Translate domain problems from drilling, completions, production, geophysics, and power plant operations into well-scoped, reliable data products with clear ownership and success metrics

Requirements

What you’ll need

Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, Information Systems, Applied Mathematics, Physics, or a related technical field — or equivalent practical experience demonstrated through a portfolio of shipped data systems.
2+ years of hands-on experience building and operating production data pipelines, not just prototypes or notebooks
Deep understanding of the Apache Spark framework and distributed, parallel data processing — partitioning, shuffles, joins, caching, and performance tuning at scale
Strong programming skills in Python (PySpark) and SQL, including writing testable, maintainable production code
Hands-on experience with Databricks, including Delta Lake, Delta Live Tables, and Unity Catalog
Experience with Azure Data Factory and Azure Data Lake Storage (ADLS), or with equivalent cloud data services and a willingness to work in our Azure-first environment
Experience with cloud data warehousing on Snowflake (or equivalent: BigQuery, Redshift, Databricks SQL)
Experience building both real-time/streaming and batch pipelines (Structured Streaming, Event Hubs/Kafka, or similar)
Solid data modeling skills (dimensional, medallion/lakehouse, or normalized) and a track record of building well-documented, consumable datasets
Experience implementing data quality, validation, and observability for pipelines
Strong Git and CI/CD experience (Azure DevOps or GitHub Actions), including version control discipline, code review, and automated testing
Experience delivering data to BI/analytics tools such as Power BI and/or Spotfire

Benefits

Comp & perks

Comprehensive suite of benefits including medical, dental, vision, life, short-term and long-term disability, flexible paid time off, and paid parental leave.
Incentive stock options program
Bonus incentive program
401(k) plan with an employer match

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Apache Spark, PySpark, SQL, Delta Lake, Delta Live Tables, data modeling, data quality, observability, CI/CD, automated testing

Soft Skills

collaboration, problem-solving, communication, analytical thinking, attention to detail, stakeholder engagement, documentation, debugging, incident management, automation