
Software Engineer, Enterprise Data Platform
Notion
Full-time
Location Type: Hybrid
Location: San Francisco, California, United States
Salary
💰 $230,000 - $300,000 per year
About the role
- Design and evolve the data lakehouse: build and operate core lakehouse components (e.g., Iceberg/Hudi/Delta tables, catalogs, schema management) that serve as the source of truth for analytics, AI, and search.
- Own critical data pipelines and services: design, implement, and harden batch and streaming pipelines (Spark, Kafka, EMR, etc.) that move and transform data reliably across regions and cells.
- Advance EKM and encryption-by-design: work with Security and platform teams to integrate Enterprise Key Management (EKM) into data workflows, including file- and record-level encryption and safe key handling in Spark and storage systems.
- Improve data access, auditability, and residency: build primitives for fine-grained access control, auditing, and data residency so customers can see who accessed what, where, and under which guarantees.
- Drive reliability and observability: raise the operational bar for the data stack by improving on-call experience, debugging, and alerting for data jobs and services.
- Optimize large-scale performance and cost: tackle performance and cost challenges across Kafka, Spark, and storage for very large workspaces (20k+ users, multi-cell deployments), including cluster migrations and workload tuning.
- Enable ML and search workflows: build infrastructure to support training and inference pipelines, ranking workflows, and embedding infrastructure on top of the shared data platform.
- Shape the platform roadmap: contribute to design docs and evaluations that influence long-term platform direction and vendor choices.
Requirements
- 5+ years building and operating data platforms or large-scale data infrastructure for SaaS or similar environments.
- Strong skills in at least one of Python, Java, or Scala; comfortable working with SQL for analytics and data modeling.
- Hands-on experience with Spark or similar distributed processing systems, including debugging and performance tuning.
- Experience with Kafka or equivalent streaming systems; familiarity with CDC/ingestion patterns (e.g., Debezium, Fivetran, custom connectors).
- Experience with data lakes and table formats (Iceberg, Hudi, or Delta) and/or data catalogs and schema evolution.
- Practical understanding of access control, encryption at rest/in transit, and auditing as they apply to data platforms.
- Experience with at least one major cloud provider (AWS, GCP, or Azure) and managed data/compute services (e.g., EMR, Dataproc, Kubernetes-based compute).
- Comfortable owning services and pipelines in production, including on-call, incident response, and reliability improvements.
Benefits
- Health insurance
- 401(k) matching
- Flexible work arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Java, Scala, SQL, Spark, Kafka, EMR, Iceberg, Hudi, Delta
Soft Skills
reliability improvements, incident response, debugging, performance tuning, on-call experience