
Staff Data Engineer
BLACKBIRD.AI
full-time
Location Type: Remote
Location: New York • Texas • United States
Salary
💰 $160,000 - $190,000 per year
About the role
- Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
- Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
- Design and implement AI-powered enrichment stages within pipelines—applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
- Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
- Work with AI/ML researchers to implement, integrate, and scale AI processing
- Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
- Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
- Design and implement data quality frameworks, monitoring, and alerting systems
- Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
- Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
- Make critical build-vs-buy decisions and establish architectural standards for the data organization
- Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing
Requirements
- 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
- Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
- Strong experience building and operating data pipelines at scale (handling TBs+ of data)
- Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
- Proficiency in Python, DBT, and SQL for data processing and pipeline development
- Experience with both batch and streaming patterns for large-scale data processing
- Strong understanding of cloud platforms (AWS, Azure)
- Excellent communication skills and ability to mentor engineers
Preferred Qualifications
- Experience designing both batch and streaming/near real-time data architectures
- Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
- Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
- Experience with Agentic AI, context engineering, and evaluation
- Background in trust & safety, security, or content moderation domains
- Experience with data observability tools and building comprehensive monitoring systems
- Prior experience at a startup or in a fast-paced environment
- Experience applying agentic coding tools to day-to-day development
- Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases
Benefits
- Competitive compensation package, 401(k), and equity - **everyone has a stake in our growth!**
- Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - **an apple a day doesn't always keep the doctor away!**
- Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
- A flexible work environment with opportunities to collaborate with your team in person - **you can have it all!**
- Inclusion and Impact - **soar to new heights!**
- Professional development stipend - **never stop learning!**
Applicant Tracking System Keywords
Hard Skills & Tools
Databricks, Apache Spark, data lakehouse architecture, data pipelines, Python, DBT, SQL, Elasticsearch, AI/ML integration, data quality frameworks
Soft Skills
communication, mentoring, team collaboration, knowledge sharing, critical decision making