
Data Engineer – AI
Upbound
full-time
Location Type: Remote
Location: Remote • California • 🇺🇸 United States
Job Level
Senior, Lead
Tech Stack
Airflow, Cloud, ElasticSearch, Kubernetes, Spark
About the role
- Define and drive the technical vision for data platforms that support AI-powered features in Crossplane and Upbound Spaces
- Lead the design of data pipelines that transform infrastructure and data into training datasets for ML models
- Architect vector search and RAG systems that leverage Crossplane Control Planes & Upbound Marketplace as a knowledge store
- Build data infrastructure that processes resources, extensions, and compositions for semantic search
- Establish frameworks for collecting, processing, and analyzing infrastructure configuration data
- Design data pipelines that handle Crossplane-specific data
- Create infrastructure for indexing and searching Upbound Marketplace content, documentation, and community patterns
- Develop metrics and monitoring for AI features integrated with Upbound's control plane architecture
- Design data systems that power AI agents for infrastructure provisioning & operations, helping users generate and optimize Crossplane compositions
- Create feature engineering platforms that extract signals from control plane operations, resource status, and reconciliation patterns
- Implement data infrastructure for training models that predict infrastructure failures, optimize resource allocation, and suggest configuration improvements
- Drive the development of knowledge graph representations of infrastructure dependencies and relationships
Requirements
- 10+ years of software/data engineering experience with at least 4 years in technical leadership roles
- Proven track record building data platforms that support production systems at scale
- Deep expertise in both traditional data engineering (Spark, Airflow, data lakes) and ML-specific infrastructure (feature stores, model serving)
- Experience with vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector, OpenSearch, ElasticSearch)
- Demonstrated experience with LLM applications, including RAG architectures and semantic search implementations
- Understanding of Kubernetes, cloud-native architectures, and infrastructure-as-code principles
- Strong understanding of data requirements for AI/ML systems: training pipelines, feature stores, and inference infrastructure
- Hands-on experience building knowledge bases and semantic search systems for technical documentation and code
- Experience with embedding models for code and technical documentation
- Knowledge of time-series data processing for infrastructure metrics and events
- Understanding of graph databases and their application to infrastructure dependency modeling
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
data engineering, data pipelines, machine learning, feature stores, model serving, vector databases, semantic search, infrastructure-as-code, time-series data processing, graph databases
Soft skills
technical leadership, communication, problem-solving, collaboration, strategic vision