Define and implement an enterprise-wide data architecture strategy supporting interoperability, AI/ML readiness, and regulatory compliance
Lead evolution of AWS-based data lake architecture for structured, semi-structured, and unstructured data, especially FHIR JSON healthcare data
Design and maintain scalable, secure, cost-effective data lakes using S3, Glue, Athena, Redshift, Lake Formation, and Mountpoint for S3
Optimize storage and retrieval (partitioning, Parquet/ORC, compression) for performance and cost-efficiency
Collaborate with data science to implement embedding models, vectorization pipelines, and real-time inference architectures
Design and manage vector storage (S3-based, FAISS, Pinecone, OpenSearch) for semantic search and retrieval-augmented generation (RAG)
Architect ingestion, processing, normalization, and enrichment pipelines for FHIR/HL7 and EHR/API sources
Ensure data security at rest and in transit using AWS encryption, IAM, VPC controls, and bucket policies; implement access controls and audit logging
Oversee data governance, lineage, cataloging, and stewardship
Promote data literacy and lead a team of data architects and engineers
Requirements
Bachelor’s or Master’s in Computer Science, Data Engineering, or related field
8–12+ years of experience in data architecture, including 3–5 years in a technical leadership role
Proven experience architecting AWS-based data lakes and analytics pipelines (Amazon S3, AWS Glue, Athena, Redshift, Lake Formation)
Deep understanding of healthcare data standards (FHIR, HL7) and working with FHIR JSON objects at scale
Expertise with embedding and vectorization models, semantic search, and managing vector storage solutions (FAISS, Pinecone, Amazon OpenSearch, S3-based vectors)
Hands-on experience with Amazon S3, Mountpoint for S3, and optimizing S3-based workloads for performance and cost
Strong background in data security, encryption, access control, and compliance frameworks (HIPAA, HITRUST)
Experience implementing data governance, lineage, and cataloging (AWS Glue Data Catalog, Lake Formation)
Leadership experience building and managing teams of data architects and engineers, and collaborating across functions
Preferred: AWS certifications (Big Data/Data Analytics), familiarity with open-source vector DBs (FAISS, Weaviate), MLOps pipelines, clinical systems integration, claims processing, or population health analytics