Tech Stack
AWS, Azure, Cloud, Cyber Security, Docker, Google Cloud Platform, Kubernetes
About the role
- Define and execute the enterprise AI infrastructure strategy, aligning with corporate goals and in partnership with Cybersecurity, Network/Infrastructure, and Enterprise Architecture teams
- Design, build, and operate enterprise-grade AI/ML infrastructure that underpins products, services, and internal operations
- Establish foundations spanning data pipelines, model training, deployment platforms, monitoring/observability, and governance
- Integrate AI infrastructure into existing enterprise IT and security architectures
- Drive adoption of AI/ML engineering rigor, ensuring security, resilience, and compliance are baked into every layer of the AI stack
- Serve as a senior technology leader advising executives on emerging technologies, risks, and opportunities
- Build and scale foundational AI/ML platforms including data ingestion pipelines, model training environments, orchestration frameworks, and deployment toolchains
- Develop secure and compliant AI infrastructure aligned with NIST, ISO, EO 14028, and Zero Trust principles
- Partner with Network Engineering and Cybersecurity to harden, monitor, and protect AI workloads across on-prem, hybrid, and cloud environments
- Oversee AI observability and monitoring including model drift detection, bias/fairness monitoring, and data lineage tracking
- Champion automation and MLSecOps principles to improve speed, reliability, and repeatability in model training and deployment
- Establish AI infrastructure policies and standards for access control, data security, compliance, and responsible AI operations
- Collaborate with Cyber GRC and Risk teams to ensure auditability, transparency, and adherence to governance requirements
- Manage vendor relationships with cloud providers, AI/ML platform vendors, and security partners
- Build and lead a cross-disciplinary AI Infrastructure team of engineers, architects, and operations specialists
- Mentor and grow talent, and drive a culture of engineering excellence, accountability, and continuous improvement
Requirements
- 15+ years in enterprise IT, data engineering, cybersecurity, or infrastructure leadership, including at least 3 years leading AI/ML infrastructure at scale
- Proven experience with cloud-native AI platforms (AWS SageMaker, Azure ML, GCP Vertex AI, etc.) and on-prem GPU/accelerator infrastructure
- Strong background in enterprise-grade networking, storage, and cybersecurity controls
- Deep understanding of MLOps/MLSecOps practices, including CI/CD for ML, containerization (Docker/Kubernetes), and monitoring frameworks
- Familiarity with Zero Trust Architecture, NIST SP 800-53A Rev. 5, ISO 27001, ITIL, and SOC 2 frameworks
- Strong leadership, communication, and collaboration skills with executive presence
- Experience in Financial Services, Defense, or other highly regulated industries
- Knowledge of data governance frameworks (e.g., DAMA-DMBOK, DCAM)
- Advanced degree(s) or certification(s) in Computer Science, Data Engineering, AI/ML, Cybersecurity, or related fields