
Data & AI Operations Specialist
ZainTech
full-time
Location Type: Remote
Location: India
About the role
- Design & Architecture: Maintain the monitoring architecture for AI/ML platforms and configure advanced dashboards in Grafana and Azure Monitor.
- Environment Governance: Manage Azure Machine Learning (AML) workspace configurations, compute targets, and Databricks cluster lifecycles (including runtime versions and platform patching).
- Resource Optimization: Oversee GPU resource allocation, reserved capacity, and cost-performance optimization to align with FinOps goals.
- Security Integration: Ensure all AI services use private endpoints, VNet integration, and RBAC controls to protect sensitive citizen data.
- Pipeline Engineering: Own the design, optimization, and remediation of Azure Data Factory (ADF) and Synapse pipelines.
- Advanced Troubleshooting: Resolve complex bottlenecks related to authentication failures, data format changes, and ETL performance.
- SOP Leadership: Author step-by-step Standard Operating Procedures (SOPs) for the L1 NOC team to handle routine monitoring and first-line triage.
- Automation: Implement CI/CD pipelines for model training, testing, and deployment to AML endpoints.
- Model Reliability: Configure data drift detection thresholds and automated retraining triggers.
- Recovery Operations: Develop self-healing scripts and automated recovery runbooks for critical AI workflows.
- Audit Management: Implement and maintain audit logging for all AI decisions and model outputs, ensuring logs flow to the SIEM/vSOC.
- Regulatory Alignment: Conduct quarterly AI governance reviews to ensure compliance with NESA standards and data privacy guidelines.
Requirements
- AI/ML Platforms: Deep expertise in Azure Machine Learning and Databricks.
- Data Integration: Proficiency in Azure Data Factory and Synapse.
- Infrastructure-as-Code (IaC): Experience with Terraform or ARM Templates for reproducible deployments.
- Observability: Ability to use Dynatrace, Grafana, and Azure Monitor for in-depth diagnostics.
- Containerization: Knowledge of AKS, Istio Service Mesh, and KEDA.
- ITIL Mastery: Strong understanding of ITIL-aligned Incident, Change, and Problem management.
- Security Mindset: Familiarity with NESA standards and UAE data residency requirements.
- Technical Writing: Ability to draft complex SOPs and to deliver Root Cause Analysis (RCA) documents within 48 hours of an incident.
- Certifications: Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate is highly preferred.
Applicant Tracking System Keywords
Hard Skills & Tools
Azure Machine Learning, Databricks, Azure Data Factory, Synapse, Terraform, ARM Templates, CI/CD pipelines, data drift detection, self-healing scripts, audit logging
Soft Skills
technical writing, SOP leadership, problem management, incident management, change management
Certifications
Microsoft Azure Data Scientist Associate, Azure AI Engineer Associate