Salary
💰 $160,000 - $180,000 per year
Tech Stack
Airflow, Azure, Cloud, Distributed Systems, Docker, Kubernetes, Microservices, Postgres, Terraform
About the role
- Lead the design and implementation of TraceGains' next-generation data and MLOps platform on Azure
- Architect end-to-end MLOps capabilities that support Intelligent Document Processing, supply chain risk prediction, and knowledge graph applications
- Architect a scalable, multi-tenant data platform using Azure Data Factory, Databricks, and Azure Synapse Analytics
- Design hybrid data architectures supporting operational systems, AI workloads, and knowledge graphs
- Build vector databases and graph database infrastructure for RAG applications and semantic search
- Design and implement a comprehensive MLOps platform on Azure supporting the full ML lifecycle
- Build automated ML pipelines using Azure ML, MLflow, and Azure DevOps for CI/CD
- Implement real-time inference infrastructure with monitoring, alerting, and automated drift detection
- Hire and build a technical team of data engineers
- Manage the end-to-end knowledge graph lifecycle, including hydration from taxonomies and ontologies
- Implement Infrastructure as Code using Terraform and build CI/CD pipelines for data products and ML models
- Design a containerized microservices architecture using Docker and Azure Kubernetes Service
- Create self-service capabilities with comprehensive monitoring and observability
- Report to the VP of Engineering
Requirements
- Master's degree in Computer Science, Data Engineering, or related field (or equivalent experience)
- 8-12 years of experience building enterprise data and AI platforms in production environments
- Proven track record designing and implementing MLOps platforms on Azure with measurable business impact
- 5+ years of hands-on experience with Azure ML, Azure Synapse, Azure Data Factory, and/or Azure Kubernetes Service
- MLOps & AI Platforms: MLflow, Kubeflow or Azure ML pipelines, model monitoring and drift detection
- Data Engineering: Modern data stack (dbt, Airflow), real-time streaming, data lake/warehouse architecture
- Cloud Infrastructure: Azure native services, Terraform, Kubernetes, containerization strategies
- Databases & Storage: PostgreSQL, graph databases, vector stores, distributed systems design
- DevOps & Platform Engineering: CI/CD for ML, Infrastructure as Code, monitoring and observability
- Proven ability to establish shared platform capabilities that serve multiple product teams
- Strong communication skills and the ability to present to executive leadership
- Track record of cross-functional collaboration with AI product teams, ML teams, and business stakeholders
- Experience establishing technical standards and governance frameworks across distributed teams