
Senior AI/ML Ops Engineer – Hybrid
Smartsheet
full-time
Location Type: Hybrid
Location: Bangalore • India
About the role
- Design, develop, and oversee the strategy and architecture of scalable, reliable AI/ML Ops platforms and pipelines
- Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable
- CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment using tools
- Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms
- Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency using tools. Experience with Monte Carlo is preferred
- Automation: Automate retraining and data pipeline workflows to ensure models stay accurate over time.
- Manage the deployment of foundation models, fine-tuning workflows, and Retrieval-Augmented Generation (RAG) stacks (vector DBs, knowledge graphs). Experience with AWS Bedrock is preferred
- Resource Optimization: Manage GPU/CPU utilization to minimize cloud costs while maintaining low-latency inference for users
- Collaboration: Work closely with data scientists, data engineers, and software engineers to bridge the gap between model development and production.
- Version Control & Governance: Manage versioning for data, code, and models using tools like MLflow.
- Security & Compliance: Implement data security measures, ensure compliance with data governance policies, and protect sensitive data
- Technology Evaluation and Innovation: Stay abreast of emerging data technologies and explore opportunities for innovation to improve the organisation’s data infrastructure
- Troubleshooting and Problem Solving: Diagnose and resolve complex data-related issues, ensuring the stability and reliability of the data platform
- Perform other duties as assigned
Requirements
- Experience with enterprise SaaS software solutions with high availability and scalability
- Experience with solutions handling large-scale structured and unstructured data from varied data sources
- Experience in building and maintaining AI/ML Ops platform systems ensuring scalability, reliability, efficiency and security
- Experience working with the product engineering team to influence designs with data, AI, and analytics use cases in mind
- In-depth experience in system design and in AI/ML frameworks and tools involving petabytes of data within the Databricks Lakehouse ecosystem
- AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph
- Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/ML Ops pipeline integration
- Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferred
- Proficiency in programming languages such as Python and SQL
- Modern software engineering practices such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
- Solution cost optimisation and design-to-cost
- Legally eligible to work in India on an ongoing basis.
Benefits
- Health insurance
- Flexible work arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI/ML Ops, Model Deployment, CI/CD Pipeline Development, Infrastructure Management, Monitoring & Observability, Automation, Resource Optimization, Version Control, Security & Compliance, System Design
Soft Skills
Collaboration, Troubleshooting, Problem Solving