Smartsheet

Senior AI/ML Ops Engineer – Hybrid

Full-time

Location Type: Hybrid

Location: Bangalore, India

About the role

  • Platform Strategy & Architecture: Design, develop, and oversee the strategy and architecture of scalable, reliable AI/ML Ops platforms and pipelines
  • Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable
  • CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment
  • Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms
  • Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency. Experience with Monte Carlo is preferred
  • Automation: Automate retraining and data pipeline workflows to ensure models stay accurate over time.
  • Foundation Models & RAG: Manage the deployment of foundation models, fine-tuning workflows, and Retrieval-Augmented Generation (RAG) stacks (vector databases, knowledge graphs). Experience with AWS Bedrock is preferred
  • Resource Optimization: Manage GPU/CPU utilization to minimize cloud costs while maintaining low-latency inference for users
  • Collaboration: Work closely with data scientists, data engineers, and software engineers to bridge the gap between model development and production.
  • Version Control & Governance: Manage versioning for data, code, and models using tools like MLflow.
  • Security & Compliance: Implement data security measures, ensure compliance with data governance policies, and protect sensitive data
  • Technology Evaluation and Innovation: Stay abreast of emerging data technologies and explore opportunities for innovation to improve the organisation’s data infrastructure
  • Troubleshooting and Problem Solving: Diagnose and resolve complex data-related issues, ensuring the stability and reliability of the data platform
  • Perform other duties as assigned

Requirements

  • Experience with enterprise SaaS solutions built for high availability and scalability
  • Experience with solutions that handle large-scale structured and unstructured data from varied sources
  • Experience in building and maintaining AI/ML Ops platform systems ensuring scalability, reliability, efficiency and security
  • Working with Product engineering team to influence designs with data, AI and analytics use cases in mind
  • In-depth experience in system design and in AI/ML frameworks and tools for petabyte-scale data within the Databricks Lakehouse ecosystem
  • Experience with AI/ML Ops workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, and knowledge graphs
  • Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/ML Ops pipeline integration
  • Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferred
  • Proficiency in programming languages such as Python and SQL
  • Modern software engineering practices and tooling, such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
  • Solution cost optimisation and design-to-cost
  • Legally eligible to work in India on an ongoing basis.

Benefits

  • Health insurance
  • Flexible work arrangements
  • Professional development opportunities