
Senior ML Engineer – ML/Inference
Mara
Full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Job Level
Senior
Tech Stack
Airflow, AWS, Azure, Cloud, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, PyTorch, Ray
About the role
- Own the end-to-end lifecycle of ML model deployment, from training artifacts to production inference services.
- Design, build, and maintain scalable inference pipelines using modern orchestration frameworks (e.g., Kubeflow, Airflow, Ray, MLflow).
- Implement and optimize model serving infrastructure for latency, throughput, and cost efficiency across GPU and CPU clusters.
- Develop and tune Retrieval-Augmented Generation (RAG) systems, including vector database configuration, embedding optimization, and retriever–generator orchestration.
- Collaborate with product and platform teams to integrate model APIs and agentic workflows into customer-facing systems.
- Evaluate, benchmark, and optimize large language and multimodal models using quantization, pruning, and distillation techniques.
- Design CI/CD workflows for ML systems, ensuring reproducibility, observability, and continuous delivery of model updates.
- Contribute to the development of internal tools for dataset management, feature stores, and evaluation pipelines.
- Monitor production model performance, detect drift, and drive improvements to reliability and explainability.
- Explore and integrate emerging agentic and orchestration frameworks (LangChain, LangGraph, CrewAI, etc.) to accelerate development of intelligent systems.
Requirements
- 5+ years of experience in applied ML or ML infrastructure engineering.
- Proven expertise in model serving and inference optimization (TensorRT, ONNX, vLLM, Triton, DeepSpeed, or similar).
- Strong proficiency in Python, with experience building APIs and pipelines using FastAPI, PyTorch, and Hugging Face tooling.
- Experience configuring and tuning RAG systems (vector databases such as Milvus, Weaviate, LanceDB, or pgvector).
- Solid foundation in MLOps practices: versioning (MLflow, DVC), orchestration (Airflow, Kubeflow), and monitoring (Prometheus, Grafana, Sentry).
- Familiarity with distributed compute systems (Kubernetes, Ray, Slurm) and cloud ML stacks (AWS SageMaker, GCP Vertex AI, Azure ML).
- Understanding of prompt engineering, agentic frameworks, and LLM evaluation.
- Strong collaboration and documentation skills, with the ability to bridge ML research, DevOps, and product development.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to improve ATS matches.
Hard skills
ML model deployment, inference pipelines, model serving optimization, Retrieval-Augmented Generation (RAG), quantization, pruning, distillation, CI/CD workflows, dataset management, feature stores
Soft skills
collaboration, documentation, communication