
Lead Software Engineer – ML, Agentic Workloads
Mara
full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Job Level
Senior
Tech Stack
Cloud, Grafana, Kubernetes, Prometheus, Python, PyTorch, Ray
About the role
- Lead architecture and development of agentic platforms that integrate multiple models, tools, and knowledge sources into dynamic reasoning systems.
- Evaluate and deploy foundation and open-source models (LLMs, vision, multimodal) using efficient inference strategies and fine-tuning where applicable.
- Design and maintain prompt lifecycle pipelines with version control, testing, and CI/CD integration (“PromptOps”).
- Build and optimize RAG systems, including vector database configuration, retriever-generator orchestration, and embedding quality improvement.
- Implement guardrail frameworks for content safety, hallucination control, and policy enforcement across agentic workflows.
- Integrate and extend agentic frameworks (LangChain, LangGraph, CrewAI, AutoGen, or equivalent) in both code-based and visual orchestration environments.
- Collaborate with data, product, and infrastructure teams to design scalable APIs and services that enable model-driven applications.
- Define observability and evaluation metrics for model performance, latency, and behavior drift in production.
- Drive best practices for secure AI development, privacy-preserving data handling, and governance of third-party model integrations.
- Mentor engineers across ML, backend, and platform domains; champion continuous learning and experimentation.
Requirements
- 8+ years of professional software engineering experience, including 3+ years in ML application development or AI platform engineering.
- Proficiency in Python, with a strong understanding of ML toolchains (PyTorch, Hugging Face, LangChain, MLflow, Ray, etc.).
- Proven experience with model evaluation, fine-tuning, and deployment across cloud and on-prem environments.
- Hands-on experience with RAG architectures and vector databases (Weaviate, Milvus, pgvector, LanceDB, FAISS).
- Deep understanding of prompt design, orchestration, and versioning using CI/CD workflows and automated testing frameworks.
- Familiarity with agentic systems built through both code-driven and visual-builder interfaces (LangGraph Studio, Dust, Flowise, Relevance AI, etc.).
- Strong knowledge of guardrail techniques (rule-based filters, policy evaluators, toxicity detection, grounding validation).
- Experience deploying ML systems in Kubernetes and serverless environments, with observability tooling (Prometheus, Grafana, OpenTelemetry).
- Solid understanding of API design, microservice architecture, and data pipeline integration.
- Excellent communication and leadership skills, with the ability to translate complex ML concepts into actionable engineering outcomes.
Benefits
- Competitive salary
- Flexible working hours
- Professional development opportunities