Tech Stack
Tools & technologies: AWS, Cloud, Distributed Systems, Kubernetes, Python, PyTorch, Spark, TensorFlow, Terraform
About the role
Key responsibilities & impact
- Design and implement foundational GenAI services: vector search, prompt tuning, agent orchestration, document extraction, context/memory services, model/endpoint registry, feature/embedding stores, guardrails, and evaluation pipelines
- Build the underlying infrastructure for autonomous and semi-autonomous AI agents including support for agent collaboration, reasoning, and memory persistence, enabling continuous context-aware execution
- Build standardized APIs/SDKs that make it easy for product teams to compose, deploy, and monitor Generative AI workloads.
- Ensure platform components meet enterprise-grade requirements for scalability, latency, multi-region resilience, and cost efficiency
- Stand up LLM runtimes with token/rate governance, caching, and safe tool-use
- Implement RAG at scale: ingestion pipelines, chunking/embedding policies, hybrid search, relevance/risk scoring, and feedback loops
- Build agent orchestration (single & multi-agent) with planning, tool routing, shared/persistent memory, and inter-agent communication
- Integrate tooling and APIs that allow agents to interact with internal systems, retrieve data securely, and take action under strict controls
- Collaborate with research teams to prototype and productionize multi-agent architectures for workflow automation, report generation, and data synthesis.
- Implement cloud-native infrastructure for large-scale model training and serving using Kubernetes, MLflow, Terraform, and AWS-native services
- Automate data and model pipelines for RAG, LLM fine-tuning, and agent orchestration
- Integrate observability tools (Datadog or equivalent) for real-time performance, drift detection and safety monitoring of AI outputs
- Optimize compute and storage architecture to ensure cost-effective scaling of large models and multi-agent workloads
- Partner with security, data governance, SRE, and application teams to productize platform capabilities
- Embed compliance-by-design (HIPAA/CLIA/CAP/FDA/GDPR): PHI/PII handling, encryption, access controls, audit trails
- Implement guardrails: input/output filters, prompt hardening, allow/deny policies for tool execution, policy-as-code in CI/CD
- Add bias/explainability hooks and automated evaluations for RAG/LLM/agents, including drift and regression detection
- Establish golden paths (templates, examples, docs) and lead platform architecture reviews, code reviews, and design discussions
- Partner with data scientists, AI researchers, and product engineers to deliver reliable and maintainable AI services
- Mentor junior engineers in platform development, distributed systems, and agentic AI infrastructure concepts
- Influence cross-functional roadmaps by partnering with Product and Engineering leadership to align delivery with business needs
Requirements
What you’ll need
- 8+ years in software/ML engineering, with 5+ years in ML engineering at scale
- Expertise in building production-grade ML/LLM systems on the AWS tech stack (Python, TensorFlow/PyTorch, Spark, MLflow/Kubeflow, vector DBs)
- Proven track record with GenAI/LLMs: fine-tuning, RAG, prompt orchestration, agentic systems, safety guardrails, monitoring, and cost optimization
- Hands-on experience with RAG systems (embeddings, vector DBs, retrieval policies) and LLM runtime operations (caching, quotas, multi-model routing)
- Experience building agentic AI platforms (LangChain, LlamaIndex, CrewAI, Semantic Kernel, or custom)
- Deep knowledge of data-intensive systems, distributed architectures, and cloud-native development
- Strong grounding in compliance-first engineering; experience in healthcare, biotech, or diagnostics preferred
- Track record building secure, compliant data/AI systems and automating policy checks.
- Excellent ability to influence across teams, mentor engineers, and set technical standards.
Benefits
Comp & perks
- Comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents.
- Free testing for employees and their immediate families in addition to fertility care benefits.
- Pregnancy and baby bonding leave
- 401(k) benefits
- Commuter benefits
- Generous employee referral program
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
GenAI services, vector search, prompt tuning, agent orchestration, document extraction, RAG systems, LLM systems, cloud-native development, data pipelines, distributed systems
Soft Skills
mentoring, influencing, collaboration, communication, leadership, technical standards setting, cross-functional partnership, problem-solving, design discussions, code reviews
