Senior AI Platform Engineer
Expel
Senior AI Platform Engineer at Expel, building ML infrastructure and operationalizing ML at scale. Collaborates with team members to enhance AI capabilities and improve systems.
Tech Stack
Tools & technologies
AWS, Cloud, Go, Google Cloud Platform, JavaScript, Python, PyTorch, Scikit-Learn, SDLC, Spark, TensorFlow
About the role
Key responsibilities & impact
- Architect and maintain end-to-end machine learning training pipelines on AWS (SageMaker, EKS, Step Functions) to ensure reliable and reproducible model development and deployment
- Build and maintain infrastructure for production agentic applications using Amazon Bedrock and Bedrock AgentCore — including agent runtimes, memory, secure gateways, and observability at scale
- Contribute to the architectural evolution of our ML platform, including evaluating MLOps tooling and participating in buy vs. build decisions
- Implement AI/ML governance best practices for model versioning, testing, validation, maintenance, and security
- Integrate MLOps best practices with Expel's SDLC, security, and infrastructure standards, working alongside SRE, Platform Engineering, and Security teams
- Drive quality, reliability, and scalability improvements through thoughtful engineering and monitoring
- Partner with data scientists, software engineers, and stakeholders to operationalize ML models reliably and at scale
- Mentor and support junior engineers; foster a culture of engineering excellence
- Create and maintain documentation, internal tooling, and enablement resources so practitioners across Expel can work effectively with ML systems
- Stay current with the MLOps landscape and bring relevant innovations back to the team
Requirements
What you’ll need
- 5+ years of relevant software engineering experience with a meaningful focus on ML operations and infrastructure
- Degree in Computer Science, Mathematics, Statistics, Engineering, or a related technical field preferred (or a compelling story)
- Strong Python proficiency; familiarity with other languages (Go, JS) is a plus
- Solid experience with CI/CD pipelines, infrastructure-as-code, and containerization for ML workloads
- Hands-on experience with cloud-based ML platforms — AWS (SageMaker, Bedrock, Bedrock AgentCore) strongly preferred; GCP (Vertex AI) experience also valued
- Proven experience operationalizing LLMs and building infrastructure for complex agentic applications — agent orchestration, memory, tool calling, RAG architectures
- Familiarity with ML frameworks including Scikit-Learn, PyTorch, Spark, and TensorFlow
- Working knowledge of continuous retraining, concept drift monitoring, and data drift detection in production
Benefits
Comp & perks
- Offer unlimited PTO (that leadership models and encourages)
- Up to 24 weeks of parental leave
- Excellent health benefits
- Pay you monthly fitness and cell phone stipends, no receipts required
- Support your professional growth with a conference benefit and continuous learning opportunities
- Offer full remote flexibility — work from wherever you do your best work
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
machine learning, AWS, Python, CI/CD, infrastructure-as-code, containerization, MLOps, Scikit-Learn, PyTorch, TensorFlow
Soft Skills
mentoring, collaboration, engineering excellence, documentation, communication