Tech Stack
Tools & technologies
- Cloud
- Kubernetes
- Linux
- Python
About the role
Key responsibilities & impact
- Design and improve the platform systems that support model training, evaluation, and production serving.
- Build infrastructure and tooling that make ML workloads more reliable, scalable, and cost-efficient.
- Develop internal tools and workflows that are easy for both humans and agents to operate.
- Work on the architecture behind how models are deployed, served, and operated across research and product environments.
- Improve how we schedule, monitor, and debug workloads running on GPUs and cloud infrastructure.
- Develop internal tools, abstractions, and agentic systems that reduce operational overhead for researchers and engineers.
- Drive improvements across observability, automation, reliability, and developer experience.
- Collaborate closely with researchers and product engineers to understand pain points and turn them into robust platform capabilities.
- Contribute to technical direction and make pragmatic architectural tradeoffs as the platform grows.
Requirements
What you’ll need
- Strong experience building or operating production systems with a focus on reliability, scalability, and maintainability.
- A systems mindset: you naturally think in terms of bottlenecks, failure modes, interfaces, resource usage, and long-term operability.
- Solid hands-on experience with cloud infrastructure, Linux, and infrastructure automation.
- Experience with Kubernetes and operating distributed workloads in production.
- Strong coding skills, ideally in Python or similar languages used for backend systems and tooling.
- Strong judgment about where automation adds leverage and where human control and reliability matter most.
- Experience building internal platforms, developer tooling, or infrastructure abstractions used by other engineers.
- Comfort working in ambiguous environments and taking ownership of open-ended technical problems.
- A pragmatic approach: you care about solving the right problem well, not over-engineering.
Benefits
Comp & perks
- Health insurance
- Retirement plans
- Flexible work arrangements
- Professional development
