EarnIn

Staff Platform Engineer

EarnIn is hiring a Staff Platform Engineer to lead AI-driven workflows for its cloud infrastructure, mentor engineers, and shape a developer self-service platform with a focus on operational efficiency.

Posted 6/4/2026 • Full-time • Remote • 🇲🇽 Mexico • Lead

Tech Stack

Tools & technologies
Ansible, AWS, Cloud, Distributed Systems, Flux, Go, Kubernetes, Python, Terraform

About the role

Key responsibilities & impact
  • Design foundational patterns and guardrails for how EarnIn builds, evaluates, monitors, and deploys AI agents in production.
  • Own agent governance, including model selection, evaluation frameworks, safety guidelines, and production observability.
  • Establish infrastructure-as-code best practices for agentic systems, ensuring prompts, tools, and evaluation criteria are versioned, reviewed, and tested like critical components.
  • Serve as the architect for agentic cloud infrastructure, establishing best practices for production AI agents.
  • Mentor senior engineers in advanced agentic patterns, LLM integration, and production prompt engineering.
  • Lead cross-functional initiatives with engineering, product, security, and business teams to align agentic AI adoption with company objectives.
  • Oversee large-scale, high-availability distributed systems on AWS, identifying and solving critical performance, scalability, and stability challenges.
  • Use AI-driven observability and anomaly detection to anticipate failures.
  • Lead the evolution of infrastructure-as-code and automation standards, incorporating agentic pattern recognition and automated remediation into operations.
  • Shape the evolution of our developer control plane (Cortex) as an AI-augmented self-service platform where engineers interact with intelligent assistants.
  • Drive AI-powered golden paths that encode platform standards, security policies, and best practices.
  • Act as liaison between cloud operations, AI infrastructure, and business stakeholders.
  • Develop documentation on agentic architecture, best practices, and operational procedures.
  • Participate in and lead on-call rotations, using post-mortems as feedback loops for improving system reliability and agentic automation.

Requirements

What you’ll need
  • Bachelor's or Master's degree in Computer Science, Engineering, or related field.
  • 7+ years of experience in cloud infrastructure, managing large-scale, high-availability, customer-facing distributed systems.
  • Proven experience mentoring senior engineers and leading company-wide platform initiatives across multiple teams and functions.
  • Demonstrated experience architecting and scaling AI-driven systems in production, designing multi-step agentic workflows that autonomously perform complex operational tasks.
  • Track record of eliminating high-friction operational workflows through agentic AI, with measurable reduction in toil and increased platform leverage (e.g., LLM-powered incident diagnosis, intelligent CI/CD with test selection and deployment risk scoring, self-service assistants).
  • Mastery of AWS (EKS, Lambda, Bedrock, etc.) and deep expertise in containerized and serverless architectures.
  • Strong expertise in Kubernetes at scale and ability to guide implementation of complex, resilient solutions.
  • Deep knowledge of infrastructure-as-code tools (Terraform, Ansible) and ability to lead initiatives incorporating both traditional IaC and agentic automation.
  • Mastery of Datadog and advanced observability, driving metrics-driven decisions and agentic automation. Experience building AI-driven alerting and root-cause analysis systems is a plus.
  • Strong adherence to security, privacy, and compliance best practices, with the ability to lead governance for production AI systems (model safety, prompt injection prevention, data isolation).
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex, CrewAI, or custom agentic architectures) and production prompt engineering at scale.
  • Strong coding expertise in Python and/or Go, with the ability to guide teams in treating infrastructure and agentic systems as software.
  • Proven ability to drive cross-functional initiatives across engineering, product, security, and business, translating between technical depth and business impact.
  • Experience using AI-assisted development tools (e.g., GitHub Copilot, Cursor, ChatGPT, or similar) as part of your software development workflow.
  • Experience with service mesh (Linkerd, Istio) and traffic management at scale is a plus.
  • Proficiency with GitOps (Argo CD, Flux CD) and CI/CD orchestration (GitHub Actions, Argo Workflows) is a plus.
  • Experience with MLOps or LLMOps concepts (model versioning, evaluation frameworks, production monitoring for AI systems) is a plus.
  • Familiarity with security frameworks relevant to AI systems (e.g., guardrails, audit logging, and data governance for LLMs) is a plus.

Benefits

Comp & perks
  • healthcare
  • internet and cell phone reimbursement
  • learning and development stipend
  • potential opportunities to travel to our Mountain View headquarters

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
cloud infrastructure, AI-driven systems, infrastructure-as-code, AWS, Kubernetes, Terraform, Ansible, Python, Go, MLOps
Soft Skills
mentoring, leadership, cross-functional collaboration, communication, problem-solving, documentation, governance, initiative driving, feedback incorporation, stakeholder liaison
Certifications
Bachelor's degree in Computer Science, Master's degree in Engineering