SentinelOne

Staff AI Infrastructure Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $170,200 - $234,600 per year

About the role

  • Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
  • Automate infrastructure deployment and management using Helm, ArgoCD, and Terraform.
  • Manage and optimize Kubernetes clusters to support high-performance AI workloads.
  • Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins.
  • Ensure infrastructure compliance with security standards including FedRAMP and related guidelines.
  • Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
  • Monitor infrastructure health and performance, implementing optimizations proactively.
  • Drive infrastructure best practices and mentor team members to foster technical excellence.

Requirements

  • A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
  • 7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
  • Deep proficiency with infrastructure-as-code tools like Helm, Terraform, and ArgoCD.
  • Extensive hands-on experience with Kubernetes for deploying containerized workloads.
  • Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
  • Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
  • Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
  • Strong scripting and automation skills using Python, Bash, or similar languages.
  • Excellent problem-solving skills, creativity, and self-driven motivation.
  • Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
  • Experience with monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger).
  • Understanding of networking concepts and security best practices within cloud infrastructure.
  • Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).

Benefits

  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid Company Holidays
  • Paid Sick Time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events, including regular happy hours and team-building events

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
infrastructure-as-code, Kubernetes, CI/CD pipelines, scripting, automation, cloud platforms, AI model hosting, monitoring, logging, networking
Soft Skills
problem-solving, creativity, self-driven motivation, mentoring, collaboration
Certifications
Kubernetes certification, AWS certification, Azure certification, GCP certification