
Staff AI Infrastructure Engineer
SentinelOne
full-time
Location Type: Remote
Location: United States
Salary
💰 $170,200 - $234,600 per year
About the role
- Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
- Automate infrastructure deployment and management using Helm, ArgoCD and Terraform.
- Manage and optimize Kubernetes clusters to support high-performance AI workloads.
- Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins.
- Ensure infrastructure compliance with security standards including FedRAMP and related guidelines.
- Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
- Monitor infrastructure health and performance, implementing optimizations proactively.
- Drive infrastructure best practices and mentor team members to foster technical excellence.
Requirements
- A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
- 7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
- Deep proficiency with infrastructure-as-code and GitOps deployment tools such as Terraform, Helm, and ArgoCD.
- Extensive hands-on experience with Kubernetes for deploying containerized workloads.
- Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
- Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
- Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
- Strong scripting and automation skills using Python, Bash, or similar languages.
- Excellent problem-solving skills, creativity, and self-driven motivation.
- Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
- Experience with monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger).
- Understanding of networking concepts and security best practices within cloud infrastructure.
- Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).
Benefits
- Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid Company Holidays
- Paid Sick Time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement
- Numerous company-sponsored events, including regular happy hours and team-building events
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
infrastructure-as-code, Kubernetes, CI/CD pipelines, scripting, automation, cloud platforms, AI model hosting, monitoring, logging, networking
Soft Skills
problem-solving, creativity, self-driven motivation, mentoring, collaboration
Certifications
Kubernetes certification, AWS certification, Azure certification, GCP certification