
Systems Architect – AI/ML Infrastructure
Deepgram
Full-time
Posted on:
Location Type: Remote
Location: United States
Salary
💰 $160,000 - $220,000 per year
About the role
- Define and drive the end-to-end infrastructure architecture for Deepgram's AI/ML workloads across production inference and research training
- Design multi-cloud and hybrid infrastructure strategies that balance performance, reliability, cost, and vendor flexibility
- Architect compute orchestration systems that efficiently schedule and manage GPU and CPU workloads across heterogeneous infrastructure
- Design storage architectures that handle the massive datasets required for speech and audio ML, from high-throughput training data pipelines to low-latency model serving
- Lead capacity planning across all infrastructure dimensions, modeling growth and ensuring Deepgram can scale ahead of demand
- Drive cost optimization and FinOps practices, identifying opportunities to reduce infrastructure spend without compromising performance or reliability
- Design burstable, elastic training infrastructure that can scale up for large training runs and scale down to minimize idle cost
- Architect research compute infrastructure that gives ML teams the resources they need while maintaining operational efficiency
- Establish architectural standards, design review processes, and technical documentation practices for infrastructure decisions
- Collaborate with engineering leadership to align infrastructure strategy with product roadmap and business objectives
- Evaluate emerging hardware, cloud services, and infrastructure technologies for potential adoption
Requirements
- 7+ years of experience in infrastructure engineering, systems architecture, or a senior technical role focused on large-scale infrastructure
- Proven experience designing multi-cloud architectures spanning AWS and at least one other major cloud provider or an on-premises environment
- Deep expertise in storage system design (block, object, and file storage), including performance tuning for large-scale data workloads
- Strong experience with compute orchestration using Kubernetes, and an understanding of how to schedule diverse workloads efficiently
- Hands-on experience with GPU infrastructure, including procurement considerations, cluster design, and driver and runtime management
- Track record of capacity planning and infrastructure scaling for high-growth environments
- Ability to communicate complex architectural decisions clearly to both technical and non-technical stakeholders
- Strong understanding of networking fundamentals as they relate to infrastructure architecture
Benefits
- Medical, dental, vision benefits
- Annual wellness stipend
- Mental health support
- Life, short-term disability (STD), and long-term disability (LTD) income insurance plans
- Unlimited PTO
- Generous paid parental leave
- Flexible schedule
- 12 paid US company holidays
- Quarterly personal productivity stipend
- One-time stipend for home office upgrades
- 401(k) plan with company match
- Tax Savings Programs
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
infrastructure architecture, multi-cloud architecture, storage system design, compute orchestration, Kubernetes, GPU infrastructure, capacity planning, performance tuning, data pipelines, model serving
Soft Skills
communication, leadership, collaboration, technical documentation, strategic alignment