AI Infrastructure Engineer

MeshyAI

Full-time

Posted on: 2/25/2026

Location Type: Hybrid

Location: Sunnyvale • California • United States

About the role

Own production reliability: availability, latency, error budgets, incident response, postmortems, and follow-ups
Build/maintain observability: metrics, logs, traces, alerting, SLOs/SLIs, dashboards
Improve deployment safety: CI/CD, rollout strategies (canary/blue-green), automated rollback, runbooks
Capacity planning + cost control: GPU/CPU sizing, autoscaling, queue/backpressure management, cost attribution
Security + compliance: secrets management, least privilege, patching, vulnerability response
Disaster recovery + operational readiness: backups, failover plans, game days
Develop and maintain the GPU inference serving stack (APIs, schedulers, workers, batching, caching)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Go, Python, Kubernetes, CI/CD, observability tools, GPU inference serving, metrics, logs, traces, incident response

Soft Skills

performance tradeoffs, problem solving, analytical thinking