DevOps Engineer

• Design, Build, and Maintain Core Infrastructure: Architect and implement scalable, highly available, and secure infrastructure on cloud platforms (GCP, AWS, Azure) to support our AI-driven applications and services.
• Automate Everything: Develop and maintain automation tools and frameworks to eliminate manual effort in deployment, configuration, and management of our production environment.
• Ensure System Reliability and Performance: Establish and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our production systems. Proactively identify and resolve performance bottlenecks and availability issues.
• Manage ML Infrastructure and Pipelines: Collaborate with ML engineers to build and maintain robust CI/CD pipelines for machine learning models, ensuring seamless training, deployment, and monitoring.
• Incident Response and Post-Mortems: Lead incident response efforts to minimize downtime and conduct thorough post-incident reviews to identify root causes and implement preventative measures.
• Implement and Enhance Observability: Deploy and manage comprehensive monitoring, logging, and tracing solutions (e.g., Prometheus, Grafana, ELK stack) to provide deep visibility into system health.
• Capacity Planning and Cost Optimization: Forecast infrastructure needs and optimize resource utilization to ensure our platform can scale efficiently and cost-effectively.
• Foster a Culture of Reliability: Champion SRE best practices across the engineering organization and mentor team members on reliability, performance, and scalability.

Senior Site Reliability Engineer

Senior Site Reliability Engineer – Azure Cloud

Senior DevSecOps Engineer

Junior DevSecOps Engineer

Site Reliability Engineer III – IntelliScript

Principal Engineer – Ops/SRE