
Senior AIOps Engineer
EY
full-time
Location Type: Remote
Location: India
About the role
- Architect and implement enterprise-grade AIOps solutions to automate incident detection, root cause analysis, and remediation across cloud and hybrid environments.
- Lead the integration of telemetry data from tools like Prometheus, Grafana, AppDynamics, Dynatrace, and Azure Monitor into centralized AIOps platforms for unified observability and intelligent event correlation.
- Design and maintain ML models in Python for anomaly detection, predictive analytics, and operational forecasting, ensuring scalability and accuracy.
- Build and optimize real-time and batch data pipelines using Apache Kafka, Logstash, and Fluentd to process logs, metrics, and traces from distributed systems.
- Collaborate with DevOps, SRE, and platform engineering teams to embed AIOps capabilities into CI/CD workflows and infrastructure-as-code practices.
- Drive automation of operational tasks and remediation workflows using Python, Azure Functions, and orchestration tools to enable self-healing systems.
- Develop dashboards and visualizations using Grafana, Kibana, or Power BI to deliver actionable insights to engineering, operations, and business teams.
- Implement alert noise reduction strategies using ML-based filtering, deduplication, and suppression techniques to improve signal-to-noise ratio.
- Ensure compliance with security, governance, and audit policies, embedding DevSecOps principles and aligning with regulatory standards.
- Lead technical evaluations of AIOps platforms and tools, making recommendations for adoption based on business needs and operational maturity.
- Manage and mentor a team of AIOps engineers, fostering a culture of innovation, continuous learning, and operational excellence.
- Partner with business stakeholders to identify opportunities for cost optimization, risk reduction, and service reliability improvements through AIOps.
- Contribute to strategic roadmaps, budget planning, and vendor assessments, aligning AIOps initiatives with broader IT and business goals.
- Stay current with emerging trends in AIOps, observability, cloud-native operations, and AI-driven automation, and drive their adoption within the organization.
Requirements
- 9+ years of experience in IT operations, DevOps, or SRE, with at least 3 years in AIOps or AI/ML-driven automation.
- Proven experience in technical leadership, team management, and cross-functional collaboration.
- Deep expertise in AIOps platforms: Moogsoft, BigPanda, Splunk ITSI, ServiceNow ITOM, or custom ML-based solutions.
- Strong proficiency in Python, with working knowledge of SQL, PowerShell, and Bash.
- Hands-on experience with Azure, AWS, or GCP, including monitoring and automation services.
- Skilled in CI/CD tools (Azure DevOps, GitHub Actions, Jenkins), IaC (Terraform, Ansible), and Kubernetes.
- Familiarity with observability stacks (ELK, OpenTelemetry, Kafka) and data engineering workflows.
- Excellent communication, stakeholder management, and problem-solving skills.
Benefits
- Competitive salary
- Health insurance
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
AIOps, Python, SQL, PowerShell, Bash, CI/CD, IaC, Kubernetes, ML models, data pipelines
Soft skills
technical leadership, team management, cross-functional collaboration, communication, stakeholder management, problem-solving, innovation, continuous learning, operational excellence, cost optimization