Tech Stack
Ansible, AWS, Cloud, Grafana, Java, JavaScript, Linux, Logstash, Prometheus, Python, Terraform
About the role
- Deploy and maintain monitoring and alerting systems
- Design, configure, and maintain large-scale log aggregation solutions
- Set up and manage ingestion pipelines and data transformations
- Automate tasks and build robust monitoring systems using ELK, Dynatrace, Prometheus, OpenTelemetry (OTel), and Grafana
- Write and maintain documentation for audit and certification requirements
- Perform troubleshooting, capacity planning, and performance analysis
- Research new monitoring requirements and, when needed, write code
- Use strong scripting skills to automate monitoring tasks
Requirements
- BS/MS in CS/engineering or equivalent, OR 5+ years of experience
- 3+ years of hands-on experience with monitoring tools (Dynatrace and/or ELK) as an admin, SME, or architect
- Hands-on experience designing data pipelines using Filebeat, Logstash, and/or Fluent Bit/Fluentd
- Expert-level knowledge of either Dynatrace or Elastic (cloud or on-prem)
- Fluent in scripting languages like Python and Bash/PowerShell
- Experience with Terraform and Ansible
- Linux OS proficiency; ability to manage infra and apps across multi-cloud environments
- Strong analytical, problem-solving, and troubleshooting skills
- Knowledge of SNMP, tcpdump, and tracing
- Knowledge of AIOps platforms; experience with other languages (JavaScript, Java, PowerShell)