Salary
💰 $130,295 - $260,590 per year
Tech Stack
Ansible, Azure, Cloud, Go, Grafana, ITSM, Jenkins, Kubernetes, Prometheus, Python, SDLC, ServiceNow, Splunk, Terraform
About the role
- Lead Automation-First Strategy: Define and execute a roadmap to automate observability deployment, configuration, and lifecycle management using infrastructure-as-code and observability-as-code principles.
- Streamline Operational Workflows: Eliminate manual processes by engineering intelligent workflows, including self-service provisioning, automated health checks, and event-driven incident remediation.
- Orchestrate Telemetry & Platform Modernization: Automate the ingestion, normalization, and correlation of telemetry data (metrics, logs, traces) across hybrid and multi-cloud environments. Transition legacy observability tooling to modern platforms through automated migration frameworks.
- Engineer Self-Healing Capabilities: Design and implement automated remediation and recovery actions for known failure scenarios using tools such as Ansible, SCCM, PowerShell, and ServiceNow workflows.
- Enable CI/CD and GitOps for Observability: Integrate observability deployment and configuration into CI/CD pipelines. Leverage Git-based workflows to support consistent, version-controlled platform management.
- Support Custom Solutions: Maintain and enhance homegrown observability tools and integrations, particularly those built in C# or other object-oriented languages, ensuring they scale with business needs and integrate seamlessly into the automated ecosystem.
- Promote Observability as Code: Drive adoption of code-first observability strategies through reusable templates, automation libraries, and documentation. Embed instrumentation and alerting directly into the SDLC.
- Agile Collaboration: Use tools like Jira, Jira Align, and ServiceNow to manage sprints, backlogs, and releases. Track bugs, tasks, and user stories to show progress.
- Mentor and Influence Across Teams: Provide technical leadership to engineers and operational partners. Evangelize automation best practices and observability automation patterns across the organization.
Requirements
- 7+ years of experience in automation engineering, observability, or SRE roles
- Strong expertise with automation tools such as Ansible, SCCM, Jenkins, and Git, and with scripting languages (Python, Bash, PowerShell)
- Demonstrated success engineering automation for telemetry collection, alerting, and event handling at scale
- 3+ years of experience with observability and logging tools (for example, AppDynamics or Splunk)
- Hands-on experience supporting or extending homegrown tools in C# or similar object-oriented languages
- Deep understanding of infrastructure-as-code and GitOps principles in enterprise environments
- Strong knowledge of enterprise IT environments, including cloud, on-premises, and hybrid infrastructure
- Experience working with Jira, Jira Align, ServiceNow, and other collaboration tools
- Excellent problem-solving skills and the ability to lead in high-urgency, high-complexity environments