Textlayer

Senior DevOps Engineer

full-time

Location Type: Remote

Location: Remote • 🇨🇦 Canada

Salary

💰 CA$200,000 - CA$220,000 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Docker, Elasticsearch, Google Cloud Platform, Grafana, Kubernetes, Logstash, Prometheus, Python, Terraform

About the role

  • Design and maintain OpenTelemetry-based observability infrastructure for distributed AI systems and LLM applications
  • Build and scale ELK stack deployments (Elasticsearch, Logstash, Kibana) for log aggregation, search, and visualization of AI application data
  • Implement comprehensive tracing and monitoring solutions for LLM inference, RAG pipelines, and AI agent workflows
  • Develop and maintain data ingestion pipelines for processing high-volume telemetry data from AI applications
  • Configure and optimize OpenSearch clusters for real-time analytics and trace reconstruction of conversational flows
  • Deploy and manage LLM observability platforms like Langfuse, OpenLLMetry, and custom monitoring solutions
  • Implement Infrastructure as Code (Terraform, CloudFormation) for reproducible observability and application stack deployments
  • Build automated alerting and incident response systems for AI application performance and reliability
  • Collaborate with engineering teams to instrument AI applications with proper telemetry and observability hooks
  • Optimize data retention policies, indexing strategies, and query performance for large-scale observability data

Requirements

  • 4+ years of DevOps/infrastructure engineering experience with a focus on observability and monitoring
  • Expert-level experience with OpenTelemetry implementation, configuration, and custom instrumentation
  • Production experience with ELK stack (Elasticsearch, Logstash, Kibana) including cluster management and optimization
  • Strong knowledge of distributed tracing, metrics collection, and log aggregation architectures
  • Experience with container orchestration (Kubernetes, Docker) and cloud infrastructure (AWS/GCP/Azure)
  • Proficiency with Infrastructure as Code tools (Terraform, Ansible, CloudFormation)
  • Experience building high-throughput data ingestion pipelines and real-time analytics systems
  • Strong scripting skills (Python, Bash/Sh) for automation and tooling
  • Knowledge of observability best practices, SLI/SLO definitions, and incident response
  • Experience with monitoring tools like Prometheus, Grafana, or Datadog

Benefits

  • Competitive salary
  • Flexible work hours
  • Professional development opportunities
  • Remote work options
