Design and maintain Datadog dashboards for monitoring critical system metrics, including Kubernetes, application performance, CI/CD pipeline, and AWS infrastructure metrics.
Lead troubleshooting of metric collection, visualization, and other Datadog issues.
Analyze Application Performance Monitoring (APM) data to support both technical and business decision-making.
Collaborate cross-functionally with engineering, operations, and product teams to implement performance improvements and resolve reliability challenges.
Develop and maintain infrastructure as code (Terraform preferred) to automate and streamline cloud operations.
Requirements
Experience in Site Reliability Engineering, DevOps, or similar roles, with a focus on cloud-native technologies and systems.
Deep expertise in Datadog, including dashboard creation, metric ingestion, and APM analysis.
Strong hands-on experience with Kubernetes, AWS services, and CI/CD pipelines.
Proficiency in monitoring and logging tools such as Fluent Bit, Loki, Prometheus, and Grafana.
Solid understanding of infrastructure as code (Terraform preferred).
Excellent troubleshooting skills in distributed systems, especially in cloud-native environments.
Strong communication skills and experience working with external vendors and stakeholders.
Ability to work effectively in a remote, international team environment.
Nice to have
Experience with Tekton, Jenkins, Kafka, Redis, and PostgreSQL (Patroni).
Familiarity with authentication and authorization tools such as Keycloak or Tozny.
Knowledge of artifact and container management platforms like Harbor, ECR, or MinIO.
Experience in security management, including authentication and authorization processes.
Benefits
Join a multicultural and inclusive team environment.
Enjoy a supportive atmosphere promoting work-life balance.
Engage in exciting national and international projects.
Hybrid work.
Your career growth is central to our mission.
Our career growth programs and diverse team of professionals support you in exploring a world of opportunities.
Training and certification programs.
Health and life insurance.
Referral program with bonuses for talent recommendations.
Great office locations.