Upgrade, Inc.

Principal Infrastructure Performance Engineer

full-time

Origin: 🇺🇸 United States

Job Level

Lead

Tech Stack

AWS, Cloud, Go, Grafana, Java, Kubernetes, Linux, Microservices, Prometheus, Python, SQL, Terraform

About the role

  • Build a resilient, secure, and efficient cloud-based observability platform.
  • Monitor and troubleshoot platform issues, and find solutions that reduce recurring known issues.
  • Build and scale the observability infrastructure to meet rapidly increasing demand.
  • Develop and improve operational practices and procedures.
  • Sample projects:
      • Improve database monitoring: develop custom Prometheus exporters in Go for use cases that go beyond what SQL exporter supports, then create Grafana dashboards and alerts for the new metrics.
      • MCP servers for observability: deploy an MCP server to integrate our observability stack with our LLM tools.
  • Our tech stack:
      • Monitoring: VictoriaMetrics, Grafana, Prometheus, OpenTelemetry, Honeycomb, Sumo Logic.
      • Infrastructure as code: Terraform.
      • CD: GitOps, Argo CD, Argo Rollouts.
      • CI: Tekton.
      • Scripting: Bash.
      • Programming: Go (preferred).
      • AWS: EKS, CloudWatch, S3, DynamoDB, RDS, SNS, SQS, Lambda.

Requirements

  • 8+ years of relevant production-level experience.
  • Experience with VictoriaMetrics.
  • Experience with Sumo Logic.
  • Experience with tracing tools (e.g. OpenTelemetry, Honeycomb, Tempo).
  • Experience with profiling tools (e.g. Pyroscope).
  • Knowledge of cloud monitoring, logging and cost management tools.
  • Programming/scripting knowledge (Go, Java, or Python) and understanding of JVM concepts.
  • In-depth knowledge of AWS services and hands-on experience provisioning AWS resources with Terraform.
  • Experience with containerized applications and Kubernetes/EKS, including creating and maintaining Helm charts.
  • Understanding of microservices architecture and debugging/investigation techniques.
  • Strong understanding of systems, networking and troubleshooting techniques.
  • Experience with automated build pipelines, continuous integration, and continuous deployment.
  • Ability to operate in an agile, entrepreneurial start-up environment.
  • Experience running Linux in production.