dltHub

Senior Platform Engineer

full-time

Origin: 🇩🇪 Germany

Job Level

Senior

Tech Stack

BigQuery, Grafana, Kafka, Kubernetes, Postgres, Prometheus, Python, Terraform

About the role

  • Design, provision, and operate our core platform: multi-environment, multi-region Kubernetes clusters; networking, security, and identities; storage (object store, tables/Iceberg/“DuckLake”); and the runtime for Python jobs, notebooks, and services.
  • Build code abstractions: Python CLIs/SDKs, templates, and controllers that abstract infra into simple workflows.
  • Define IaC standards (Terraform, Helm) for everything: clusters, apps, policies, and data runtimes.
  • Design and implement the platform control plane: end-user secrets management, cost controls, tenant/resource limits, SSO, and scheduling.
  • Ship observability as a product: metrics/logs/traces, golden dashboards, SLIs/SLOs, runbooks, and incident/postmortem practice.
  • Work closely with the CTO and the team's software engineers on all of the above.

Requirements

  • 5+ years in Platform/SRE/DevInfra roles (or equivalent impact) building and running production systems.
  • Strong Python skills, applied to automation and tooling (CLIs, bots, controllers/operators, SDKs).
  • Deep Kubernetes experience (cluster ops, Helm/Kustomize, controllers/operators, container networking).
  • Practical observability (Prometheus/Grafana/OpenTelemetry or similar), performance tuning, and incident response.
  • Running Python data pipelines/apps in production (dlt, dbt, Polars, DuckDB, Iceberg) (nice-to-have).
  • Storage & query engines: Parquet, Iceberg, DuckDB/MotherDuck, BigQuery/Snowflake/Postgres (nice-to-have).
  • Eventing/streaming (Kafka/Pub/Sub), batch schedulers, or serverless Python runtimes (nice-to-have).
  • Security & supply-chain hardening (images, SBOMs, policy-as-code, secret rotation) (nice-to-have).
  • OSS contributions or demo-driven platform work you can show us (nice-to-have).