Staff Software Engineer – Grafana Databases, Managed Services
Grafana Labs
Staff Software Engineer managing 100+ streaming clusters in Grafana's cloud infrastructure, working with distributed systems and improving reliability for mission-critical services.
Tech Stack
Tools & technologies
AWS, Azure, Cassandra, Cloud, Distributed Systems, Go, Google Cloud Platform, Kafka, Kubernetes, Linux, Postgres, Terraform
About the role
Key responsibilities & impact
- Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure
- Diagnosing and eliminating cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions)
- Designing safe upgrade and rollout strategies at scale
- Improving observability, automation, and operational ergonomics
- Partnering closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
- Working directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, compression trade-offs, etc.
- Serving as a primary escalation point and on-call for relevant incidents
- Owning the relationship with all system vendors, including WarpStream Labs and others.
Requirements
What you’ll need
- 8+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles.
- Experience with high-throughput streaming systems, analytical or storage backends, or large-scale database infrastructure. Examples of these include Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, or Cassandra.
- Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.).
- Experience leading or driving complex technical efforts, even without formal management responsibilities
- Ability to influence technical direction and align teams around reliability improvements
- Strong understanding of distributed systems failure modes in multi-cloud environments.
- Proficiency in at least one systems-oriented language (Go preferred, but not required).
- Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior.
- Experience participating in blameless incident response and writing high-quality post-incident reviews.
- Clear communicator who can collaborate across teams and work autonomously.
- Intellectually curious, transparent, action-oriented, and kind (this is important!)
Benefits
Comp & perks
- Restricted Stock Units (RSUs)
- Health insurance
- 30 days annual leave
- Company-funded usage budget for developer tools
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Go, Linux internals, networking, cloud storage, performance behavior, infrastructure-as-code, streaming systems, database infrastructure, incident response
Soft Skills
clear communicator, collaboration, autonomy, influence technical direction, lead complex technical efforts, intellectual curiosity, transparency, action-oriented, kindness