
Staff Site Reliability Engineer – Cloud
Trimble Inc.
full-time
Location Type: Remote
Location: United Kingdom
About the role
- Lead a global "OTel First" strategy, implementing OpenTelemetry at scale across a diverse technological landscape.
- Spearhead the development of automation scripts and Infrastructure as Code using Terraform to ensure seamless, reproducible platform delivery.
- Optimize platform performance and cost-efficiency, ensuring our observability tools scale economically as our data grows.
- Collaborate with engineering teams to embed reliability and security standards into new features from the ground up.
- Drive root cause analysis and problem management to proactively prevent incidents and improve the customer experience.
Requirements
- Hands-on experience with the OpenTelemetry Collector, APIs, and SDKs.
- Extensive experience with observability tools such as New Relic, Datadog, or Splunk.
- Strong proficiency in Infrastructure as Code (Terraform, Ansible) and cloud platforms (AWS, GCP, or Azure).
- Deep understanding of containerization and orchestration using Docker and Kubernetes.
- Advanced coding skills in Python, Go, or Java for building robust automation and monitoring tools.
Benefits
- Flexible working hours
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
OpenTelemetry, Terraform, Ansible, AWS, GCP, Azure, Docker, Kubernetes, Python, Go
Soft Skills
leadership, collaboration, problem management, root cause analysis, customer experience improvement