DroneUp

SRE – Platform Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $125,000 - $150,000 per year

About the role

  • Serve as the broad-domain architect for the internal developer platform and all cloud engineering
  • Drive architecture for tooling or in-house software
  • Mentor other platform engineers to drive strong engineering practices
  • Enable platform engineering technical capabilities in our internal client teams in software engineering
  • Peer with the senior architects and engineers in software engineering
  • Focus architecture and engineering work on the GCP environment
  • Architect and oversee GKE cluster operations and workload management
  • Provide feedback to others and participate in peer reviews / pair programming
  • Drive the broad adoption of Test Driven Development by designing, developing, and debugging unit and integration tests for new and existing infrastructure and code
  • Stay curious about existing implementations and new technologies, and share findings with the team
  • Practice continuous improvement across all job areas, both personally and professionally
  • Communicate clearly with platform engineering teams and other stakeholders, providing technical direction while doing so
  • Stay current with platform changes and third-party libraries
  • Proactively investigate better alternatives to current solutions
  • Understand OpenTelemetry and true observability, and how observability differs from monitoring and logging
  • Grow the engineering culture towards a high-performing team
  • Practice the arts of self-service, least privilege and security by default in all solutions
  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets
  • Lead incident response, including on-call rotations, root cause analysis, and post-mortem reviews
  • Implement and optimize monitoring, alerting, and observability systems for system reliability
  • Collaborate on capacity planning and performance optimization to ensure high availability
  • Other duties as assigned

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, or a related field, or 8+ years of experience as a software engineer
  • Proficiency in Kubernetes. Optional: CKA, CKAD
  • Extensive experience in Unix / Linux
  • Polyglot, with proficiency in multiple languages (ideally Golang, NodeJS, Python, HCL, and more)
  • Knowledge of multi-cloud environments, including GCP, AWS, and Azure (familiarity with at least two of these)
  • Experience using Git in trunk-based development models
  • Experience using feature flags in infrastructure and at runtime (k8s)
  • Experience with backend database technology, including support and performance enhancement, is a plus
  • Advanced experience working with and creating public cloud resources in Terraform or other infrastructure as code tools
  • Experience participating in a 24/7 on-call schedule without supervision and successfully resolving issues without escalation
  • Experience using OpenTelemetry for observability, as well as other monitoring tools such as Datadog, New Relic, and others
  • Good understanding of networking and routing principles
  • Experience containerizing applications with Docker and orchestrating them with Kubernetes
  • Familiarity with security configuration for web/API services (SSL, access control)
  • Experience with Jira or other work tracking systems
  • Ability to resolve tickets in priority order and to collaborate with the Technical Product Manager to adjust priorities
  • Excellent, detailed documentation skills using Confluence or similar tooling, covering support notes, runbooks, ADRs, etc.
  • Familiarity with creating an end-to-end CI/CD pipeline using various tools with artifact storage
  • Familiarity with macOS as a desktop environment and with predominantly CLI-based workflows
  • Experience applying a “product mindset” by understanding stakeholder needs, priorities, and business value
  • Experience with security compliance frameworks including FedRAMP, NIST, and SOC 2
  • Proven experience in SRE practices, including incident management and reliability engineering
  • Familiarity with monitoring tools like Prometheus, Grafana, or Honeycomb for observability
  • Experience with chaos engineering, load testing, or reliability testing frameworks

Benefits

  • Employees are expected to provide a high level of security to any personal or private information accessed as part of their work, whether at a DroneUp facility or remotely.
  • Participate in security training.
  • Remain sensitive to individual rights to personal privacy.
  • Comply with company policies.

Applicant Tracking System Keywords

Hard Skills & Tools
Kubernetes, GCP, GKE, Golang, NodeJS, Python, HCL, Terraform, Open Telemetry, Docker

Soft Skills
mentoring, communication, collaboration, continuous improvement, curiosity, incident management, problem-solving, technical direction, team culture, prioritization

Certifications
Bachelor's degree in Computer Science, CKA, CKAD