Salary
💰 $130,000 - $170,000 per year
Tech Stack
Grafana, NoSQL, Python, SQL
About the role
- Drive System Performance Improvement in the Field — The real world always presents subtle new challenges to intelligent systems. Your primary goal will be to translate field operations experience into better systems, improved learning, and enhanced robot skills to optimize top-line performance as robot deployments scale.
- Diagnose & Resolve Incidents — Analyze logs, telemetry, SQL/NoSQL data, and container metrics to pinpoint failures; reproduce issues in staging; craft workarounds or scripted hot-fixes; document findings in the ticketing system (Jira/Zendesk).
- Proactive Monitoring — Tune alerting rules, dashboards, and anomaly-detection models; perform trend analysis to track system performance towards KPI targets.
- Field Presence — Embed at the customer site during go-live, major upgrades, or chronic issue hunts. Understand what flies under the radar and how to surface it automatically through the observability stack.
- Knowledge & Automation — Write SOPs, runbooks, and self-service articles; contribute to CLI tools and diagnostic scripts in Python/Bash; mentor TAC on new failure modes.
- Continuous Improvement Projects — Partner with Product and Engineering on feature hardening, performance tuning, and reliability improvement.
- Own and execute NPI software commissioning at customer sites, including tooling setup, system testing, and calibration.
Requirements
- BS or MS in Computer Science, Robotics or a related discipline, or relevant experience.
- Proficient in Python or C++, with experience implementing a large project or product feature.
- Aptitude for Hands-On Work in Physical-AI Environments — must be willing to work across the full breadth of the robot stack, from sensors to software.
- Data-driven mindset, with the ability to instrument, query, and visualize telemetry to inform decisions.
- Familiarity with support tools and observability platforms (e.g., Grafana, Kibana, Foxglove, ArgoCD).
- Strong analytical and debugging skills; experience parsing logs and telemetry data.
- Clear Communication — translate technical findings into customer-friendly language and crisp bug reports.
- Self-starter mindset with a willingness to take ownership in ambiguous environments.
- Bias for Action — sense of urgency in production environments; willingness to “own the ticket” until closure.
- Bachelor’s degree in an engineering discipline (e.g., Mechanical, Electrical, Industrial, Robotics) or Computer Science — with demonstrable coding skills and a solid grounding in computer-science fundamentals. Equivalent practical experience is welcome.
- 2+ years in a SaaS, DevOps, Site Reliability, or Application Support role handling customer-facing production systems.