AirGarage

Software Engineer, IoT Reliability

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $160,000 - $190,000 per year

Job Level

Mid-Level, Senior

Tech Stack

AWS, Django, Docker, Google Cloud Platform, Grafana, IoT, Linux, Postgres, Prometheus, Python, RabbitMQ, SQL

About the role

  • We’re looking for a Software Engineer to own the reliability, health, and observability of our nationwide IoT device fleet. In this role, you will work with embedded systems, backend infrastructure, and site reliability engineering. You’ll design and build the tools, monitoring pipelines, and automation that keep hundreds of devices online and performing reliably across our locations.
  • You’ll architect and ship production code, build internal platforms for fleet monitoring and diagnostics, and apply strong debugging skills when issues arise. From log analysis and remote device tuning to backend systems that automate health checks and calibration, you’ll have end-to-end responsibility for improving the stability and scalability of our hardware fleet.
  • Our stack:
  • Devices: Embedded Linux (Debian, Yocto), Python, C++
  • Observability: Datadog, Hex, SQL
  • Data: Postgres, Snowflake
  • Infra: AWS, GCP, Docker, RabbitMQ, GitHub Actions
  • Backend: Python, Django, DRF
  • WHAT YOU WILL DO 🚀
  • Design and implement systems to monitor, diagnose, and improve IoT device health at scale.
  • Build internal tools and scripts for device setup, fleet observability, QA automation, and ongoing monitoring.
  • Contribute to backend services that support device integration, calibration, and reliability improvements.
  • Investigate and resolve fleet-wide issues by analyzing metrics, logs, and telemetry; minimize downtime through remote debugging and fixes.
  • Test and tune hardware products during or after installation (e.g., camera exposure, detection modes, connectivity parameters) to ensure optimal performance.
  • Conduct periodic fleet-wide health assessments to detect degradation, systemic issues, or underperforming devices, and recommend firmware or deployment improvements.
  • Serve as the primary internal contact for hardware health, providing regular reports to operations on per-site hardware performance, device uptime, and systemic issues affecting service quality.
  • Collaborate with operations and hardware teams to surface recurring pain points and propose architectural or process improvements that drive greater reliability and scalability.
  • Author and maintain troubleshooting guides, repair instructions, and internal playbooks that enable consistency and efficiency across deployments.
  • Travel occasionally (~20%, otherwise fully remote) for QA, deployments, and on-site debugging when remote fixes aren’t possible.
  • WHAT YOU NEED 🧠
  • 3+ years of professional software engineering experience.
  • Strong proficiency in Python and SQL, with experience shipping production-quality code.
  • C++ background is a plus.
  • Experience managing distributed Linux-based hardware appliances or IoT fleets.
  • Familiarity with observability and monitoring tools (e.g., Datadog, OpenTelemetry, Prometheus, Grafana) and building internal tooling for device health and alerting.
  • Track record of building internal tooling, monitoring, or reliability platforms.
  • Hands-on experience with Linux systems (dmesg, journalctl, ip, systemd, etc.) and debugging distributed hardware/software environments.
  • Background in cellular (4G LTE, CAT 4, CAT 1bis, 5G RedCap), WiFi, WiFi HaLow, or other wireless connectivity.
  • Excellent written and verbal communication skills; able to translate complex technical findings into clear reports and playbooks.
  • Self-starter who thrives in a fast-paced, ownership-driven environment.
  • Willingness to travel to locations for troubleshooting (roughly 20% travel, otherwise fully remote).
  • THE UPSIDE 📈
  • Equity: Have a stake in the business that you’re helping to build and grow.
  • Work remotely: Live and work wherever you like! We believe in folks working where they are happiest and most productive. We currently hire teammates located anywhere in North America.
  • Health insurance: We offer health insurance and currently cover 85% of the cost of medical plans for the primary employee and 50% of the cost of plans for dependents.
  • Home office setup: Get a laptop plus any additional equipment you need to set you up for success.
  • Time to recharge: We have an unlimited PTO policy with a minimum requirement of 10 days per year.
  • 401(k): Plan for retirement in the way that's right for you with our 401(k) savings program.
  • Team Off-sites: About twice a year, our team comes together for a full week in places like Tahoe, Puerto Vallarta, San Diego, and Austin.
  • BookGarage: Our team loves to learn and grow together so join us for our optional recurring book club.
  • Room to grow: Our team will be orders of magnitude larger within a few years; as part of our foundational team, you'll have opportunities to grow with us.
  • Transform our cities: The opportunity to change the way that the world thinks about real estate use in our cities.
  • Work with a diverse team: At AirGarage, we've always been committed to building a thriving team that represents the communities we serve. Our team is currently 40% female and 30%+ from underrepresented communities.
