Lead the architectural design and technical roadmap for scalable, high-performance data processing pipelines capable of handling petabyte-scale telemetry data (logs, metrics, traces)
Drive the development and optimization of ML-driven data routing, filtering, and transformation engines to reduce customer data volumes by 80%+ while preserving critical insights
Architect and implement real-time analytics and anomaly detection systems using advanced machine learning techniques and large language models
Design cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog, New Relic)
Establish robust monitoring, alerting, and observability solutions for distributed systems operating at enterprise scale
Lead cross-functional technical initiatives, collaborating with Product, Data Science, and DevOps teams to translate strategic vision into technical solutions
Drive optimization of system performance, cost efficiency, and reliability through advanced profiling, testing, and infrastructure design
Provide technical leadership and mentorship to senior and junior engineers, establishing engineering best practices and a strong engineering culture
Evaluate and introduce emerging technologies in AI/ML, data engineering, and observability to maintain competitive advantage
Participate in technical decision-making forums and contribute to company-wide engineering standards and practices
Requirements
10+ years of software engineering experience with a focus on distributed systems, data engineering, or ML infrastructure in high-growth SaaS environments
Expert-level proficiency in Go, Rust, or Java with a deep understanding of system design patterns, software architecture principles, and performance optimization
Extensive experience with cloud platforms (AWS, GCP, Azure) and container orchestration technologies (Kubernetes, Docker) at enterprise scale
Proven track record of building and scaling data pipelines using technologies such as Apache Kafka, Apache Spark, Apache Flink, or similar streaming frameworks
Deep expertise in database technologies, including both SQL (PostgreSQL, MySQL) and NoSQL (MongoDB, Cassandra, Redis) systems with experience in data modeling and optimization
Advanced experience with machine learning frameworks (TensorFlow, PyTorch, scikit-learn) and MLOps practices for production ML systems at scale
Expert knowledge of observability and monitoring tools and practices, with experience architecting solutions using Prometheus, Grafana, the ELK stack, and similar platforms
Comprehensive understanding of data formats, protocols, and standards used in enterprise observability (OpenTelemetry, StatsD, syslog, JSON, Parquet)
Extensive experience with Infrastructure as Code tools (Terraform, CloudFormation) and CI/CD pipelines for automated deployment and testing
Strong leadership and technical communication skills with experience driving technical decisions across multiple teams and stakeholders
Track record of mentoring engineers and establishing technical standards and best practices in complex engineering organizations
Experience with technical strategy and roadmap planning for large-scale distributed systems
Bachelor's degree in Computer Science, Engineering, or a related field; advanced degree preferred
Benefits
Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid company holidays
Paid sick time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events