Beyond

Principal Engineer - Data & ML

contract

Origin: 🇺🇸 United States

Job Level

Lead

Tech Stack

Apache, Cloud, Distributed Systems, Docker, ETL, Go, Google Cloud Platform, GraphQL, JavaScript, Kubernetes, Node.js, NoSQL, Python, Redis, Scala, SQL

About the role

  • Beyond is a technology consultancy helping organizations thrive in a rapidly changing world.
  • We build, modernize, scale, and operationalize technology, creating Cloud and AI solutions to unlock productivity and drive customer growth.
  • Role Overview: We're looking for a Principal Engineer to shape the next generation of our data and ML capabilities, focusing on data quality, enrichment, and the intelligent linking of products and information.
  • As a Principal Engineer at Beyond, you’ll:
  • Lead the architecture and evolution of scalable, high-performance data pipelines and ML systems, focusing on data ingestion, transformation, quality checks, and enrichment.
  • Drive cross-functional initiatives to integrate modern Machine Learning and AI technologies (including semantic understanding, natural language processing, and potentially large language models) to automate data quality, link canonical products, and create intelligent data enrichment solutions.
  • Define strategies to enhance the performance, reliability, and observability of data and ML services, ensuring robust, high-quality data outputs.
  • Design and implement frameworks for evaluating data quality and the effectiveness of ML models through both offline metrics and online validation.
  • Champion engineering best practices and mentor engineers across teams, raising the bar for code quality, data governance, and ML system design.
  • Shape long-term technical direction by staying ahead of trends in AI, ML, data engineering, and distributed systems, and by bringing these innovations into production within the Knowledge domain.
  • Things that will make you stand out:
  • Nice to Have
  • Specialization in ML areas such as semantic understanding, NLP, data classification, or entity resolution.
  • Experience integrating LLMs, developing custom model architectures, or deploying ML solutions in production for data enrichment and automation.
  • Knowledge of end-to-end ML system design, including experimentation workflows and model lifecycle management (MLOps).
  • Certifications in Google Cloud ML or Data Engineering.

Requirements

  • Degree in Computer Science, Engineering, Machine Learning, or a related technical field.
  • 8+ years of experience designing and leading the development of large-scale distributed data and/or ML backend systems.
  • Hands-on experience with ETL pipeline design and optimization for complex data sets.
  • Deep familiarity with technologies such as Apache Beam, Pub/Sub, Redis, and other large-scale data processing frameworks.
  • Expertise in backend development with Python and Scala; knowledge of Node.js or Golang is a plus.
  • Proficiency with both SQL and NoSQL databases, and experience with data warehousing solutions.
  • Demonstrated experience building robust APIs (REST, GraphQL) and operating in modern cloud environments (GCP preferred), using Kubernetes, Docker, CI/CD, and observability tools.
  • Proven ability to lead and influence engineering direction across teams and functions, particularly in a data-centric and ML-driven environment.
  • Strong communication skills and the ability to align diverse technical stakeholders around a cohesive vision for data quality and knowledge extraction.