Sumble

Data Scientist / Machine Learning Engineer

full-time

Origin: 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

Cloud, Google Cloud Platform, Postgres, Python, PyTorch, React, TypeScript

About the role

  • About Us: Sumble currently focuses on acquiring, cleaning, and joining company-related data that integrates seamlessly with customers' data to enhance their go-to-market operations. Our long-term vision is to become the primary destination for accessing high-quality external data.
  • Our Team: We are a dedicated team of 9 engineers with experience at companies such as Google, Meta, Kaggle and Stack Overflow.
  • What you'll do:
      • Fine-tuning small language models
      • Improving the quality of existing data using scalable approaches. Examples include making sure URLs are associated with the right company, that we have the correct HQ address, and that we have mapped parent-subsidiary relationships, using techniques like LLM validation, SERP, and triangulating across sources
      • Adding new signals: this usually involves scrubbing, matching, and normalizing new signals and mapping them to our existing ontology
      • Pushing solutions into production environments, which may involve touching data pipelines and/or backend systems
  • More about Sumble:
      • Our Tech Stack: PyTorch, Hugging Face, Gemma models, LoRA, vLLM, SkyPilot, Marimo
      • Languages & Frameworks: Python, FastAPI, React, TypeScript
      • Cloud Platform: Google Cloud Platform (GCP)
      • Databases: PostgreSQL, DuckDB
      • Infrastructure: Cloud Run
      • Challenges We Tackle: transforming noisy datasets into high-quality data products; running expensive analytics computations efficiently; managing the complexity of a growing number of data sources, machine learning models, and large data operations
      • Join Us: If you're passionate about solving complex data challenges and excited by the opportunity to work with us on cutting-edge technologies, we'd love to hear from you.

Requirements

  • Located within US timezones
  • Committed to creating great products and experiences for our users