Tech Stack
Cloud, Google Cloud Platform, Postgres, Python, PyTorch, React, TypeScript
About the role
- About Us: Sumble's current focus is on acquiring, cleaning, and joining company-related data that integrates seamlessly with customers' data, enhancing go-to-market operations. Our long-term vision is to become the primary destination for accessing high-quality external data.
- Our Team: We are a dedicated team of 9 engineers with experience at companies such as Google, Meta, Kaggle and Stack Overflow.
- What you'll do: Fine-tuning small language models; Improving the quality of existing data using scalable approaches, for example making sure URLs are associated with the right company, that we have the correct HQ address, and that parent-subsidiary relationships are mapped correctly, using techniques like LLM validation, SERP, and triangulating across sources; Adding new signals, which usually involves scrubbing and normalizing the new data and matching it to our existing ontology; Pushing solutions into production environments, which may involve touching data pipelines and/or backend systems.
- More about Sumble: Our Tech Stack: PyTorch, Hugging Face, Gemma models, LoRA, vLLM, SkyPilot, Marimo; Languages & Frameworks: Python, FastAPI, React, TypeScript; Cloud Platform: Google Cloud Platform (GCP); Databases: PostgreSQL, DuckDB; Infrastructure: Cloud Run; Challenges We Tackle: Transforming noisy datasets into high-quality data products; Running expensive analytics computations efficiently; Managing the complexity of a growing number of data sources, machine learning models, and large data operations; Join Us: If you're passionate about solving complex data challenges and excited by the opportunity to work with us on cutting-edge technologies, we'd love to hear from you.
Requirements
- Located within US timezones
- Committed to creating great products and experiences for our users