Sumble

Sumble is applying language models (both small and large) to build high-quality datasets. In the near term, our focus is company-related data that we integrate with customers' Salesforce data to make go-to-market operations more efficient. In the long term, we want to become the first place you go to find high-quality data that lives outside your organization. We are currently a team of 7 engineers, including Kaggle/Google/Primer/Stack Overflow/Meta alumni. Our tech stack includes Python, FastAPI, React/TypeScript, GCP (Postgres + AlloyDB + Cloud Run), regular expressions, and PyTorch/Gemma/Mistral. Challenges we face include: transforming noisy datasets and noisy models into high-quality data products; making expensive analytics computations run efficiently; and managing the complexity of an increasing number of data sources, models, and the operations built on top of them.