  • Pulse raised $3.9 million to enhance unstructured data preparation for machine learning models.
  • The startup addresses the demand for custom copilots and agents using internal enterprise data.
  • Former GitHub CEO Nat Friedman and Daniel Gross led the seed funding round for Pulse.

Pulse, a five-person startup specializing in unstructured data preparation for machine learning models, has raised $3.9 million in a funding round led by Nat Friedman and Daniel Gross.

Pulse sells businesses a toolkit designed to convert raw, unstructured data into formats ready for use by machine learning models. This addresses the growing demand among enterprises to build custom copilots, chatbots, and digital agents tailored to their internal data.

“Let’s say you’re a financial institution or a healthcare company. There is no room for an LLM to make something up or hallucinate a number or an error,” said Sid Manchkanti, cofounder and CEO of Pulse.

Before Pulse, Manchkanti was a software developer at Nvidia. He started the company with his childhood friend, Ritvik Pandey, who previously worked on Dojo, Tesla’s supercomputer project for training machine learning models.

Other investors in the company’s seed round include Y Combinator, Sequoia Scout, Soma Capital, Liquid 2 Ventures (the venture capital firm founded by Joe Montana), and individuals from Nvidia, OpenAI, and fintech startup Ramp.

Training data is the raw material that enables large language models to learn the relationships between words and phrases and mimic human-like text. However, training these models isn’t just about feeding them massive amounts of information. It takes curating and preparing information in the right way. You don’t put diesel in a gas engine.

Structured data is organized and searchable data that fits neatly into rows and columns, like data in an Excel spreadsheet or customer records. Unstructured data looks more like the files you work with on a daily basis. Think pages-long customer contracts, employee handbooks, sales presentations, and product demo videos. According to the tech market intelligence firm IDC, 90% of the world’s data is unstructured.

The conversion of messy data into training data often involves human workers. They may read through documents and images, enter relevant information into formats such as spreadsheets or databases, and review and clean the data — correcting errors and labeling the data to provide context for machine learning applications.
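To make the manual step concrete, here is a minimal Python sketch of turning one piece of free text into a structured row. It is an illustration only, not Pulse’s method: the contract excerpt, the field names, and the regular expressions are all invented for this example, and real documents are far messier than a few patterns can handle.

```python
import csv
import io
import re

# Hypothetical contract excerpt, invented for illustration.
CONTRACT_TEXT = """
This Services Agreement is entered into on 2024-03-15 between
Acme Corp ("Customer") and Example Health LLC ("Provider").
The total contract value is $125,000.00, payable within 30 days.
"""

# Column names are assumptions chosen for this sketch.
FIELDS = ["effective_date", "customer", "amount_usd"]


def extract_record(text):
    """Pull a few fields out of free text; fields not found become None."""
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    customer = re.search(r'between\s+(.+?)\s+\("Customer"\)', text, re.S)
    amount = re.search(r"\$([\d,]+(?:\.\d{2})?)", text)
    return {
        "effective_date": date.group(1) if date else None,
        "customer": customer.group(1).strip() if customer else None,
        "amount_usd": float(amount.group(1).replace(",", "")) if amount else None,
    }


def to_csv(records):
    """Serialize extracted records as CSV text, the rows-and-columns form."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([extract_record(CONTRACT_TEXT)]))
```

Hand-written patterns like these break as soon as a document’s wording or layout changes, which is why this work has often fallen to human reviewers.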

To automate this process, Pulse’s solution uses computer vision techniques and fine-tuned extraction models to understand complex documents and accurately parse their data.

Manchkanti says Pulse’s technology not only streamlines the process — making it faster and more efficient for businesses to leverage their unstructured data in machine learning models — but also improves accuracy. He estimates that teams lose 20% to 30% of their data with existing solutions due to poor extraction.

Pulse’s round builds on a swell of money flowing into startups that offer tools to eliminate this unstructured data bottleneck. One such startup, Unstructured, has raised $65 million in funding to date and counts over a thousand paying customers. Another, Instabase, recently secured $100 million in funding to expand its toolkit for extracting and processing unstructured data.

Manchkanti said the new money put into Pulse would allow the company to hire engineers and add data extraction for other formats, namely audio and video.
