
Reddit filed a lawsuit against Perplexity and three data-scraping companies, accusing them of stealing the social media platform’s valuable data.

Reddit’s lawsuit, filed on Wednesday in Manhattan federal court, said Perplexity and the three other firms it sued — Oxylabs UAB, AWM Proxy, and SerpApi — illegally circumvented Reddit’s digital guardrails by scraping its content through Google’s search engine results.

“These Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead,” Reddit’s lawsuit alleges.

Reddit said it sent a cease-and-desist letter to Perplexity in May 2024 demanding it stop scraping Reddit data unless it made a deal with the social media company, as Google and OpenAI had done.

Perplexity said it “was not using Reddit content to train any AI models and that it would respect Reddit’s robots.txt,” according to the lawsuit.

But Perplexity’s citations to Reddit increased “forty-fold after Reddit told it to stop,” the lawsuit added.

“Rather than respect Reddit and its users’ rights, what Perplexity has done in response is simply come up with increasingly devious schemes to circumvent Reddit’s security systems and policies,” the lawsuit says.

According to the lawsuit, Perplexity appears to have used at least one of the data scrapers to ingest the platform’s data into its AI models.

“In other words, Perplexity’s business model is effectively to take Reddit’s content from Google search results, feed them into a third party’s LLM, and call it a new product,” the lawsuit says. “While that business model has somehow translated into a $20 billion valuation, it has not resulted in a willingness to pay for what others (including Google) have.”

Perplexity spokesperson Jesse Dwyer said the company “will always fight vigorously for users’ rights to freely and fairly access public knowledge.”

“Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest,” Dwyer said.

A Reddit spokesperson confirmed to Business Insider that the company has spent tens of millions of dollars on anti-scraping systems, which the lawsuit says these companies circumvented.

The lawsuit said Reddit caught Perplexity bypassing its guardrails by setting up a test post that acted as a digital “marked bill.”

The test post could only be viewed by Google’s search engine, the lawsuit said, so Perplexity and other AI companies should not have been able to use it for their models.

The contents of the post soon appeared in Perplexity, indicating that it or another data scraper it worked with had taken the content without permission.

“Within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post,” Reddit’s lawsuit says.

Reddit’s lawsuit quotes a social media post from Cloudflare’s CEO comparing Perplexity to “North Korean hackers” for appearing to try to hide its web-crawling activity.

“Some supposedly ‘reputable’ AI companies act more like North Korean hackers,” Matthew Prince wrote on X in August. “Time to name, shame, and hard block them.”

Representatives for SerpApi and Oxylabs did not immediately respond to Business Insider’s request for comment. AWM Proxy, identified in the lawsuit as a former Russian botnet, could not immediately be reached for comment.

In a statement to Business Insider, Reddit’s chief legal officer Ben Lee said Oxylabs UAB, AWM Proxy, and SerpApi were “textbook examples” of illegal scrapers.

“Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material,” he said. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Reddit launched in 2005 as an online discussion forum, but is now trying to add value through a new strategy: search traffic. The decision has put Reddit in competition with companies like Perplexity.

“Reddit is one of the few platforms positioned to become a true search destination. We offer something special: a breadth of conversations and knowledge you can’t find anywhere else,” the company said in its Q2 report in July. “Every week, hundreds of millions of people come to Reddit looking for advice, and we’re turning more of that intent into active users of Reddit’s native search.”

Online search traffic has become a profitable industry led by companies like Google, which announced an expanded partnership with Reddit in February 2024 to train its AI models on the platform’s content. On its end, Reddit gained access to Google’s Vertex AI, allowing the platform to add enhanced search and other features. One month later, Reddit went public with a $6.4 billion valuation.


