Anthropic Would Pay $1.5 Billion to Authors in Copyright Settlement

Anthropic agreed to pay over $1.5 billion to settle a class-action lawsuit brought by authors who allege the company illegally used their books to train its artificial intelligence models, according to a Friday court filing.

The proposed class-action settlement, if approved by a federal judge, would be “the largest publicly reported copyright recovery in history,” according to lawyers from Anthropic and the authors.

It also sets a new standard for AI companies compensating artists, authors, and other copyright holders for training large language models on their work, according to Justin Nelson, an attorney at Susman Godfrey representing the authors.

“This landmark settlement far surpasses any other known copyright recovery,” Nelson told Business Insider. “It is the first of its kind in the AI era. It will provide meaningful compensation for each class work and sets a precedent requiring AI companies to pay copyright owners.”

Under the terms of the proposed settlement, authors would receive an estimated $3,000 for each pirated book used to train Anthropic’s large language models, which power its popular chatbot, Claude.

The money would go to authors whose books were included in Pirate Library Mirror and Library Genesis, two databases of pirated books Anthropic used for training.

Lawyers for Anthropic and the authors believe 500,000 books are in the databases, but said Anthropic would shell out more cash if more books were found.

According to the settlement terms, Anthropic could still be held liable if it uses pirated books in the future. The AI company is also required to destroy its copies of the Pirate Library Mirror and Library Genesis datasets after it finishes identifying and paying authors.

The settlement agreement will only go into effect if it is approved by US District Judge William Alsup, who is overseeing the case. Alsup scheduled a hearing for Monday afternoon in his San Francisco courtroom to discuss whether the terms are fair.

The settlement does not change Alsup’s earlier ruling that Anthropic’s use of the books to train its models was “fair use” under copyright law.

“In June, the District Court issued a landmark ruling on AI development and copyright law, finding that Anthropic’s approach to training AI models constitutes fair use,” Aparna Sridhar, Anthropic’s deputy general counsel, said in a statement Friday. “Today’s settlement, if approved, will resolve the plaintiffs’ remaining legacy claims.”

Alsup’s summary judgment ruling found that Anthropic had used over 7 million pirated books to train its large language models, and physically cut up and scanned millions of others.

But according to Friday’s settlement terms, attorneys for the authors found duplicates in Anthropic’s training database and, after matching books with Copyright Office registrations, whittled the list of stolen books down to 465,000.

If it’s approved, the settlement would mark the end of one of the highest-profile and most sophisticated copyright lawsuits against artificial intelligence companies.

It would also mean the case will not make its way to the Supreme Court, which could have taken the opportunity to resolve lingering questions about copyright law related to model training and generative AI.

According to Friday’s proposed settlement terms, lawyers representing the authors are asking Alsup to approve 25% of the more than $1.5 billion settlement fund for their legal fees.
