Massive Books3 collection for training AI was taken down over copyright issues

2025-04-27 05:38:59

AI as we know it basically exists to eat up the internet and spit it back out at you. The problem is that huge parts of the internet are protected by copyright law.

That's one of the major takeaways from the gigantic Books3 database getting taken down following a DMCA request by the Danish anti-piracy group Rights Alliance, as originally reported by TorrentFreak. Books3 contained a little more than 196,000 books in plain-text format for AI models to chew on for training purposes, but aside from a few alternate links floating around the internet, it's no longer publicly accessible. The old link to it goes to a 404 page.

Books3 existed as part of a larger collection of AI training content called The Pile, organized by the research group EleutherAI. As noted by a Gizmodo report on the subject, Meta has previously referenced using The Pile to train its in-house AI model. Meta's would not be the first big tech AI model potentially trained on illegally disseminated material: a class-action lawsuit filed in July accused Google of doing the same thing.

This stuff gets tricky fast in the legal sense, but also in the ethical sense. For instance, a person who might be in favor of piracy in general for historical archival purposes could also vehemently oppose AI models being trained on copyrighted material (I feel like I know several people who think this way). It's also easy to understand why authors would oppose their work being used this way, as the makers of these AI models could theoretically profit off of other people's work in the future.

The only thing that's certain is that these battles are going to get messier from here.