AI and Intellectual Property Theft
In case you’ve missed the headlines, it turns out that Meta’s generative AI program was trained on Books3, a vast dataset of books, many of them pirated. It seems probable that other generative AI chatbots, such as OpenAI’s ChatGPT, were as well. (NOTE: Link goes to an article in The Atlantic and may not be visible if you have used up your one free article for the month.)
The authors whose books were used range from fairly obscure to very well known indeed, and include writers such as Margaret Atwood, Agatha Christie, Alice Hoffman, Stephen King, Lisa Kleypas, Nora Roberts, Brandon Sanderson, and J.R.R. Tolkien. Some are dead, but many more are still writing. The dataset includes both traditionally published and self-published works.
A searchable database of affected authors covers at least 183,000 books, so if you read fiction at all, it's almost guaranteed that the works of one or more of your favorite authors were used without their permission. I looked up more than 25 of my favorite authors (in addition to the authors listed above, some of whom I have not read). Only two did not show up in the dataset, and one of those is both self-published and relatively obscure. And we're not talking paltry numbers of books; for most of the bestselling and midlist authors I checked, the dataset contains practically their entire oeuvre.
This is one reason Hollywood writers are so worried about AI, both because their work may be used without permission to train AIs and because they may eventually be replaced by those same AIs. I'm very glad the Hollywood writers won some protections in their recent agreement with the studios. But much more needs to be done to protect all creatives from having their work stolen or coerced from them. Ultimately, the actions of the tech developers need to be treated as what they are: widespread and systematic copyright violation. In other words, theft.
Several class-action lawsuits are now gearing up, including one brought by the Authors Guild and several prominent authors. Other organizations may follow suit. (I am surprised that none of the major publishers has yet done so.) I hope these lawsuits successfully defend authors' copyright and show tech corporations that intellectual property is not a free resource, there for the pillaging. But what we really need is legislation to protect all intellectual property from this sort of predation. Writers aren't the only victims of the tech developers' actions; artists, musicians, and even voice actors have also had their work used without permission.
By the way, do you blog, or post your art anywhere on the web? If so, it's possible that your work has been slurped up by AI programs too, or will be in the future. Even if the writers' lawsuits succeed, it's unlikely you can stop your work from being used to train AI systems. For example, if you post your reviews on Goodreads, Amazon, Facebook, or Instagram, you have already given Amazon (which owns Goodreads) and/or Meta (which owns Facebook and Instagram) permission to use your reviews as they see fit.
Finally, I don’t want to paint generative AI as the devil incarnate. Generative AI has the potential to be a useful tool for writers and artists, as well as for the general public…if we get it right. But we must make sure that human creatives of all types are protected from being harmed (further) by the development, training, and deployment of generative AI, whether through the theft of their work or by their replacement altogether.
Some related links:
- What I Found in a Database Meta Uses to Train Generative AI (The Atlantic; paywall)
- These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech (The Atlantic; paywall)
- Searchable dataset of authors (link provided by the Authors Guild)
- The Authors Guild, John Grisham, Jodi Picoult, David Baldacci, George R.R. Martin, and 13 Other Authors File Class-Action Suit Against OpenAI (Authors Guild)
- You Just Found Out Your Book Was Used to Train AI. Now What? (Authors Guild)
- Practical Tips for Authors to Protect Their Works from AI Use (Authors Guild) (includes tips for blocking OpenAI’s crawler from your website)
- Zadie Smith, Stephen King and Rachel Cusk’s pirated works used to train AI (The Guardian)
- ‘Biggest act of copyright theft in history’: thousands of Australian books allegedly used to train AI model (The Guardian)
- Authors Are Furious After Finding Their Works on Huge List of Books Used To Train AI (The Mary Sue)
- Authors Band Together To Address AI Concerns in Publishing Contracts (The Mary Sue; posted before the news about the Books3 dataset came out.)
Nicole @ BookWyrm Knits
This is such a mess. I love what generative AI is capable of. And I hate the way people are trying to get around the legal aspect of ownership to train those AIs. There's no easy answer, but I hate that humanity's current approach seems to be to try to steal and get away with it.
Lark_Bookwyrm
Yes, to all of this. Though I am concerned that generative AI could take my job someday. Hopefully not before I’m ready to retire, though.