OpenAI and Microsoft accused of stealing data to train ChatGPT in new class-action suit

The lawsuit alleges that OpenAI’s profits came as a result of using illegally scraped data to train its models.

OpenAI and Microsoft have been named as defendants in yet another class-action lawsuit over their alleged use of web scraping techniques to obtain supposedly private data for training ChatGPT and other associated artificial intelligence (AI) models.

The most recent class-action suit was filed on Sept. 5 in San Francisco by a law firm representing a pair of unnamed engineers.

According to a filing registered with the United States District Court for the Northern District of California:

“This class action lawsuit arises from Defendants’ unlawful and harmful conduct in developing, marketing, and operating their AI products, including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E (the ‘Products’), which use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge.”

The lawsuit goes on to complain that OpenAI “doubled down on a strategy to secretly harvest massive amounts of personal data from the internet” after restructuring in 2019.

“Without this unprecedented theft of private and copyrighted information belonging to real people,” write the plaintiffs, “the products,” referring to ChatGPT, DALL-E and OpenAI’s other models, “would not be the multi-billion-dollar business they are today.”

According to the filing, the plaintiffs are asking the court to award damages to themselves and to any members of the proposed classes, which could conceivably include anyone whose information was allegedly scraped.

The suit also asks the court to order the defendants to conduct “nonrestitutionary disgorgement” of profits made as a result of the allegedly illegal scraping of data.

Scraping is the practice of using an automated bot, often called a “crawler,” to collect data from the internet. This most recent suit alleges that OpenAI and Microsoft knowingly engaged in “illegal” scraping activity.

A previous class-action lawsuit making nearly identical claims against OpenAI and Microsoft was filed in the same district court on June 28. It’s unclear at this time whether the court or the defendants in the separate cases would consider combining the suits.

This isn’t the first time Microsoft has been involved in a legal dispute over alleged scraping. The Redmond, Washington-based company sent a cease-and-desist letter on behalf of its LinkedIn brand to data analytics company HiQ in 2017 over its admitted data scraping practices.

In that case, Microsoft and LinkedIn alleged that HiQ had violated the terms of service agreement that users must accept to log in to the LinkedIn website and access user data. Initially, the circuit court ruled in favor of HiQ, but after Microsoft appealed, the Supreme Court vacated the judgment.

The case was then sent back down to the circuit court, where Microsoft found itself on the winning side. HiQ agreed to a settlement with Microsoft for an undisclosed amount and was ordered to cease its scraping activities.

Microsoft and OpenAI did not immediately respond to requests for comment.
