Many times I have intoned that we live in an age of information. Information is the principal commodity of the global economy. It is the foundation of Meta/Facebook's and Google's success to be sure, and a little less so, though not insignificantly, for Microsoft and Amazon. Apple is largely exempt from that list insofar as it creates physical products and does not rely on information for profits. This fact may also explain why, if the performance of Siri is any indication, Apple lags in its development of artificial intelligence (A.I.).
In a course on artificial intelligence, the significance of data should be obvious: it is one of the field's key building blocks. Advanced algorithms allow for machine learning on neural networks supported by tremendous computing power. Data is the third piece of the puzzle. In the Crawford book, we learned something about the quality of the data used in the development of A.I., most notably data derived from government archives. That data tended to include images and information about the people with the least power in society, for example, criminals or those dependent on government assistance. Classifications, moreover, fell back on racist tropes. “A.I. systems are shown to produce discriminatory results,” Crawford writes on page 125, because those systems have been fed discriminatory data.
Crawford wrote her book before OpenAI's launch of ChatGPT, before generative, or what I call consumer, A.I. That was before we learned that these more sophisticated and responsive systems had been trained on the internet itself. A whole host of questions has newly emerged that she does not address. Who gave OpenAI permission to scrape the internet? Is permission necessary if one has the technical means to do it? Is information free so long as it is freely available on the internet? Numerous artists quickly brought suits against OpenAI for copyright violations. More recently, The New York Times has brought a suit, which is sure to be the one for the history books. But where is the line between content in general and intellectual property? Who owns data? What legal rules protect information?
Information privacy fits into this liminal space. In the U.S., however, it is a very limited set of protections. Information privacy does not have constitutional protection, as decided in Whalen v. Roe (1977). It is not a right. Where information privacy exists, it does so as a matter of statutory law. Moreover, information privacy laws are sectoral, siloed. Politics, not principle, explains the provenance of those protections.
For example, HIPAA (the Health Insurance Portability and Accountability Act) emerged as a regulation aimed at insurance companies. Prior to this law, health insurance companies freely obtained information about individuals and their preexisting conditions, and they made decisions based on that information that adversely affected consumers. FERPA (the Family Educational Rights and Privacy Act) emerged from the Church Committee investigations into law enforcement abuses in higher education during the civil rights and anti-Vietnam War student movements. The Financial Services Modernization Act arose in the wake of bank failures and the many breaches that affected consumers. Infamously, the Video Privacy Protection Act grew out of the hearings for Supreme Court nominee Robert Bork. Opponents of his nomination revealed video rental records that included pornography. A jittery Congress, panicked that its own viewing habits might be put on public display, quickly passed a law protecting such records.
Fair information practices are the standard approach to the management of information privacy. These practices consist of four main points: notice, by which the entity informs the consumer that it holds their information; relevancy, meaning the use of that information is limited to the entity's business purposes; accountability, such that mistakes can be corrected; and security provisions that protect the information. These practices do not automatically have the force of law. Rather, they require an accompanying law to be effective.
One example is FACTA, the Fair and Accurate Credit Transactions Act of 2003. This law allows consumers to correct their credit reports in cases of mistakes or misattributed identity that would have a material impact on a person, for example, when attempting to obtain a car loan or a home mortgage. Prior to that law, people might complain about being associated with the deadbeat John Smith when they were the upstanding and flush John Smith, but they had no legal means of correcting their credit reports. Security practices are an obvious and necessary part of information privacy, but they are laughable in the face of umpteen data breaches. With no private right of action available to individuals affected by these breaches, the harm often realized in the form of identity theft, consumers simply pay the price of inadequate security controls on their personally identifiable information. (N.B.: data breach notification laws are state, not federal, and vary widely.)
What do these lessons in information privacy tell us about A.I.? For one thing, information on the internet is pretty much there for the taking. You might then ask, what about the information I provide Google when I do a search? “Digital exhaust” is what one famous scholar, Shoshana Zuboff, calls it. It is what you gave away when you agreed to Google's Terms of Service. Never mind that this “exhaust” is the foundation of Google's extraordinary monetary success.
It goes further. In the aughts, as the director of IT Policy at Cornell, I sat in meetings with Google representatives who asked to scan the university's library holdings. Never mind the accumulated expense Cornell had borne in acquiring its eight million volumes. Participating in the transformational effort to “organize the world's information and make it universally accessible and useful” proved compelling enough that the university agreed (along with taking “free” mail services, but that is another story, told elsewhere in this document). Never mind that Google monetized that repository. And now OpenAI is asking us to never mind that it scraped the internet.
Briefly, about intellectual property, I have a prognostication. The newest wrinkle in fair-use analysis, transformative use, will successfully shield OpenAI against infringement claims for having scraped the internet. Scraping, after all, is not new; Google does it to power search. If media companies want a piece of that action, they can negotiate, and have negotiated, with Google for some kickback (so to speak). I suspect that is what The New York Times's claim is all about. On that point, I expect a negotiated settlement will occur long before a jury is empaneled.
Output is another matter. On those claims, OpenAI faces a much different kind of challenge. Content owners have been resoundingly successful in copyright claims against companies and individuals who distribute their content on the internet. To the degree that ChatGPT’s output can be found to be “substantially similar” to copyrighted materials, OpenAI as the distributor is likely to be liable for infringement.
We will have to wait and see the outcome of that contest. In the meantime, we can debate: innovation at what cost, and at whose expense?