On Feb. 15, Google DeepMind employee Susan Zhang shared on X a sponsored LinkedIn message she had received, which stated that the University of Michigan was licensing academic speech data and student papers for training and tuning large language models (LLMs). As Zhang’s post spread across social media, outrage over the monetization of student data quickly grew, prompting Michigan to issue an official statement.
According to the university, the post had been sent out by “a new third party vendor that has since been asked to halt their work.” Furthermore, the university argued that rather than “student data” being offered for sale, the data set consisted of anonymized student papers and recordings, voluntarily contributed roughly two decades or more ago, with signed consent, for improving “writing and articulation in education.” While the release of this statement helped calm the backlash, the case offers a crucial window into how the ethics of student data use are tied up with commercial interests in this latest period of AI fever. We shouldn’t be too quick to forget it.
Conversations about artificial intelligence in higher education have been all too consumed by concerns about academic integrity, on the one hand, and, on the other, how to use education as a vehicle for keeping pace with AI innovation. Instead, this moment can be leveraged to center concerns about the corporate takeover of higher education.
While AI is being framed as a contemporary scientific breakthrough, AI research goes back at least 70 years. However, increasing excitement about the commercial potential of machine learning has led tech companies to rebrand AI as “a multitool of efficiency and precision, suitable for nearly any purpose across countless domains.” As Meredith Whittaker points out, LLMs are among the most data- and computing-intensive techniques in AI. Precisely because LLMs and machine learning require vast computational infrastructure, corporate resources and practices are foundational to this type of AI development.
Despite significant issues of bias, unethical data-sourcing practices and environmental harms, LLMs and other corporate-backed AI tools are becoming default infrastructure for teaching and learning, even as data taken from students and faculty is used for AI development. OpenAI’s chatbot, ChatGPT, which is powered by an LLM, is increasingly being integrated into higher ed classrooms despite documented forms of neocolonial labor exploitation and its tendency to reproduce hegemonic worldviews (among a host of other ethical issues).
A range of private vendors are promising to automate exam proctoring, writing support, academic advising, the identification of “at-risk” students, the curation of online learning content, teaching assistant tasks and grading. Companies are selling emotion-detection technology that measures facial movements and purports to assess student attentiveness. Furthermore, this year, Arizona State University partnered with OpenAI to create AI tutors for students in one of its largest courses, a first-year composition class. In this short piece, I’d like to focus on three key issues related to AI as a vehicle for the consolidation of corporate power in higher education: transparency, privacy and exploitation.
Transparency
One major challenge concerning the development and use of AI in higher education is a lack of transparency. Even in the University of Michigan’s official statement, the name of the third-party vendor (Catalyst Research Alliance) was not included. It’s also unclear whether the students who consented to the Michigan studies agreed to or even imagined their data being packaged and sold decades later for LLM research and development.
Earlier this year, two major academic publishers, Wiley and Taylor & Francis, announced partnerships with major tech companies, including Microsoft, to provide academic content for training AI tools, including for automating various aspects of the research process. These agreements do not require author permission for scholarship to be used for training purposes, and many are skeptical of assurances regarding attribution and author compensation. Academic labor is being used to generate AI-related revenues for publishing companies that, as we’ve already seen, may not even disclose which tech companies they’re partnering with, nor publicize the deals on their websites. Cases like these have prompted the Authors Guild to recommend a clause in publishing distribution agreements that prohibits AI training use without the author’s “express permission.”
Increased transparency will potentially help students and faculty push back against the use of their labor for AI development that they find extractive or unethical. However, transparency does not necessarily guarantee accountability or democratic control over one’s data. While transparency is important, it is certainly not enough.
Privacy
Many people might also assume that the Family Educational Rights and Privacy Act protects student information from corporate misuse or exploitation, including for training AI. However, FERPA not only fails to address student privacy concerns related to AI, but in fact enables public-private data sharing. Universities have broad latitude in determining whether to share student data with private vendors. Additionally, whatever degree of transparency privacy policies may offer, students are rarely empowered to have control over, or change, the terms of these policies.
Educational institutions are permitted to share student data without consent with a “school official,” a term that a 2008 change to the FERPA regulations defined to include contractors, consultants, volunteers and others “to whom an educational agency or institution has outsourced institutional services or functions it would otherwise use employees to perform.” While these parties must have a “legitimate educational interest” in the education records, universities have discretion in defining what counts as a “legitimate educational interest,” and this flexibility could allow institutions to sell student information to raise funds. Under conditions of austerity, where public funding for education is increasingly curtailed and restricted, student data is especially vulnerable to a wide range of uses with little oversight or accountability.
More broadly, conversations about the ethics of information technology in the U.S. have generally been framed in terms of privacy at the expense of other issues relating to racial discrimination and economic exploitation. When ethical issues are framed only in terms of privacy, the questions typically revolve around ensuring that data is collected anonymously and stored securely, and that students can readily opt out. However, we can also ask, should a given tool be deployed at all?
Exploitation
The practice of sharing student data with little accountability or oversight not only raises privacy issues, but also permits student data to be exploited for the purposes of creating and improving private firms’ products and services. In this sense, private firms save money that they would otherwise have to invest in market research and product development by putting the student data they collect to work. Once collected, and especially once de-identified, student data typically becomes an indefinite asset of universities and private firms. There is also a sense of entitlement to student data, not only among university administrators and private technology firms, but in many cases among university researchers who are contributing to the development of AI using data from students.
It is notable that the technology industry has played a significant role in shaping how universities conceive of and assess the risks of partnering with private vendors. For instance, the Higher Education Community Vendor Assessment Toolkit, originally developed in 2016, is used by more than 100 universities to measure vendor risk by confirming that policies are in place to protect sensitive institutional data and personally identifiable information.
The Higher Education Information Security Council (HEISC) put forward this tool kit in collaboration with Internet2, a digital technology company with a target market that includes higher education, research institutes and government entities. HEISC is part of the nonprofit association Educause, whose stated mission is to encourage the diffusion and adoption of information technology for educational purposes. Educause’s roots as an organization can be traced back to the annual College and University Machine Records Conference of 1962, where 22 data processing directors from American universities and colleges organized, with sponsorship from IBM, to share information about how they were putting to use the IBM 1401, a computer for processing punch-card data. For IBM, sponsoring higher education meetings such as theirs was a means of cultivating new markets for its products while tying the IBM brand image to higher education. This practice continues today, with a range of big tech corporations sponsoring conferences and offering grant funds to higher education researchers and information technology professionals, including for AI development.
Conclusion
As I argue in Smart University: Student Surveillance in the Digital Age (Johns Hopkins University Press), at a time when university administrators are suggesting replacing striking graduate students with generative AI tools, school districts are using ChatGPT to decide which titles should be removed from library shelves and university researchers are taking photos of students without their knowledge to train facial recognition software, it is crucial that we be able to deliberate democratically about whether and how a range of digital tools are incorporated into the lives of those who live and work on college campuses. This includes the ethics of using data from students and faculty to improve the efficacy of AI in ways that drive power and profits to private companies at our expense.
In the meantime, students and faculty are using a host of strategies to fight back, including open letters, public records requests, critical education and refusals to work on research and development for harmful AI applications. More fundamentally, the struggle against corporate-backed AI is also a struggle against the privatization of universities more broadly, which has dramatically limited the power of self-governance in higher ed. Let’s not miss an opportunity to turn this latest wave of AI hype and hysteria, which will surely dissipate, into an occasion to shore up power over our learning and working conditions. This includes demanding more control over our labor, so often the source of the very “intelligence” that AI is used to extract.