

A Florida State University professor has found a way to tell if students used generative AI on multiple-choice exams.

Photo illustration by Justin Morrison/Inside Higher Ed | George Doyle, joebelanger and PhonlamaiPhoto/iStock/Getty Images
 

A Florida State University professor has found a way to detect whether generative artificial intelligence was used to cheat on multiple-choice exams, opening up a new avenue for faculty who have long been worried about the ramifications of the technology.

When generative AI first sprang into the public consciousness in November 2022, following the debut of OpenAI’s ChatGPT, academics immediately expressed concerns over the potential for students using the technology to produce term papers or conjure up admissions essays. But the potential for using generative AI to cheat on multiple-choice tests has largely been overlooked.

Kenneth Hanson became interested in the question after he published research comparing the outcomes of in-person and online exams. When a peer reviewer asked Hanson how ChatGPT might change those outcomes, Hanson teamed up with Ben Sorenson, a machine-learning engineer at FSU, to collect data in fall 2022. They published their results this summer.

“Most cheating is a by-product of a barrier to access, and the student feels helpless,” Hanson said. ChatGPT made answering multiple-choice tests “a faster process.” But that doesn’t mean it came up with the right answers.

After collecting student responses from five semesters’ worth of exams—totaling nearly 1,000 questions in all—Hanson and a team of researchers put the same questions into ChatGPT 3.5 to see how the answers compared. The researchers found patterns specific to ChatGPT, which answered nearly every “difficult” test question correctly and nearly every “easy” test question incorrectly. (Their method had a nearly 100 percent accuracy rate with virtually zero margin of error.)
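The article does not spell out how Hanson's model works. As a rough illustration of why the pattern described above is detectable, here is a minimal Python sketch in the spirit of person-fit checks from psychometrics. It is not the FSU team's method. It scores an answer sheet (1 = correct, 0 = wrong) by how strongly its correctness correlates with each question's easiness across the class. Typical students get easy questions right, so their score is positive; a sheet that misses easy questions while getting hard ones right comes out negative. The function names, the toy data and the -0.3 cutoff are all hypothetical.

```python
from statistics import mean

# Hypothetical illustration only; not Hanson and Sorenson's published model.

def item_easiness(class_sheets):
    """Share of the class answering each question correctly (1 = right, 0 = wrong)."""
    n_items = len(class_sheets[0])
    return [mean(sheet[i] for sheet in class_sheets) for i in range(n_items)]

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either list has no variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def fit_score(sheet, easiness):
    """Positive: the sheet follows the class pattern (easy items right).
    Negative: the sheet gets hard items right and easy items wrong."""
    return pearson(sheet, easiness)

def looks_inverted(sheet, easiness, cutoff=-0.3):
    """Flag sheets with an inverted pattern; the cutoff is an arbitrary assumption."""
    return fit_score(sheet, easiness) < cutoff

# Toy data: four students, five questions ordered roughly from easy to hard.
class_sheets = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
]
easiness = item_easiness(class_sheets)
suspicious = [0, 0, 1, 1, 1]  # misses the easy items, gets the hard ones

print(looks_inverted(class_sheets[0], easiness))  # False
print(looks_inverted(suspicious, easiness))       # True
```

A single correlation on five questions is far too thin to accuse anyone; a strong student who slips on a couple of easy items can also score low, which is why any real detector would need many more questions and a carefully validated threshold.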

“ChatGPT is not a right-answer generator; it’s an answer generator,” Hanson said. “The way students think of problems is not how ChatGPT does.”

AI also struggles to create multiple-choice practice tests. In a study published this past December and indexed by the National Library of Medicine, researchers used ChatGPT to create 60 multiple-choice questions, but only 19 of the 60, or roughly one-third, had correct questions and answers. The majority had incorrect answers and offered little to no explanation of why ChatGPT considered its choice correct.

To use ChatGPT to cheat on an in-person multiple-choice exam, a student would have to type the questions, and the possible answers, into ChatGPT on her phone. If the exam is taken online without proctoring software, she could instead copy and paste each question into ChatGPT in her browser.

Victor Lee, faculty lead of AI and education for the Stanford University Accelerator for Learning, believes that may be one step too many for students who want a simple solution when searching for answers.

“This does not occur, to me, to be a red-hot, urgent concern for professors,” said Lee, who also serves as an associate professor of education at Stanford. “People want to … put the least amount of steps into anything, when it comes down to it, and with multiple-choice tests, it’s ‘Well, one of these four answers is the right answer.’”

And despite the study’s low margin of error, Hanson does not think that sussing out ChatGPT use in multiple-choice exams is a feasible—or even wise—tactic for the average professor to deploy, noting that the answers have to be run through his program six times over.
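The article does not explain why the answers must be run through the program six times. One plausible reading, offered purely as an assumption: ChatGPT can give different answers to the same question from one run to the next, so repeated runs could be collapsed into a single consensus answer per question. The hypothetical Python sketch below does that with a majority vote, then counts how often a student picked the same wrong option as the consensus, a common signal in answer-copying analysis. The function names and toy data are illustrative, not taken from Hanson's work.

```python
from collections import Counter

# Hypothetical sketch: the six-run aggregation is an assumption,
# not Hanson's documented procedure.

def consensus_profile(runs):
    """runs: one answer list per chatbot run, e.g. ["A", "C", "B"].
    Returns (majority answer, share of runs agreeing) for each question."""
    profile = []
    for i in range(len(runs[0])):
        counts = Counter(run[i] for run in runs)
        answer, votes = counts.most_common(1)[0]
        profile.append((answer, votes / len(runs)))
    return profile

def shared_wrong_answers(sheet, profile, key):
    """Count questions where the student chose the same wrong option as the consensus."""
    return sum(
        1
        for student, (bot, _), correct in zip(sheet, profile, key)
        if bot != correct and student == bot
    )

# Toy data: six runs over three questions.
runs = [
    ["A", "C", "B"],
    ["A", "C", "D"],
    ["A", "B", "B"],
    ["A", "C", "B"],
    ["A", "C", "A"],
    ["A", "D", "B"],
]
key = ["A", "B", "B"]
profile = consensus_profile(runs)
print(profile)  # [('A', 1.0), ('C', 0.6666666666666666), ('B', 0.6666666666666666)]
print(shared_wrong_answers(["A", "C", "B"], profile, key))  # 1
print(shared_wrong_answers(["A", "B", "D"], profile, key))  # 0
```

Matching on shared wrong answers rather than shared right ones is the design choice here: two test takers agreeing on the correct option says little, while agreeing on the same incorrect option is far less likely by chance.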

“Is it worth the effort to do something like this? Probably not, on an individual basis,” he said, pointing toward research that suggests students aren’t necessarily cheating more with ChatGPT. “There’s a certain percentage that cheats, whether it’s online or in person. Some are going to cheat, and that’s the way it is. It’s probably a small fraction of students doing it, so it’s [looking at] how much effort do you want to put into catching a few people.”

Hanson said his method of running multiple-choice exams through his ChatGPT-detection model could be used at a larger scale, namely by testing companies such as Data Recognition Corporation and ACT. “If anyone’s going to implement it, they’re the most likely to do it where they want to see on a global level how prevalent it might be,” Hanson said, adding that it would be “relatively easy” for groups with large amounts of data.

ACT said in a statement to Inside Higher Ed that it is not adopting any type of generative AI detection, but it is “continuously evaluating, adapting, and improving our security methods so that all students have a fair and valid test experience.”

Turnitin, one of the largest players in the AI-detection space, does not currently have any product to track multiple-choice cheating, although the company told Inside Higher Ed it has software that provides “reliable digital exam experiences.”

Hanson said his next slate of research will focus on which questions ChatGPT gets wrong that students get right, which could help faculty design tests in the future.

But for now, concerns over AI cheating on essays remain top of mind for many. Lee said those worries have been “cooling a bit in temperature” as some universities enact AI-focused policies that could address them, while others work out how to adjust their “educational experience,” from tests to written assignments, so that it can coexist with the new technology.

“Those are the things to be ideally focused on, but I understand there’s a lot of inertia of ‘We’re used to having a term paper, essay for every student.’ Change is always going to require work, but I think this thought of ‘How do you stop this massive sea change?’ is not the right question to be asking.”
