The Pedagogical Dangers of AI Detectors for the Teaching of Writing 

Jacob Hubbard | San Diego State University

Imagine you are a college student in 2023 writing an essay for a class. You’ve spent all night drafting, revising, polishing, and adding your final sources before submitting the project for a grade. You feel proud of your work, but when you get your paper back, you find your professor gave it a zero!

The reason: They claimed your hard work was AI generated! 

That sounds preposterous, but it is exactly what happened at Texas A&M University–Commerce in May 2023. Professor Jared Mumm sent his animal science class an angry email warning that students were in danger of failing the class because they had used ChatGPT to cheat. “I will not grade this ChatGPT S***,” Mumm wrote on one student’s assignment. Many students frantically rushed to prove their innocence so they could graduate. By the time it was over, several students were exonerated, others were given a make-up assignment, and only one student confessed to using ChatGPT. This raises some important questions: Did Mumm’s method of cheating detection cause more harm than good? If so, what does that say about this type of teaching practice as a way of addressing academic dishonesty? 

Much has been written about the rise of ChatGPT and other AI text-generation software. This technology is still relatively new, and many questions remain about its place in the writing classroom. Many instructors have opted to use AI detection software to deter academic dishonesty. As a writing instructor at San Diego State University for over six years, I have developed a general sense of what student writing looks like. I assign students multiple drafts of their papers to demonstrate their writing process, which gives me an understanding of their abilities. This helps me identify patterns in their writing, notice inconsistencies or discrepancies, and determine whether something is “off.”

Contrary to my own methods, Mumm used ChatGPT exclusively to detect plagiarism, even though ChatGPT is not designed to detect AI generated content. Consequently, students were unfairly accused of academic dishonesty. This alarming trend inspired me to look into how professors have used AI detection software. Because these programs neither accurately nor fairly detect AI generated writing, I argue there is a real danger in using them as an effort to mitigate academic dishonesty.

When I learned about Mumm and his students, I experimented with some of these AI detection programs to determine their effectiveness. As of this writing, AI detection programs such as GPTZero, ZeroGPT, Originality.ai, Sapling AI Detector, and many others are available. The majority of these programs are not directly associated with OpenAI (the creator of ChatGPT) but are run by companies hoping to provide publicly available AI detection software. For this experiment, I used ZeroGPT, as it was the most accessible without a paywall, and it claimed to have developed an algorithm “with an accurate rate of text detection up to 98%.”
To conduct my experiment with ZeroGPT, I used samples of a book I am currently writing. I plugged in passages of a recent chapter and got varying results. ZeroGPT flagged one passage as “likely human, may include parts generated by AI/GPT,” even though I wrote the passages myself with no help at all from ChatGPT (I have the change logs in my Google Docs to prove it). To further test ZeroGPT, I had it analyze more passages from early drafts of two chapters I wrote in August and September 2022. Again, ZeroGPT flagged parts of my writing as including AI generated content, with scores ranging from 15% to 53% “likely generated by AI/GPT.” This is despite the fact that ChatGPT did not launch to the public until November 2022.
This raised alarm bells for me, so I ran another test: I had ChatGPT write a fake 1000-word essay about whales and fed it into ZeroGPT. Here are ZeroGPT’s conclusions:

Figure: ZeroGPT results for the fake essay about whales written by ChatGPT. The highlighted text received a score of 38.5% “AI GPT,” with the verdict “Your Text is Likely Human Written, may include parts generated by AI/GPT.”

Despite ChatGPT being fed a prompt to write this essay, ZeroGPT said it was “likely” written by a human. I found this strange, so to further test ZeroGPT, I had it analyze a document from a time period where it would be impossible for content to be generated by artificial intelligence. For this, I fed ZeroGPT the Declaration of Independence. Here are the results:

Figure: ZeroGPT results for the Declaration of Independence. The highlighted text received a score of 30.76% “AI GPT,” with the verdict “Your Text is Likely Human Written, may include parts generated by AI/GPT.”

As you can see in the figure above, the results generated were not dissimilar to the fabricated whale essay, claiming it “may include parts generated by AI/GPT.” I then decided to test a document from the same time period, picking the “United States Constitution” to see what would happen. Here are the results:

Figure: ZeroGPT results for the United States Constitution. ZeroGPT said the text was AI/GPT generated, with a score of 92.15%.

As you can see in the results above, ZeroGPT claimed the United States Constitution was AI generated, giving it a 92.15% rating! I guess the Founding Fathers were using advanced technology at the time! On a more serious note, it is true these are small sample sizes, but these small sample sizes make this level of “detection” questionable and troubling as a valid classroom tool. In another article from the Washington Post, high school senior Lucy Goetz had her work erroneously flagged by Turnitin’s AI detector, despite no evidence of Goetz using ChatGPT to write her essay. According to the article, “Detectors are being introduced before they’ve been widely vetted, yet AI tech is moving so fast, any tool is likely already out of date.” In Reddit threads online (Example 1 & Example 2), students have raised serious concerns about the accuracy of these programs and how it affected their experience in school. 

One could argue that AI detection programs are not designed to be comprehensive, and that a program reporting low confidence that a text was written by human hands should not count as a critique of the model. While this is technically true, the samples discussed in this blog post give us a taste of a larger problem with AI detection. When we look at Mumm’s use of ChatGPT, his method caused more harm than good because it relied on a machine not designed to detect AI content to catch cheating. Similarly, many AI detection programs like ZeroGPT and GPTZero produce too many false positives and false negatives to be reliable methods of detecting AI generated content. This raises the following questions about the pedagogical dangers of using these AI detectors to evaluate student writing:

  1. How confident should we be in these programs to detect AI generated content?
    The general reliability of these programs is a serious mixed bag. Considering the samples discussed previously, we should question how confident we can be in these programs’ ability to detect AI writing, and which best practices are most appropriate for determining whether a student’s work is original or AI-generated. We should instead make use of more comprehensive strategies in different types of writing classrooms that focus on working with students in different ways (e.g., submitting multiple drafts, demonstrations of the writing process, etc.) to mitigate AI-generated issues. 
  2. How does the use of these AI Detection programs affect the way we evaluate student work?
    In addition to determining what metric we use to see whether students are submitting their own work, we should also reflect on how these programs affect our evaluation of student work. For instance, what bias(es) do we inadvertently introduce into our evaluation when we use these programs to determine what is AI generated writing and what is student writing? Students with limited access to technology may never use ChatGPT yet still have their work flagged for AI generated content. This is especially true for non-native English speakers, whose work is classified as AI generated at a higher rate than that of native speakers. 
  3. What Student/Teacher dynamics do the use of these AI Detection programs create?
    The presence of AI Detection programs should make us question the changing dynamic of the student/teacher relationship. Given that these programs are prone to false positives and false negatives in distinguishing AI from human writing, we should investigate how this affects the way students and teachers communicate with one another. For example, if a student believes their work will be unfairly scrutinized, their motivation and trust in the writing process can be harmed. The questionable reliability and accuracy of these programs (as of 2023) should force us to consider what relationships we cultivate and how we evaluate student writing through these programs, especially if and when they get it wrong—and when we get it wrong too. 

These questions should make us wary of the potential pedagogical dangers of AI detection programs for teaching writing. As this technology evolves, it is increasingly difficult to predict whether detection programs like ZeroGPT or GPTZero can keep up in this ongoing arms race with generators such as ChatGPT. We can discuss how to incorporate AI writing tools effectively, but the technology is not yet advanced enough to reliably distinguish AI generated text from human writing. 

Going back to Professor Mumm and his students, this news story demonstrates the pedagogical danger of relying on programs to detect academic dishonesty. Unlike Mumm, I do not put my students’ work into an AI detection program when I suspect plagiarism. Instead, I have students write multiple drafts, look for patterns in their writing, and work with the student directly if I suspect plagiarism. The rise in AI generated content makes it tempting to use these AI detection programs as part of our pedagogical toolset. However, professors still need to take the time to understand their students and their writing styles, and I fear relying on AI detection programs to detect plagiarism can actively make that task much more difficult than it needs to be. Why operate under the assumption that students are going to cheat? Why assume a student is going to use ChatGPT to write their essay and not put in the work? Wouldn’t it make more sense to work directly with students to mitigate issues of plagiarism, rather than relying on a machine to do it for us? On paper, AI detection programs seem like a useful tool for teachers, as they can take out some of the guesswork when we suspect cheating or plagiarism. However, the general unreliability of these programs in accurately and fairly detecting AI generated content casts doubt on their usefulness for teaching in the first place.

Works Cited 

Fowler, Geoffrey A. “ChatGPT Cheating Detection.” The Washington Post, 1 Apr. 2023,
www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/.

“GPT-4, ChatGPT & AI Detector by Zerogpt: Detect OpenAI Text.” ZeroGPT,
www.zerogpt.com/.

Liang, Weixin, et al. “GPT Detectors Are Biased Against Non-Native English Writers.” arXiv, 18 Apr. 2023, https://arxiv.org/pdf/2304.02819.pdf

U/infinitywee. “GPT Zero False Flagging Human Written Papers.” Reddit, 31 Jan. 2023, www.reddit.com/r/Professors/comments/10pya6w/gpt_zero_false_flagging_human_written_papers/.

U/Nervous_Detective150. “GPT Zero is Not Accurate at All.” Reddit, 17 Feb. 2023, www.reddit.com/r/ChatGPT/comments/1155shx/gpt_zero_is_not_accurate_at_all/.

Verma, Pranshu. “Texas Professor Threatened to Fail Class over ChatGPT Cheating.” The Washington Post, 18 May 2023, www.washingtonpost.com/technology/2023/05/18/texas-professor-threatened-fail-class-chatgpt-cheating/.
