19 April 2024 996 words, 4 min. read

Generative AI detectors: are they reliable? [Full test 2024]

By Pierre-Nicolas Schwab PhD in marketing, director of IntoTheMinds
In this article, I present the results of a test I conducted on 11 generative AI detectors. A clear winner emerges among the free tools. The results are mixed, if not downright bad, for half of the AI detectors tested.

Are generative AI detectors reliable? Since content produced by generative AIs (ChatGPT in particular) invaded the Internet, detecting it has become a priority. In its latest search engine update, Google announced that it would penalize low-quality content. So, I tested 11 free generative AI detection tools to find the most reliable. As you’ll see in this article, the results are far from homogeneous and often very disappointing.


Content written by generative AIs has become the bane of the Internet, prompting some sites to advertise their content as 100% human. Google has been caught in its own game: it called for “fresh” content, and that is exactly what it got when ChatGPT became available to the general public. Some people jumped at the chance to produce content that was original in name only. As I showed in another study, the similarity rate of texts produced by ChatGPT is remarkably high. You’ve probably already had your suspicions about the origin of a text, a social media post, or a comment as you read it. When a generative AI wrote it, it shows.

Faced with the scourge of content generated by generative AIs, tools have sprung up to detect them. I have selected 11 of them and tested them.

Methodology

To test the ability of the various tools to recognize texts written by a generative AI, I prepared a corpus made up of:

  • 3 texts written entirely by ChatGPT 4.0 in English
  • 3 French translations of texts written by ChatGPT
  • 3 texts from my blog written entirely by me in French
  • 3 English translations of texts written by me

In the end, I had 12 texts distributed as follows:

                             French   English
Written by a generative AI      3        3
Written by a human              3        3

I then ran each text through the tools listed at the end of this article.

I only used the free versions of the various tools. In the case of Scribbr and Copyleaks, the French texts could not be analyzed.

The results are summarized in the following tables. Texts 1 to 6 were generated by ChatGPT 4.0 (the French texts are translations of ChatGPT output). Texts 7 to 12 were written by me in French and then translated into English. The language of each text is given in brackets after its number.

The results in the tables below correspond to the percentage of text detected as having been written by a generative AI.

Detection of texts written by a generative AI

The table below shows the results of the various tools used to detect texts written entirely by ChatGPT. The percentage indicated corresponds to the proportion of text that the tool attributes to a generative AI. In the case of Neuralwriter, the percentage corresponds to the tool’s confidence in attributing content to a generative AI.

For Copyleaks and Scribbr, the French language cannot be analyzed in the free version.

                        1 (FR)    2 (EN)    3 (FR)    4 (EN)    5 (FR)   6 (EN)
Quillbot                  95%       95%       86%      100%      100%     100%
Copyleaks                 n/a      100%       n/a      100%       n/a     100%
Smodin                  74.60%    50.80%    76.10%    58.40%    79.30%   67.60%
detecting-ai.com           0%      100%        0%     99.77%       0%     100%
freeaitextclassifier       0%        0%        0%        0%        0%       —
contentatscale          human   undecided   human   undecided   human     AI
corrector.app           21.09%     53%      98.60%     100%     73.73%   88.43%
plagiarismdetector.net     0%        0%        0%        0%        0%       0%
plag.fr                   46%       10%       81%       24%       61%       9%
scribbr.fr                n/a      100%       n/a      100%       n/a     100%
neuralwriter.com          10%       70%       10%        5%       30%      10%

Scribbr and Copyleaks fare best on English, detecting ChatGPT-generated English content 3 times out of 3. Detecting-ai.com does almost as well on English (99.77% or above). However, none of the three handles French: Scribbr and Copyleaks cannot analyze it in their free versions, and detecting-ai.com scored 0% on every French text.

If you’re looking for an AI detector that works in both languages, Quillbot offers the best compromise for this test.

At the other end of the spectrum, plagiarismdetector.net and freeaitextclassifier don’t detect anything and should be avoided, whatever the language.

Other AI detectors fare only moderately well and make errors of varying importance.
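To compare the tools at a glance, the numeric rows of the table above can be condensed into a per-tool average detection rate. Here is a minimal sketch in Python, using the values transcribed from the article’s table; “n/a” cells and non-numeric verdicts (such as contentatscale’s labels) are simply excluded from the average:

```python
# Average AI-detection rate per tool, computed from the percentages
# transcribed from the first results table (AI-generated texts 1-6).
# None stands for "n/a" (language not supported in the free version).

AI_TEXT_SCORES = {
    "Quillbot":               [95, 95, 86, 100, 100, 100],
    "Copyleaks":              [None, 100, None, 100, None, 100],
    "Smodin":                 [74.6, 50.8, 76.1, 58.4, 79.3, 67.6],
    "detecting-ai.com":       [0, 100, 0, 99.77, 0, 100],
    "plagiarismdetector.net": [0, 0, 0, 0, 0, 0],
    "plag.fr":                [46, 10, 81, 24, 61, 9],
    "scribbr.fr":             [None, 100, None, 100, None, 100],
    "neuralwriter.com":       [10, 70, 10, 5, 30, 10],
}

def average_detection(scores):
    """Mean of the available (non-None) percentages."""
    values = [s for s in scores if s is not None]
    return sum(values) / len(values)

for tool, scores in AI_TEXT_SCORES.items():
    print(f"{tool:24s} {average_detection(scores):6.2f}%")
```

Running this reproduces, for example, the 96% average for Quillbot cited in the conclusion; note, however, that the averages for Copyleaks and Scribbr cover only their three English texts.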

Detecting texts written by humans

The second part of the test checks whether the tools correctly recognize texts written by a human (me, in this case). Since no generative AI was involved in writing these texts, the expected value in every column of the table below is 0%.

Freeaitextclassifier returned errors and could not be tested.

Here are the results.

                        7 (FR)   8 (EN)   9 (FR)  10 (EN)  11 (FR)  12 (EN)
Quillbot                   0%       0%       0%       0%       0%       0%
Copyleaks                 n/a      50%      n/a       0%      n/a       0%
Smodin                   9.70%      9%     1.90%   12.80%     27%       0%
detecting-ai.com           0%       0%       0%       0%       0%       0%
freeaitextclassifier    error    error    error    error    error    error
contentatscale          human    human    human    human    human    human
corrector.app              0%       0%    20.42%   24.27%    9.58%    1.92%
plagiarismdetector.net     0%       0%       0%      11%       6%       5%
plag.fr                    3%      13%      6%      11%       6%       5%
scribbr.fr                n/a      24%     n/a       0%      n/a      16%
neuralwriter.com          10%      30%      5%      20%      30%      10%

 

Several generative AI detection tools perform flawlessly here: Quillbot, detecting-ai.com, contentatscale, and plagiarismdetector.net. Copyleaks, which only processes English in its free version, gets it wrong once by attributing half of text 8 to a generative AI.


Generally, generative AI detection tools make fewer mistakes on human-written content than on AI-generated content.


Final results

To determine the winner(s) of this test, both parts must of course be taken into account. It is not enough to detect text written by an AI; a tool must also avoid attributing human-written text to an AI. And since the test covers two languages (French and English), the free version of the tool must be able to handle both.
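The article does not define a formal scoring rule for combining the two parts, but one simple, hypothetical way to do it is to count classification errors at a 50% threshold: AI texts scored below the threshold are misses, and human texts scored at or above it are false alarms. The threshold and the rule are illustrative assumptions, not the article’s methodology:

```python
# Hypothetical combined scoring: count errors across both tests at a
# 50% threshold. Threshold and rule are illustrative assumptions.

THRESHOLD = 50  # percent: >= 50 means "classified as AI-written"

def count_errors(ai_scores, human_scores, threshold=THRESHOLD):
    """Errors = AI texts missed + human texts wrongly flagged as AI."""
    missed = sum(1 for s in ai_scores if s < threshold)       # false negatives
    flagged = sum(1 for s in human_scores if s >= threshold)  # false positives
    return missed + flagged

# Quillbot's scores for texts 1-6 (AI) and 7-12 (human), from the tables.
quillbot_errors = count_errors(
    ai_scores=[95, 95, 86, 100, 100, 100],
    human_scores=[0, 0, 0, 0, 0, 0],
)
print(quillbot_errors)  # 0 errors across the 12 texts
```

Under this rule, Quillbot makes zero errors across the twelve texts, which matches the verdict below; a tool like plag.fr, by contrast, misses four of the six AI texts.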

Quillbot is the clear winner of this test. This AI detector performs flawlessly on content written by a human and nearly flawlessly on content written by ChatGPT (an average of 96% detection across the 6 texts provided).

Copyleaks also deserves a mention: it made just one error and correctly classified all the other English texts. As a reminder, only the paid version of Copyleaks handles French.

All the other tools make mistakes to a greater or lesser extent, but what I’ve noticed is that, in general, generative AI detection tools get it wrong less often when it comes to content written by a human. This is quite paradoxical.

Two tools should be avoided: freeaitextclassifier (which also frequently returns errors) and plagiarismdetector.net. Neither detected a single AI-generated text.

List of generative AI detection tools tested


