AI has been the talk of the town for the past few months. Thanks to generative AI like ChatGPT and Bard, the digital community is going through a revolution. The creative capacity of humans, long cherished as something exclusively ours, is now being imitated by LLM-based AI tools.

Today, AI is creating poems, writing blogs, making pictures and videos, and producing full-length code for entire websites. AI has truly had a disruptive effect on the market, and it has polarised the world. While one section of the population relies heavily on ChatGPT for comprehensive information, others are sceptical of its usage.

Repercussions of AI for Enterprises

Enterprises are prohibiting the use of such AI language models over concerns of privacy, potential security breaches, intellectual property theft, and copyright violation – the threats are numerous. Powered by self-learning neural networks, GPT models can potentially use copyrighted material in their responses without proper attribution. Allegations that OpenAI violated copyrights to train its language models have already landed the company in multiple lawsuits.

When it comes to blogs and articles, most AI-generated content is too generic and may be flagged as thin content by search engine crawlers. However, that is only the tip of the iceberg; there may also be factual and statistical inaccuracies in the content produced by such LLM-based tools. This is known as AI hallucination – one of the primary reasons why enterprises are shunning their usage. Consequently, we are noticing a growing need to differentiate human-generated from AI-generated content.

Detecting AI Content

Since AI responses read so much like human writing, it is virtually impossible to distinguish human write-ups from AI-generated ones by eye. According to a study in the langedutech journal, a person can use two major techniques to differentiate AI- and human-generated text. They are –

1. Stylometric Analysis

2. Metadata Analysis

Stylometry compares a text on criteria like vocabulary, syntax and grammar to determine whether it was written by a human or an AI. Metadata analysis, on the other hand, traces where the content originated to deduce whether it is human- or AI-generated.
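To give a flavour of the stylometric approach, here is a rough Python sketch. The features chosen – type-token ratio as a proxy for vocabulary richness and sentence-length spread as a proxy for syntactic variety – are our own simplified assumptions; real stylometric analysis uses far richer feature sets.

```python
import re
import statistics

def stylometric_features(text: str) -> dict:
    """Very crude stylometric profile of a text (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        # Vocabulary richness: unique words / total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Syntactic variety: average and spread of sentence lengths.
        "avg_sentence_length": statistics.mean(lengths),
        "sentence_length_spread": statistics.pstdev(lengths),
    }

print(stylometric_features(
    "Short one. A somewhat longer second sentence follows it, "
    "meandering a little before it finally ends."
))
```

A classifier would then compare such feature profiles against reference corpora of known human and known AI writing.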

But both of these methods require professional skill and advanced technical expertise, and even then the results are not 100% accurate.

AI Content Detector Tools: Fact vs. Myth

The third option is to use AI-based GPT content detector tools – which is quite the irony: using one AI tool to identify whether a text was generated by another AI! The Internet is flooded with AI checker tools that claim to distinguish human write-ups from AI texts.

So, we decided to put a few of those tools to the test. We fed one of our blogs into two AI content detector tools, and here are the results:

[Image: AI content detector tool result]

The first tool correctly identified the content as human-written. However, when we ran it through the second tool, it showed the opposite result:

[Image: AI content detector tool result]

The same text was now flagged as AI-generated. Next, we had ChatGPT 3.5 write a similar blog on the same topic. This time, we removed a few redundant phrases and pasted the article into the same AI tools.

Here are the results:

[Image: AI content detector tool result]

The first tool identified ChatGPT 3.5’s content as human-generated, while the second tool flagged it as AI-generated. Looking at the percentages, the detector was more confident that the human-written content was AI-generated (62 per cent) than it was about the ChatGPT-produced article, which it scored at only 58 per cent.

[Image: AI content detector tool result]

It is this questionable accuracy of AI detector tools that led OpenAI, the organisation behind ChatGPT, to withdraw its own AI content detection tool.

We have also written a blog on the future of ChatGPT in content writing, if you want to read more on that topic.

Analysing the Results

First of all, as we pointed out above, AI or GPT detector tools can be highly erroneous. Even at their best, these tools leave room for error. Research from Cornell University suggests that AI detection tools tend to be biased against non-native English speakers. This is because most AI detector tools use text perplexity to identify whether writing is human- or AI-generated.

Text Perplexity: Explained

As per research findings by Stanford University scholars, most AI detector tools use something called “text perplexity” to identify whether a write-up was produced by generative AI (a rough code sketch follows this list). Text perplexity –

  • Looks at the sequence of words in a write-up.
  • Measures how difficult each word is for a generative AI model to predict.
  • High predictability (low perplexity) is taken as a sign of AI-generated text.
  • Low predictability (high perplexity) is taken as a sign of human-generated text.
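Here is a minimal sketch of what such a perplexity check could look like, using GPT-2 through the Hugging Face transformers library. The model choice and the decision threshold are purely illustrative assumptions on our part – commercial detectors do not disclose their exact models or cut-offs.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT-2 stands in for whatever model a real detector uses (an assumption).
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """exp(average negative log-likelihood of each token given the
    tokens before it) -- lower means the text is more predictable."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy
        # loss over the sequence; perplexity is its exponential.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 40.0  # hypothetical cut-off, purely for illustration
score = perplexity("The quick brown fox jumps over the lazy dog.")
print(score, "-> flagged as AI" if score < THRESHOLD else "-> looks human")
```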

This is why GPT detector tools tend to flag the write-ups of non-native English writers. The reasons are –

  • Non-native speakers tend to have a smaller working vocabulary.
  • They use fewer complex words and phrases and write shorter sentences.
  • They vary their sentence structures less.

Putting the Theory to Practice

Now let us take the following scenario, where an experienced writer is writing an article on a technical topic such as ‘Home Loan Interest’. Some common characteristics of such an article are –

  • Little potential for creativity.
  • Short sentences with straightforward syntax.
  • Predictable sequence of words.
  • Technical jargon presented with multiple bullet points.

Such an article is easily flagged as AI-generated, since its sequence of words is highly predictable to GPT models.
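Using the perplexity() sketch from earlier, the effect could look something like this. The sample texts and the expected ordering of scores are our own illustrative assumptions; exact numbers depend entirely on the model used.

```python
# Comparing a formulaic technical passage with a more idiosyncratic one
# using the perplexity() function sketched earlier. The texts below are
# made-up examples; only the relative ordering of scores matters.
formulaic = (
    "A home loan is a loan taken to buy a house. "
    "The interest rate is the cost of the loan. "
    "A lower interest rate means a lower monthly payment."
)
idiosyncratic = (
    "Interest on a home loan behaves less like a fee and more like a "
    "slow tide, quietly reshaping the shoreline of a monthly budget."
)

# The formulaic passage should score lower (more predictable), so a
# perplexity-based detector is more likely to call it AI-generated.
print("formulaic:    ", perplexity(formulaic))
print("idiosyncratic:", perplexity(idiosyncratic))
```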

That also explains why the LLM-based AI detection tool failed to recognise AI-generated content in our experiment. As we removed certain lines and sections from the ChatGPT-generated blog, its text perplexity rose; the detector’s predictions about the next words no longer held, and so it identified the blog as human-generated text.

How Does LLM Work?

ChatGPT and other generative AI tools are built on what is called an LLM (large language model). That means they can produce large volumes of text containing the information, data and figures that go into an informative piece. But where do they get that information from? Generative AI uses a deep learning model to extract information from public sources like Wikipedia and other web repositories.
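At its core, a large language model generates text by repeatedly predicting the next token given everything written so far. Here is a minimal sketch of that loop using GPT-2 via Hugging Face transformers – a small public stand-in for the far larger models behind ChatGPT; the prompt and sampling settings are our own illustrative choices.

```python
from transformers import pipeline

# GPT-2 is a small stand-in for the much larger models behind tools
# like ChatGPT; the prompt and settings below are illustrative only.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Home loan interest rates depend on",  # everything written so far
    max_new_tokens=30,   # predict up to 30 further tokens, one at a time
    do_sample=True,      # sample from the predicted distribution
    temperature=0.8,     # soften the distribution slightly
)
print(result[0]["generated_text"])
```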

How is LLM Trained?

The AI language models are trained on millions of inputs already available in public sources, such as –

  • News
  • Articles
  • Reports
  • Blogs
  • Published books
  • Research reports

These source materials were created by humans, so, naturally, besides learning the information itself, LLM-based AI models also learn the way humans use language. The AI language model then replicates that pattern to produce comprehensible texts and materials.

The Results

Hence the extreme similarity between AI- and human-generated content. The matter of concern is that AI language models are built on deep learning neural networks. These machine learning models simulate the way the human brain learns new information, and they continue to improve with further training. So the little mistakes or lapses in AI writing are only going to be rectified further with future updates of generative language models.

Afterword

With the findings of our little experiment and the existing research literature, it is safe to deduce that AI content detector tools are flawed and require human intervention if they are to be trusted.

Moreover, these tools often do not have access to the sophistication and advanced resources that go into producing ChatGPT’s answers. So the chances remain high that most tools will either fail to identify AI-generated content or, worse, penalise humans by flagging the fruits of their labour as AI-generated text.