How Do AI Detectors Work? The Technology Explained (2026)
How do AI detectors work? They analyze text for statistical patterns that distinguish human writing from AI-generated content — primarily measuring perplexity (how predictable the word choices are) and burstiness (how much sentence length and complexity vary). But knowing what they measure is only half the picture. Scribbr tested 10 popular detectors and found an average accuracy of just 60%. Understanding the technology explains both how detectors catch AI text and why they fail so often.
How AI Generates Text (The Foundation You Need First)
Before you can understand how AI detectors work, you need to understand what they're detecting. Every AI writing tool — ChatGPT, Claude, Gemini, DeepSeek — generates text the same fundamental way: one word at a time, left to right, by predicting the most probable next word.
This process is called autoregressive generation. The model looks at all the words so far and calculates a probability distribution over every possible next word in its vocabulary (typically 50,000-100,000 tokens). Then it picks one. Then it repeats, using the newly generated word as additional context for the next prediction.
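The loop above can be sketched in a few lines of Python. This is a toy illustration, not a real model: the logit values are made up, and a real system would repeat the pick-and-append step over a vocabulary of tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into a probability distribution."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the word after "The cat sat on the"
logits = {"mat": 5.0, "floor": 4.2, "chair": 3.8, "xylophone": -2.0}
probs = softmax(logits)

# Greedy decoding: pick the most probable token, append it to the
# context, and repeat for the next position.
next_token = max(probs, key=probs.get)
```

In practice models sample from the distribution rather than always taking the argmax, but the high-probability bias remains, and that bias is what detectors measure.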
Here's what matters for detection: the model tends to pick high-probability words. If you type "The cat sat on the," the model assigns the highest probability to words like "mat," "floor," "chair." It almost never picks "xylophone" or "parliament." This makes AI text statistically predictable in a way that human text isn't.
Humans don't write by computing probability distributions. We make weird word choices. We start sentences one way and change direction mid-thought. We use obscure vocabulary because we read it somewhere last week, or because it sounds funny, or because we're showing off. This unpredictability creates a statistical fingerprint that's hard for AI to replicate consistently.
The setting that controls how "creative" an AI model gets is called temperature. At low temperature (0.1-0.3), the model almost always picks the single most probable next word — producing extremely predictable, detectable text. At high temperature (0.8-1.2), the model considers less likely options more seriously — producing more varied but sometimes incoherent text. Most default chat interfaces run at moderate temperature, which means the output is predictable enough for detectors to catch but varied enough to read naturally.
This is the fundamental insight that makes AI detection possible: AI text is generated through a process that favors high-probability sequences, and that tendency leaves measurable traces.
Perplexity — Why AI Text Is Predictably Boring
Perplexity is the single most important metric in AI detection. It measures how "surprised" a language model would be by a sequence of words. Think of it as a predictability score.
Low perplexity means the text follows highly expected patterns. Each word is roughly what you'd predict given the words before it. Sentences flow in safe, conventional sequences. This is what AI produces most of the time — text that is technically correct, grammatically clean, and deeply unsurprising.
High perplexity means the text contains unexpected choices. An unusual metaphor. A sentence that starts with a conjunction when you expected a noun. A technical term dropped into casual conversation. Humans do this constantly without thinking about it.
Detectors measure perplexity by running the text through a reference language model (often a variant of GPT-2 or a similar open-source model) and checking how well the model predicts each word. If the reference model predicts the words easily — low perplexity — the text likely came from a similar model. If the reference model struggles to predict the words — high perplexity — the text likely came from a human.
The problem with perplexity as a detection metric is that not all human writing has high perplexity. Technical writing, legal documents, scientific papers, and formulaic academic essays follow predictable patterns by design. A well-written five-paragraph essay with clear topic sentences and orderly transitions will score low on perplexity — not because an AI wrote it, but because the student followed the structure their professor taught them.
Info
Perplexity measures how predictable text is — low perplexity suggests AI, high perplexity suggests human. The flaw: clear, well-structured human writing (academic essays, technical documents, legal briefs) scores low on perplexity by design, which is why these styles trigger the most false positives across every major detector.
This is why human writing triggering AI detectors is such a persistent problem. The better you follow conventional writing rules, the more your text looks like AI to a perplexity-based detector.
Burstiness — Why AI Text Sounds Monotonous
Burstiness measures variation in sentence complexity and length across a piece of text. It's the second pillar of AI detection, and it catches something perplexity misses: rhythm.
Human writing is bursty. We write a long, complex sentence packed with clauses and qualifications — then follow it with a short punch. Three words. Then a medium sentence that transitions to the next idea. This variation happens naturally because humans think in uneven bursts, not in metronomic patterns.
AI writing tends toward uniformity. Sentences cluster around similar lengths. Complexity stays consistent paragraph to paragraph. The text hums along at a steady, even pace — competent but flat. This happens because autoregressive generation optimizes for local coherence (each sentence sounds good after the previous one) rather than global rhythm (the document has varied pacing).
Detectors measure burstiness by calculating the variance in sentence length and syntactic complexity across the full document. Low variance (uniform sentences) correlates with AI origin. High variance (mixed long and short sentences) correlates with human authorship.
Burstiness is harder to fake than perplexity. You can increase perplexity by swapping in unusual synonyms — but creating genuinely varied rhythm requires the kind of structural awareness that current language models don't optimize for. It's also why AI-generated text often "sounds" flat to experienced readers even when they can't pinpoint exactly why.
Combined, perplexity and burstiness create a two-dimensional map. Human writing tends to cluster in the high-perplexity, high-burstiness quadrant. AI writing clusters in the low-perplexity, low-burstiness quadrant. The contested space is everything in between — where detection gets unreliable and false positives multiply.
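The quadrant logic can be expressed as a toy decision rule. The thresholds below are made up; real detectors learn them from data and output probabilities rather than hard labels.

```python
def classify(perplexity, burstiness, ppl_threshold=40.0, burst_threshold=4.0):
    """Toy two-dimensional rule with invented thresholds:
    low on both metrics -> likely AI; high on both -> likely human;
    anything in between is the contested zone where detectors disagree."""
    if perplexity < ppl_threshold and burstiness < burst_threshold:
        return "likely AI"
    if perplexity >= ppl_threshold and burstiness >= burst_threshold:
        return "likely human"
    return "uncertain"
```

Note how much text lands in the "uncertain" branch: a formulaic human essay might score low perplexity but high burstiness, and a high-temperature AI draft the reverse. That middle band is where false positives and false negatives live.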
The Three Detection Approaches (Statistical, Classifier, Watermark)
Not all AI detectors work the same way. The tools on the market use three fundamentally different approaches, which is why they routinely disagree when analyzing the same text.
1. Zero-Shot Statistical Detection
This approach doesn't require any training data. It runs the text through a reference language model and measures statistical properties — primarily perplexity and burstiness — to determine whether the text resembles AI output. Tools like GPTZero lean heavily on this method.
The advantage: it works on text from any AI model, even models it's never seen before, because it's measuring statistical properties rather than memorized patterns. The disadvantage: it's more prone to false positives on naturally predictable human writing (academic text, technical documentation, non-native English speakers using simple vocabulary).
2. Trained Classifier Detection
This approach uses machine learning. The detector is trained on large datasets of labeled examples — confirmed human text and confirmed AI text — and learns to distinguish between them based on hundreds of subtle features. Turnitin and Originality.ai use trained classifiers, often in combination with statistical methods.
The advantage: higher accuracy on text from models the classifier was trained on (typically GPT-3.5 and GPT-4). The disadvantage: accuracy drops on newer or less common models. When a new AI model launches with different writing characteristics, classifiers need retraining. This is why Turnitin catches unedited ChatGPT about 85% of the time but performs worse on Claude or Gemini output.
3. Watermarking
This approach is fundamentally different — it embeds an invisible signal in AI-generated text at the moment of generation, rather than analyzing text after the fact. Google DeepMind's SynthID is the most prominent example, embedding statistical watermarks in text, images, audio, and video.
Watermarking works by subtly biasing which words the AI chooses — nudging it toward certain token selections that create a detectable pattern invisible to human readers. The advantage: near-perfect accuracy with virtually zero false positives, because the watermark is embedded in AI text and absent from human text by definition.
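A simplified "green list" scheme, in the spirit of published text-watermarking research, illustrates the idea. This is a hypothetical sketch, not SynthID's actual algorithm: the previous token seeds a hash that splits the vocabulary in half, generation favors the "green" half, and detection counts how often green tokens appear.

```python
import hashlib

def green_list(prev_token, vocab):
    """Deterministically partition the vocabulary using a hash seeded
    by the previous token; roughly half the words come out 'green'."""
    greens = set()
    for tok in vocab:
        digest = hashlib.sha256((prev_token + tok).encode()).digest()
        if digest[0] % 2 == 0:
            greens.add(tok)
    return greens

def detect_watermark(tokens, vocab, threshold=0.7):
    """Watermarked generation nudges the model toward green tokens.
    At detection time, count the fraction of tokens that are green
    given their predecessor; unwatermarked text hovers near 0.5."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab))
    return hits / max(len(tokens) - 1, 1) >= threshold
```

Because the partition depends only on a hash, a human reader sees nothing unusual, but anyone holding the hashing scheme can run a statistical test for the bias.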
The disadvantages are significant. Watermarks only work if the AI provider implements them — and as of March 2026, most don't. The watermark only survives if the text isn't edited afterward; paraphrasing, translation, or even moderate rewriting destroys the signal. And watermarks require cooperation from the companies building the AI models, which makes them a policy solution more than a technical one.
Info
Zero-shot detectors measure statistical properties and work across all models. Trained classifiers achieve higher accuracy but degrade on new models. Watermarks are near-perfect but require AI providers to implement them — and few do. No single approach solves the detection problem alone.
Why Detectors Disagree (And Why No Detector Is Reliable Alone)
Upload the same 1,000-word essay to Turnitin, GPTZero, Originality.ai, and Copyleaks, and you'll get four different scores. Sometimes wildly different. This isn't a bug in any individual tool — it's a structural consequence of how detection works.
Each detector uses a different combination of the approaches described above, trained on different datasets, with different sensitivity thresholds. GPTZero leans on zero-shot statistical analysis. Turnitin uses trained classifiers optimized for academic text. Originality.ai claims 99% but independent tests find 76-97%, partly because its Turbo model aggressively targets humanized text at the cost of higher false positives. Grammarly's AI detector scores just 33% accuracy because it's an add-on feature, not a core product.
The accuracy picture across the industry is sobering. Scribbr tested 10 popular AI detectors and found an average accuracy of 60%. The best free detector hit 68%. The best premium detector reached 84%. That means even the best tool on the market misclassifies roughly 1 in 6 texts.
GPTZero claims 99% accuracy but independent testing finds 82-90%. The gap between self-reported and independent accuracy exists across every detector, because companies test on datasets that favor their tools while independent benchmarks include adversarial cases.
The most comprehensive independent benchmark is RAID (Robust AI Detection), published at ACL 2024. It tested 12 detectors across 6 million+ AI-generated texts from 11 different models, including adversarial attacks like paraphrasing, synonym substitution, and style alteration. The findings were stark: adversarial attacks caused an average 40.6% drop in accuracy, and almost no detector maintained acceptable performance at a false positive rate below 1%.
Info
The RAID benchmark (ACL 2024) tested 12 AI detectors on 6 million+ texts across 11 models. Key finding: adversarial attacks caused an average 40.6% accuracy drop, and almost no detector maintained usable accuracy at a false positive rate below 1%. Detection is a significantly harder problem than any single tool's marketing suggests.
This is the core tension: cranking up detection sensitivity catches more AI text but also flags more human text. Dialing it back reduces false positives but lets more AI text through. Turnitin explicitly suppresses scores in the 1-19% range because false positives are too common at those levels. Every detector makes this tradeoff differently, which is why they disagree.
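The sensitivity tradeoff is easy to see with invented numbers. Assuming each document gets an AI-probability score, lowering the flagging threshold raises both the true positive rate and the false positive rate; the score distributions below are hypothetical.

```python
def flag_rate(scores, threshold):
    """Fraction of documents whose AI-probability score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical detector scores for known-AI and known-human documents:
ai_scores    = [0.95, 0.88, 0.70, 0.55, 0.40]
human_scores = [0.05, 0.15, 0.30, 0.45, 0.60]

flag_rate(ai_scores, 0.8)     # strict threshold: catches only 2 of 5 AI texts
flag_rate(human_scores, 0.8)  # but flags no human text
flag_rate(ai_scores, 0.35)    # loose threshold: catches all 5 AI texts
flag_rate(human_scores, 0.35) # at the cost of flagging 2 of 5 human texts
```

Every vendor picks a different point on this curve, which is one structural reason the same essay gets different verdicts from different tools.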
OpenAI learned this lesson the hard way. They launched their own AI text classifier in January 2023 and shut it down six months later — it achieved just 26% true positive rate with a 9% false positive rate. The company that builds the AI couldn't reliably detect its own output. That's not a failure of engineering. It's evidence that the detection problem is fundamentally hard.
The Arms Race: Detection vs Evasion
Detector companies don't like talking about this, but it's the defining reality of the field: AI detection and AI evasion are locked in an escalating arms race, and evasion is winning.
The cycle works like this. Detectors learn to identify statistical patterns in AI text. Humanizer tools learn to alter those patterns — varying sentence length, injecting unusual vocabulary, restructuring syntax — until the text no longer triggers detection. Detectors retrain on the humanized output. Humanizers adapt again. Each round produces marginal improvements on both sides, but the structural advantage belongs to evasion.
Why? Because destroying a statistical signal is easier than detecting one. A detector needs to identify a consistent pattern across the entire text. A humanizer only needs to introduce enough variation to break that pattern. GPTZero's detection rate drops to 18% on humanized text. That's not a failure of GPTZero specifically — every detector shows significant accuracy loss on edited or humanized content.
QuillBot gets caught by virtually every major detector when used as a simple paraphraser, but purpose-built humanizer tools are far more sophisticated. They don't just swap synonyms — they restructure sentences, vary paragraph length, inject colloquial phrasing, and add the kind of controlled irregularity that mimics human burstiness. The techniques humanizers use to exploit detection weaknesses are well documented, and the gap between detection and evasion continues to widen.
The watermarking approach theoretically breaks this cycle — you can't remove a signal you can't see. But text watermarks are fragile. Editing 15-20% of the words typically destroys the signal. And watermarking requires AI providers to voluntarily embed detectable patterns in their output, which creates a competitive disadvantage for any company that implements it while others don't. Google's SynthID is the most ambitious attempt, but adoption across the industry remains minimal.
What percentage of human editing makes AI text undetectable? There's no precise threshold, but research suggests a spectrum. Lightly edited AI text (fixing typos, changing a few words) still gets caught 60-80% of the time. Moderately edited text (restructuring sentences, adding personal examples, varying vocabulary) drops detection to 30-50%. Heavily rewritten text (keeping only the ideas and structure, rewriting every sentence) falls below most detection thresholds entirely. The line between "AI-assisted" and "human-written" is a gradient, not a boundary.
Who Gets Caught and Who Gets Wrongly Flagged
AI detectors don't fail randomly. They fail along predictable lines that track with writing style, language background, and neurodivergence — creating a systematic bias that affects specific populations more than others.
Non-native English speakers are the most documented victims of AI detection bias. Stanford researchers found that 61.22% of TOEFL essays written by non-native speakers were falsely flagged as AI-generated. Across seven detectors tested, 97% of those essays were flagged by at least one tool. The full study by Liang et al. explains why: non-native speakers tend to use simpler vocabulary, shorter sentences, and more predictable structures — exactly the low-perplexity, low-burstiness profile that detectors associate with AI.
This isn't a fixable calibration issue. It's a structural conflict: the features that detectors use to identify AI text overlap significantly with the features of non-native English writing. You can't tune the detector to stop flagging ESL students without simultaneously letting more AI text through.
Students who follow academic writing conventions face a related paradox. Five-paragraph essays, clear topic sentences, structured arguments, consistent tone — the patterns professors teach students to use are the same patterns AI models produce. A student who masters academic writing conventions inadvertently optimizes their text to look AI-generated. Higher-graded essays, paradoxically, may be flagged more often than messy, disorganized ones.
Professors use software, manual analysis, and oral exams to evaluate AI suspicion, but the software component carries the biases described above. Most LMS platforms offer no native AI detection — Canvas itself has zero AI detection capability, and Blackboard relies on SafeAssign and Turnitin integration. Wherever Turnitin goes, the false positive problem follows.
Neurodivergent students — particularly those with autism or ADHD — sometimes produce writing with unusually consistent patterns, limited stylistic variation, or heavily edited prose that removes all rough edges. These characteristics overlap with AI detection markers in ways that have real legal consequences. A UK Office of the Independent Adjudicator upheld an appeal from an autistic student falsely flagged by AI detection, and a Yale student sued their university after being falsely accused based on GPTZero scores.
The false positive crisis across all detectors isn't an edge case. It's a systematic problem that compounds whenever detection is used without human judgment. Turnitin flags QuillBot-paraphrased text about 70% of the time, but that same sensitivity means it also flags human text that happens to share structural characteristics with paraphrased content.
The responsible use of AI detectors requires treating every score as a starting point for investigation — never as a verdict. The technology can identify text that's worth examining, but it cannot determine authorship with the certainty that academic integrity decisions demand.