
Detection Accuracy

We publish our accuracy data so you can make informed decisions. Every number on this page comes from automated testing against a labeled corpus of known human and AI-generated text.

Last updated: April 4, 2026 · 4,133 samples tested

Pro Tier — Neural Model

AI text correctly flagged: 99.9% (2,863 of 2,865 samples)
Human text correctly passed: 82.1% (1,041 of 1,268 samples)
False positive rate: 17.9% (227 human samples flagged)
AI missed: 2 of 2,865 AI samples

The Pro tier uses a fine-tuned neural model purpose-built for AI detection. It achieves near-perfect AI detection (99.9%) while correctly passing 82.1% of human text. The main source of false positives is formal academic and clinical writing, where human prose closely resembles AI output.

Pro vs Free Tier Comparison

Metric                   Pro (Neural)   Free (Heuristic)
AI correctly flagged     99.9%          23.1%
Human correctly passed   82.1%          68.9%
False positive rate      17.9%          31.1%
Human avg score          84.1           77.9
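As a sketch of how numbers like these are derived: given labeled (score, label) pairs, each metric falls out of simple counting against the 75-point pass threshold described under "Scoring" below. The helper below is illustrative, not the actual benchmark code.

```python
# Illustrative sketch (not the benchmark's actual code): deriving the
# comparison metrics from labeled (score, is_human) pairs, using the
# 75-point threshold described under "Scoring".

def summarize(samples):
    """samples: iterable of (score, is_human); score is 0-100, higher = more human."""
    human = [score for score, is_human in samples if is_human]
    ai = [score for score, is_human in samples if not is_human]
    passed = sum(1 for s in human if s > 75)   # human correctly passed
    flagged = sum(1 for s in ai if s <= 75)    # AI correctly flagged
    return {
        "ai_flagged_pct": 100 * flagged / len(ai),
        "human_passed_pct": 100 * passed / len(human),
        "false_positive_pct": 100 * (len(human) - passed) / len(human),
        "human_avg_score": sum(human) / len(human),
    }

# Example: two human samples (scores 80 and 70) and two AI samples (40 and 90).
metrics = summarize([(80, True), (70, True), (40, False), (90, False)])
```

Note that the false positive rate is simply the complement of "human correctly passed," which is why the two pairs of figures above always sum to 100%.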

Free Tier — Heuristic Scanner

Human text correctly passed: 68.9% (874 of 1,268 samples)
AI text correctly flagged: 23.1% (661 of 2,865 samples)
False positive rate: 31.1% (394 human samples flagged)
Human avg score: 77.9 (AI avg: 81.7)

The free tier uses a 5-signal heuristic scorer that checks burstiness, AI vocabulary, perplexity, structural patterns, and human texture markers. It provides a fast initial estimate but has limited accuracy — especially on formal writing where human and AI styles overlap. For reliable results, upgrade to the Pro neural model.
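To make "5-signal additive scorer" concrete, here is a minimal sketch. The five signal names come from the paragraph above; the equal weighting, the burstiness formula, and all function names are assumptions for illustration, not the product's implementation.

```python
# Hedged sketch of an additive multi-signal scorer. Signal names follow
# the text; the weights and the burstiness formula are invented.
import statistics

def burstiness(sentence_lengths):
    # Human writing tends to vary sentence length more than AI output;
    # map the standard deviation onto a 0-1 signal.
    if len(sentence_lengths) < 2:
        return 0.0
    return min(statistics.stdev(sentence_lengths) / 10.0, 1.0)

def heuristic_score(signals):
    """signals: dict of five 0-1 values (burstiness, ai_vocabulary,
    perplexity_proxy, structure, human_texture). Returns 0-100,
    higher = more human-like, assuming equal 20-point weights."""
    return sum(20.0 * value for value in signals.values())
```

With all five signals at 1.0 the score is 100 (fully human-like); a real scorer would tune the weights and signal functions against a labeled corpus.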

Human Text — By Source

How often does each type of human writing pass without being falsely flagged?

Source                   Samples   Pro (Neural)   Free (Heuristic)
Casual online writing    447       99.3%          69.4%
Academic papers          226       63.7%          51.8%
Non-native English       196       95.4%          77.6%
Clinical / medical       178       34.3%          75.8%
Creative fiction         160       90.0%          80.6%
Articles / essays        61        100%           50.8%

Formal writing (academic papers, clinical text) produces higher false positive rates across AI detectors generally, because formal prose shares structural features with AI output. Our auto tone detection adjusts thresholds for detected domains (Academic, Medical, Legal, etc.) to reduce these false positives.
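One way such domain-aware thresholds could work, as a sketch: only the domain names appear in the text above; the offset values are invented for illustration.

```python
# Sketch of per-domain threshold adjustment. Domain names come from the
# text; the offsets are illustrative assumptions, not product values.
DOMAIN_OFFSETS = {"academic": -10, "medical": -10, "legal": -8}

def adjusted_threshold(domain=None, base=75):
    # Lower the human/AI boundary for formal domains, where human prose
    # legitimately scores closer to AI output.
    return base + DOMAIN_OFFSETS.get(domain, 0)
```

Lowering the threshold for formal domains trades a little AI-detection sensitivity for fewer false positives on exactly the sources that score worst in the table above.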

AI Text — By Model

How often does AI-generated text from each model get correctly flagged?

AI Model             Samples   Pro (Neural)   Free (Heuristic)
AI Model A           750       100%           10.1%
AI Model B           750       100%           36.5%
AI Model C           750       99.9%          23.1%
AI Model D           540       100%           21.5%
AI (mixed models)    75        98.7%          29.3%

Methodology

Corpus

4,133 labeled text samples: 1,268 known human and 2,865 known AI-generated. Human samples span casual writing, academic papers, clinical abstracts, creative fiction, and non-native English. AI samples are generated by multiple leading models across varied writing styles and topics.

Date filtering

All human samples are sourced from before January 2023 to ensure they predate widespread LLM usage. This gives us high confidence that "human" labels are accurate.
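In code, that filter reduces to a single date comparison; a minimal sketch, in which the `published` field name is an assumption rather than the actual pipeline's schema:

```python
# Sketch of the pre-2023 filter for human samples; the "published"
# field name is assumed, not taken from the actual pipeline.
from datetime import date

LLM_CUTOFF = date(2023, 1, 1)

def predates_llms(sample):
    # Keep only human samples published before widespread LLM usage.
    return sample["published"] < LLM_CUTOFF
```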

Models

The Pro neural model is a fine-tuned transformer trained on a curated corpus of verified human and AI-generated text. The free heuristic scorer uses five additive signals: sentence length variance (burstiness), AI vocabulary density, perplexity proxy, structural patterns, and human texture markers.

Scoring

Each sample is scored on a 0–100 scale (higher = more human). Samples scoring above 75 are classified as "Human," 46–75 as "Mixed," and 45 or below as "AI Detected." A human sample is "correctly passed" if it scores above 75. An AI sample is "correctly flagged" if it scores 75 or below.
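The bands above read directly as a three-way classifier; a minimal sketch using the thresholds stated in the text:

```python
# The scoring bands from the text: >75 Human, 46-75 Mixed, <=45 AI Detected.
def classify(score):
    """score: 0-100, higher = more human."""
    if score > 75:
        return "Human"
    if score >= 46:
        return "Mixed"
    return "AI Detected"
```

For the accuracy metrics above, both "Mixed" and "AI Detected" count as flagged when the sample is AI, since "correctly flagged" means scoring 75 or below.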

Known limitations

The neural model struggles with formal academic and clinical writing, where human prose closely resembles AI output. The corpus skews toward English-language text. AI writing styles evolve; we re-run this benchmark periodically and retrain the model as needed.

This data is generated automatically and updates when we retrain or expand our test corpus. We believe publishing accuracy data — including where we fall short — builds more trust than marketing claims.