Detection Accuracy
We publish our accuracy data so you can make informed decisions. Every number on this page comes from automated testing against a labeled corpus of known human and AI-generated text.
Pro Tier — Neural Model
The Pro tier uses a fine-tuned neural model purpose-built for AI detection. It achieves near-perfect AI detection (99.9%) while correctly passing 82.1% of human text. The main source of false positives is formal academic and clinical writing, where human prose closely resembles AI output.
Pro vs Free Tier Comparison
| Metric | Pro (Neural) | Free (Heuristic) |
|---|---|---|
| AI correctly flagged | 99.9% | 23.1% |
| Human correctly passed | 82.1% | 68.9% |
| False positive rate (human text flagged) | 17.9% | 31.1% |
| Human avg score | 84.1 | 77.9 |
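The false positive rate in the table is simply the complement of the human pass rate. A quick sanity check (the function name is ours, not part of any published API):

```python
# False positive rate = 100% minus the human pass rate.
def false_positive_rate(human_pass_pct: float) -> float:
    return round(100.0 - human_pass_pct, 1)
```

For example, `false_positive_rate(82.1)` gives `17.9`, matching the Pro row above.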
Free Tier — Heuristic Scanner
The free tier uses a 5-signal heuristic scorer that checks burstiness, AI vocabulary, perplexity, structural patterns, and human texture markers. It provides a fast initial estimate but has limited accuracy — especially on formal writing where human and AI styles overlap. For reliable results, upgrade to the Pro neural model.
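The heuristic scorer combines its five signals additively. The sketch below is illustrative only: the signal names mirror the list above, but the sub-score ranges and equal weighting are our assumptions, not the published formula.

```python
# Hypothetical sketch of the 5-signal additive heuristic. Assumes each
# signal yields a 0-100 "human-likeness" sub-score, weighted equally.
# The real weights and signal internals are not published here.
EXPECTED_SIGNALS = {
    "burstiness",            # sentence length variance
    "ai_vocabulary",         # AI vocabulary density
    "perplexity_proxy",
    "structural_patterns",
    "human_texture",         # human texture markers
}

def heuristic_score(signals: dict[str, float]) -> float:
    if set(signals) != EXPECTED_SIGNALS:
        raise ValueError(f"expected signals: {sorted(EXPECTED_SIGNALS)}")
    # Additive combination: a simple mean of the five sub-scores.
    return sum(signals.values()) / len(signals)
```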
Human Text — By Source
How often does each type of human writing pass without being falsely flagged?
| Source | Samples | Pro (Neural) | Free (Heuristic) |
|---|---|---|---|
| Casual online writing | 447 | 99.3% | 69.4% |
| Academic papers | 226 | 63.7% | 51.8% |
| Non-native English | 196 | 95.4% | 77.6% |
| Clinical / medical | 178 | 34.3% | 75.8% |
| Creative fiction | 160 | 90.0% | 80.6% |
| Articles / essays | 61 | 100% | 50.8% |
Formal writing (academic papers, clinical text) has higher false positive rates across all AI detectors — formal prose shares structural features with AI output. Auto tone detection adjusts thresholds for detected domains (Academic, Medical, Legal, etc.) to reduce these false positives.
AI Text — By Model
How often does AI-generated text from each model get correctly flagged?
| AI Model | Samples | Pro (Neural) | Free (Heuristic) |
|---|---|---|---|
| AI Model A | 750 | 100% | 10.1% |
| AI Model B | 750 | 100% | 36.5% |
| AI Model C | 750 | 99.9% | 23.1% |
| AI Model D | 540 | 100% | 21.5% |
| AI (mixed models) | 75 | 98.7% | 29.3% |
Methodology
Corpus
4,133 labeled text samples: 1,268 known human and 2,865 known AI-generated. Human samples span casual writing, academic papers, clinical abstracts, creative fiction, and non-native English. AI samples are generated by multiple leading models across varied writing styles and topics.
Date filtering
All human samples are sourced from before January 2023 to ensure they predate widespread LLM usage. This gives us high confidence that "human" labels are accurate.
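The date filter described above reduces to a single comparison against the January 2023 cutoff (function and constant names are ours):

```python
from datetime import date

# Human samples must predate widespread LLM usage (see policy above).
LLM_CUTOFF = date(2023, 1, 1)

def is_valid_human_sample(published: date) -> bool:
    """Accept a human-labeled sample only if it predates the cutoff."""
    return published < LLM_CUTOFF
```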
Models
The Pro neural model is a fine-tuned transformer trained on a curated corpus of verified human and AI-generated text. The free heuristic scorer uses five additive signals: sentence length variance (burstiness), AI vocabulary density, perplexity proxy, structural patterns, and human texture markers.
Scoring
Each sample is scored on a 0–100 scale (higher = more human). Samples scoring above 75 are classified as "Human," 46–75 as "Mixed," and 45 or below as "AI Detected." A human sample is "correctly passed" only if it scores above 75. An AI sample is "correctly flagged" if it scores 75 or below; that is, both "Mixed" and "AI Detected" results count as flags.
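The scoring rules above translate directly into a small classifier. This sketch assumes the default thresholds (before any tone adjustment):

```python
def classify(score: float) -> str:
    """Map a 0-100 score (higher = more human) to a verdict."""
    if score > 75:
        return "Human"
    if score > 45:
        return "Mixed"
    return "AI Detected"

def correctly_passed(human_sample_score: float) -> bool:
    # A human sample passes only with a "Human" verdict.
    return classify(human_sample_score) == "Human"

def correctly_flagged(ai_sample_score: float) -> bool:
    # Any non-"Human" verdict (Mixed or AI Detected) counts as a flag.
    return classify(ai_sample_score) != "Human"
```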
Known limitations
The neural model struggles with formal academic and clinical writing, where human prose closely resembles AI output. The corpus skews toward English-language text. AI writing styles evolve; we re-run this benchmark periodically and retrain the model as needed.
This data is generated automatically and updates when we retrain or expand our test corpus. We believe publishing accuracy data — including where we fall short — builds more trust than marketing claims.