Methodology

Detection Accuracy

We publish our accuracy data so you can make informed decisions. Every number on this page comes from automated testing against a labeled corpus of known human and AI-generated text.

Last updated: May 15, 2026 | 8,719 samples tested

Pro Tier — Neural Model

AI text correctly flagged: 99.8% (4,476 of 4,485 samples)
Human text correctly passed: 99.5% (4,211 of 4,234 samples)
False positive rate: 0.5% (23 human samples flagged)
AI missed: 9 of 4,485 AI samples

The Pro tier uses a fine-tuned neural model trained on 26,000+ samples across 11 AI models, 8 domains, and 12 adversarial attack types. It achieves near-perfect AI detection (99.8%) while correctly passing 99.5% of human text. Independently evaluated on the RAID benchmark (ACL 2024) at 99.24% accuracy across 672,000 test documents.

Pro vs Free Tier Comparison

Metric | Pro (Neural) | Free (Heuristic)
--- | --- | ---
AI correctly flagged | 99.8% | 23.1%
Human correctly passed | 99.5% | 68.9%
False positive rate | 0.5% | 31.1%
Human avg score | 99.5 | 77.9

Free Tier — Heuristic Scanner

Human text correctly passed: 68.9% (874 of 1,268 human samples scored by the free tier)
AI text correctly flagged: 23.1% (661 of 2,865 AI samples from the models with free-tier results)
False positive rate: 31.1% (394 human samples flagged)
Human avg score: 77.9 (AI avg: 81.7)

The free tier uses a 5-signal heuristic scorer that checks burstiness, AI vocabulary, perplexity, structural patterns, and human texture markers. It provides a fast initial estimate, but its accuracy is limited. It flags only 23.1% of AI text and falsely flags 31.1% of human text. It struggles most with formal writing, where human and AI styles overlap. For reliable results, upgrade to the Pro neural model.
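The burstiness signal can be sketched as the variance of sentence lengths, following the Methodology section's description of it as "sentence length variance." This is a minimal sketch under stated assumptions. The naive regex sentence splitter, the use of words as the length unit, and population variance are illustrative choices, not the scorer's published implementation.

```python
import re
from statistics import pvariance


def sentence_lengths(text: str) -> list[int]:
    """Split text into sentences and return each sentence's length in words.

    Assumption: a naive splitter that breaks after '.', '!' or '?' followed by
    whitespace. It mishandles abbreviations such as "e.g." and is illustrative only.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Population variance of sentence lengths (in words).

    Higher values mean sentence lengths vary more, which is typically read as
    human-leaning. Returns 0.0 when there are fewer than two sentences.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return float(pvariance(lengths))


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The storm rolled in over the hills before anyone noticed. Then silence."
print(sentence_lengths(varied))                  # [1, 10, 2]
print(burstiness(uniform))                       # 0.0
print(burstiness(varied) > burstiness(uniform))  # True
```

Evenly sized sentences score zero, while the mix of very short and long sentences scores about 16.2.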

Human Text — By Source

How often does each type of human writing pass without being falsely flagged?

Source | Samples | Pro (Neural) | Free (Heuristic)
--- | --- | --- | ---
Creative fiction | 925 | 99.4% | 80.6%
Academic discussions | 814 | 99.9% | n/a
Academic papers | 698 | 99.7% | 51.8%
Casual online writing | 645 | 98.9% | 69.4%
Clinical / medical | 547 | 99.8% | 75.8%
Non-native English | 544 | 99.3% | 77.6%
Articles / essays | 61 | 96.7% | 50.8%
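The per-source percentages are rounded, so exact per-source counts are not published. They can be estimated by multiplying each row's sample count by its Pro pass rate and rounding. The minimal sketch below does that arithmetic. The estimated counts sum to 4,211 of 4,234, the same totals as the Pro summary. The headline 99.5% is therefore a sample-weighted pool of the rows, not a plain average of the seven percentages, which would give 99.1%.

```python
# Pro (Neural) column of the per-source human table: (source, samples, pass rate).
PRO_HUMAN_ROWS = [
    ("Creative fiction", 925, 0.994),
    ("Academic discussions", 814, 0.999),
    ("Academic papers", 698, 0.997),
    ("Casual online writing", 645, 0.989),
    ("Clinical / medical", 547, 0.998),
    ("Non-native English", 544, 0.993),
    ("Articles / essays", 61, 0.967),
]


def pooled_pass_rate(rows):
    """Estimate per-row pass counts from rounded rates, then pool them.

    round(n * rate) is an estimate: the published rates are rounded to 0.1%,
    so recovered counts may differ slightly from the true ones.
    """
    passed = sum(round(n * rate) for _, n, rate in rows)
    total = sum(n for _, n, _ in rows)
    return passed, total, passed / total


passed, total, pooled = pooled_pass_rate(PRO_HUMAN_ROWS)
unweighted = sum(rate for _, _, rate in PRO_HUMAN_ROWS) / len(PRO_HUMAN_ROWS)
print(f"pooled: {passed} of {total} ({pooled:.1%})")  # pooled: 4211 of 4234 (99.5%)
print(f"unweighted mean of rows: {unweighted:.1%}")   # unweighted mean of rows: 99.1%
```

Pooling matters because the small "Articles / essays" row (61 samples) pulls an unweighted average down far more than its share of the corpus justifies.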

Formal writing (academic papers, clinical text) tends to raise false positive rates for AI detectors generally, because formal prose shares structural features with AI output. Auto tone detection adjusts thresholds for detected domains (Academic, Medical, Legal, etc.) to reduce these false positives.
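Here is a minimal sketch of domain-adjusted thresholds. The 75 and 45 cutoffs come from the Scoring section. Three parts are hypothetical, because the page does not publish how the adjustment works: the per-domain offsets, the choice to lower only the "Human" cutoff, and the assumption that tone detection has already supplied a domain label.

```python
from typing import Optional

BASE_HUMAN_CUTOFF = 75  # scores above this are "Human" (Scoring section)
AI_CUTOFF = 45          # scores at or below this are "AI Detected" (Scoring section)

# Hypothetical offsets: the real per-domain adjustments are not published.
# Lowering the "Human" cutoff lets formal human prose, which tends to score
# lower, pass without being flagged.
DOMAIN_OFFSETS = {"academic": -10, "medical": -10, "legal": -8}


def human_cutoff(domain: Optional[str]) -> int:
    """Return the "Human" cutoff for a detected domain (unknown domains use the base)."""
    return BASE_HUMAN_CUTOFF + DOMAIN_OFFSETS.get(domain or "", 0)


def label(score: float, domain: Optional[str] = None) -> str:
    """Label a 0-100 score (higher = more human) using a domain-adjusted cutoff."""
    if score > human_cutoff(domain):
        return "Human"
    if score > AI_CUTOFF:
        return "Mixed"
    return "AI Detected"


print(label(70))              # Mixed
print(label(70, "academic"))  # Human (cutoff lowered to 65)
print(label(40, "academic"))  # AI Detected
```

The trade-off is that AI text in those domains scoring between the lowered and base cutoffs would also pass. Any real offsets would need tuning against both error rates.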

AI Text — By Model

How often does AI-generated text from each model get correctly flagged?

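AI Model | Samples | Pro (Neural) | Free (Heuristic)
--- | --- | --- | ---
AI Model A | 750 | 99.3% | 10.1%
AI Model B | 750 | 99.9% | 36.5%
AI Model C | 750 | 99.7% | 23.1%
AI Model D | 540 | 100% | 21.5%
AI Model E | 540 | 100% | n/a
AI Model G | 540 | 100% | n/a
AI Model F | 540 | 100% | n/a
AI (mixed models) | 75 | 98.7% | 29.3%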
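The per-model rows also show which samples feed each headline rate. The minimal sketch below assumes per-model counts can be estimated by rounding samples × rate. The Pro column pools to 4,476 of 4,485, matching the Pro summary. The Free column has results only for Models A through D and the mixed set, which pool to 661 of 2,865. That is consistent with the free tier's 23.1% being computed over those 2,865 samples, not all 4,485.

```python
from typing import NamedTuple, Optional


class ModelRow(NamedTuple):
    model: str
    samples: int
    pro_rate: float
    free_rate: Optional[float]  # None where the table reports no Free result


AI_ROWS = [
    ModelRow("AI Model A", 750, 0.993, 0.101),
    ModelRow("AI Model B", 750, 0.999, 0.365),
    ModelRow("AI Model C", 750, 0.997, 0.231),
    ModelRow("AI Model D", 540, 1.000, 0.215),
    ModelRow("AI Model E", 540, 1.000, None),
    ModelRow("AI Model G", 540, 1.000, None),
    ModelRow("AI Model F", 540, 1.000, None),
    ModelRow("AI (mixed models)", 75, 0.987, 0.293),
]


def pooled_flags(rows, column: str) -> tuple[int, int]:
    """Estimate flagged counts per model (rounded) and pool them.

    Rows with no result for the chosen column are left out of both the
    numerator and the denominator.
    """
    flagged = total = 0
    for row in rows:
        rate = getattr(row, column)
        if rate is None:
            continue
        flagged += round(row.samples * rate)
        total += row.samples
    return flagged, total


print(pooled_flags(AI_ROWS, "pro_rate"))   # (4476, 4485)
print(pooled_flags(AI_ROWS, "free_rate"))  # (661, 2865)
```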

Methodology

Corpus

8,719 labeled text samples: 4,234 known human and 4,485 known AI-generated. Human samples span casual online writing, academic discussions, academic papers, clinical abstracts, creative fiction, articles and essays, and non-native English. AI samples are generated by multiple leading models across varied writing styles and topics.

Date filtering

All human samples date from before January 2023, so they predate widespread LLM use. This gives us high confidence that the "human" labels are accurate.
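The date rule amounts to a simple filter. The minimal sketch below assumes a hypothetical `Sample` record with a `written_on` date, since the page does not describe the corpus schema. It reads "before January 2023" as strictly before January 1, 2023.

```python
from dataclasses import dataclass
from datetime import date

HUMAN_CUTOFF = date(2023, 1, 1)  # human samples must be dated strictly before this


@dataclass
class Sample:
    """Hypothetical corpus record; field names are illustrative assumptions."""
    text: str
    is_human: bool
    written_on: date


def passes_date_filter(sample: Sample) -> bool:
    """Keep every AI sample; keep a human sample only if it predates the cutoff."""
    return not sample.is_human or sample.written_on < HUMAN_CUTOFF


corpus = [
    Sample("forum post", True, date(2021, 6, 3)),
    Sample("blog entry", True, date(2023, 4, 9)),  # excluded: may postdate LLM use
    Sample("generated essay", False, date(2024, 2, 1)),
]
kept = [s.text for s in corpus if passes_date_filter(s)]
print(kept)  # ['forum post', 'generated essay']
```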

Models

The Pro neural model is a fine-tuned transformer trained on a curated corpus of verified human and AI-generated text. The free heuristic scorer uses five additive signals: sentence length variance (burstiness), AI vocabulary density, perplexity proxy, structural patterns, and human texture markers.
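The minimal sketch below shows how five additive signals could combine into the 0–100 score. The signal names follow the list above. Several parts are hypothetical because the actual values are not published: each signal arriving pre-normalized to [-1, 1] with positive values leaning human, the baseline of 50, and the weights.

```python
from dataclasses import dataclass, fields


@dataclass
class Signals:
    """The five free-tier signals, each assumed pre-normalized to [-1, 1].

    Orientation (an assumption): positive values lean human. For example,
    ai_vocabulary is +1 when AI-typical words are rare and -1 when they are dense.
    """
    burstiness: float
    ai_vocabulary: float
    perplexity_proxy: float
    structure: float
    human_texture: float


# Hypothetical baseline and weights; the real scorer's values are not published.
# The weights sum to 50, so the score spans exactly 0-100 when every signal is at +/-1.
BASELINE = 50.0
WEIGHTS = {
    "burstiness": 12.0,
    "ai_vocabulary": 12.0,
    "perplexity_proxy": 10.0,
    "structure": 8.0,
    "human_texture": 8.0,
}


def heuristic_score(sig: Signals) -> float:
    """Add each weighted signal to the baseline and clamp to the 0-100 scale."""
    total = BASELINE + sum(WEIGHTS[f.name] * getattr(sig, f.name) for f in fields(sig))
    return max(0.0, min(100.0, total))


print(heuristic_score(Signals(1, 1, 1, 1, 1)))       # 100.0
print(heuristic_score(Signals(0, 0, 0, 0, 0)))       # 50.0
print(heuristic_score(Signals(0.5, -0.5, 0, 0, 1)))  # 58.0
```

Additive scoring keeps each signal's contribution easy to inspect. It also means one strongly human-looking signal can offset several weak AI-leaning ones, which is one plausible reason a purely additive heuristic misses much AI text.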

Scoring

Each sample is scored on a 0–100 scale (higher = more human). Samples scoring above 75 are classified as "Human," 46–75 as "Mixed," and 45 or below as "AI Detected." A human sample is "correctly passed" if it scores above 75. An AI sample is "correctly flagged" if it scores 75 or below. A "Mixed" result therefore counts as a correct flag for AI text but as a false positive for human text.
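The scoring rules and the two accuracy definitions can be written out directly. This sketch follows the thresholds stated above. Treating "46–75" as "above 45 and at most 75" is an assumption that matters only if scores can be fractional. The usage lines recompute the Pro summary figures from the published counts.

```python
def classify(score: float) -> str:
    """Map a 0-100 score (higher = more human) to the three published labels.

    Assumption: "Mixed" covers every score above 45 and up to 75, which matches
    the stated 46-75 band for whole-number scores.
    """
    if score > 75:
        return "Human"
    if score > 45:
        return "Mixed"
    return "AI Detected"


def is_correct(score: float, is_human: bool) -> bool:
    """Human text must land in "Human"; AI text is flagged by "Mixed" or "AI Detected"."""
    return classify(score) == "Human" if is_human else classify(score) != "Human"


def pct(numerator: int, denominator: int) -> str:
    """Format a ratio to one decimal place, as the summary figures are shown."""
    return f"{numerator / denominator:.1%}"


print(pct(4476, 4485))         # 99.8%  AI text correctly flagged (Pro)
print(pct(4211, 4234))         # 99.5%  human text correctly passed (Pro)
print(pct(4234 - 4211, 4234))  # 0.5%   false positive rate (Pro)
print(is_correct(60, is_human=False), is_correct(60, is_human=True))  # True False
```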

Known limitations

The corpus skews toward English-language text. Adversarial attacks (zero-width characters, homoglyphs) can reduce detection accuracy. AI writing styles evolve; we re-run this benchmark periodically and retrain the model as needed.
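The snippet below illustrates the zero-width and homoglyph attacks mentioned above. The attacked string renders like the original but differs character by character, which is how such attacks can reduce detection accuracy. The normalization step is a minimal sketch, not a description of our pipeline. The zero-width list is partial, and the homoglyph map is a tiny hypothetical sample. Unicode NFKC normalization does not fold Cyrillic lookalikes into Latin letters, so a dedicated mapping table would be needed.

```python
# Zero-width code points often used in such attacks (illustrative, not exhaustive).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Tiny hypothetical homoglyph map (Cyrillic lookalike -> Latin letter).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
})


def normalize(text: str) -> str:
    """Drop zero-width characters, then map known lookalikes to Latin letters."""
    stripped = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return stripped.translate(HOMOGLYPHS)


original = "open the report"
attacked = "\u043epen\u200b the rep\u043ert"  # Cyrillic 'o' twice plus a zero-width space
print(attacked == original)             # False
print(normalize(attacked) == original)  # True
```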

This data is generated automatically and updates when we retrain or expand our test corpus. We believe publishing accuracy data — including where we fall short — builds more trust than marketing claims.