Detection Accuracy
We publish our accuracy data so you can make informed decisions. Unless otherwise noted, every number on this page comes from automated testing against a labeled corpus of known human and AI-generated text.
Pro Tier — Neural Model
The Pro tier uses a fine-tuned neural model trained on 26,000+ samples across 11 AI models, 8 domains, and 12 adversarial attack types. On our test corpus it correctly flags 99.8% of AI-generated text while correctly passing 99.5% of human text. In an independent evaluation on the RAID benchmark (ACL 2024), it reached 99.24% accuracy across 672,000 test documents.
Pro vs Free Tier Comparison
| Metric | Pro (Neural) | Free (Heuristic) |
|---|---|---|
| AI correctly flagged | 99.8% | 23.1% |
| Human correctly passed | 99.5% | 68.9% |
| False positive rate | 0.5% | 31.1% |
| Human avg score (0–100, higher = more human) | 99.5 | 77.9 |
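The rows of this table are linked: every human sample that is not passed counts as a false positive, so the false positive rate is 100 minus the human pass rate (0.5 = 100 − 99.5 and 31.1 = 100 − 68.9). The sketch below shows how the four metrics could be computed from per-sample scores. It assumes the 75-point cutoff described under Scoring and that "Human avg score" is the plain mean of human-sample scores; `summarize` and its inputs are hypothetical, not our actual test harness.

```python
from statistics import mean

HUMAN_CUTOFF = 75  # scores above this are classified "Human"


def summarize(human_scores: list[float], ai_scores: list[float]) -> dict[str, float]:
    """Compute the comparison-table metrics (as percentages) from 0-100 scores."""
    # Share of human samples scoring above the cutoff ("correctly passed")
    passed = sum(s > HUMAN_CUTOFF for s in human_scores) / len(human_scores)
    # Share of AI samples scoring at or below the cutoff ("correctly flagged")
    flagged = sum(s <= HUMAN_CUTOFF for s in ai_scores) / len(ai_scores)
    return {
        "ai_correctly_flagged": 100 * flagged,
        "human_correctly_passed": 100 * passed,
        # Every human sample that is not passed is a false positive
        "false_positive_rate": 100 * (1 - passed),
        # Assumption: plain mean of the raw human-sample scores
        "human_avg_score": mean(human_scores),
    }


if __name__ == "__main__":
    print(summarize([90, 80, 60, 99], [10, 50, 76, 30]))
```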
Free Tier — Heuristic Scanner
The free tier uses a 5-signal heuristic scorer that checks burstiness, AI vocabulary, perplexity, structural patterns, and human texture markers. It provides a fast initial estimate but has limited accuracy, especially on formal writing, where human and AI styles overlap. For reliable results, upgrade to the Pro neural model.
Human Text — By Source
How often does each type of human writing pass (score above 75) without being falsely flagged? A dash means no free-tier result is reported for that source.
| Source | Samples | Pro (Neural) | Free (Heuristic) |
|---|---|---|---|
| Creative fiction | 925 | 99.4% | 80.6% |
| Academic discussions | 814 | 99.9% | — |
| Academic papers | 698 | 99.7% | 51.8% |
| Casual online writing | 645 | 98.9% | 69.4% |
| Clinical / medical | 547 | 99.8% | 75.8% |
| Non-native English | 544 | 99.3% | 77.6% |
| Articles / essays | 61 | 96.7% | 50.8% |
Formal writing (academic papers, clinical text) tends to produce higher false positive rates for AI detectors in general, because formal prose shares structural features with AI output. The effect is most visible on the free tier, which passes only 51.8% of academic papers versus 99.7% for Pro. Auto tone detection adjusts thresholds for detected domains (Academic, Medical, Legal, etc.) to reduce these false positives.
AI Text — By Model
How often is AI-generated text from each model correctly flagged (score of 75 or below)? A dash means no free-tier result is reported for that model.
| AI Model | Samples | Pro (Neural) | Free (Heuristic) |
|---|---|---|---|
| AI Model A | 750 | 99.3% | 10.1% |
| AI Model B | 750 | 99.9% | 36.5% |
| AI Model C | 750 | 99.7% | 23.1% |
| AI Model D | 540 | 100% | 21.5% |
| AI Model E | 540 | 100% | — |
| AI Model F | 540 | 100% | — |
| AI Model G | 540 | 100% | — |
| AI (mixed models) | 75 | 98.7% | 29.3% |
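The per-row results combine into the headline Pro figures when each row is weighted by its sample count. The sketch below copies the sample counts and Pro rates from the two tables above and pools them; because the table rates are rounded to 0.1%, the pooled values are approximate. The sample counts also sum to the corpus totals given under Methodology (4,234 human and 4,485 AI). The `pooled_rate` helper is illustrative only.

```python
# (samples, Pro rate in %) copied from the two tables above
HUMAN_BY_SOURCE = {
    "Creative fiction": (925, 99.4),
    "Academic discussions": (814, 99.9),
    "Academic papers": (698, 99.7),
    "Casual online writing": (645, 98.9),
    "Clinical / medical": (547, 99.8),
    "Non-native English": (544, 99.3),
    "Articles / essays": (61, 96.7),
}

AI_BY_MODEL = {
    "AI Model A": (750, 99.3),
    "AI Model B": (750, 99.9),
    "AI Model C": (750, 99.7),
    "AI Model D": (540, 100.0),
    "AI Model E": (540, 100.0),
    "AI Model F": (540, 100.0),
    "AI Model G": (540, 100.0),
    "AI (mixed models)": (75, 98.7),
}


def pooled_rate(rows: dict[str, tuple[int, float]]) -> tuple[int, float]:
    """Return (total samples, sample-weighted rate in %) for a results table."""
    total = sum(n for n, _ in rows.values())
    # Reconstruct approximate hit counts from the rounded per-row rates
    hits = sum(n * rate / 100 for n, rate in rows.values())
    return total, 100 * hits / total


if __name__ == "__main__":
    for name, table in [("Human passed", HUMAN_BY_SOURCE), ("AI flagged", AI_BY_MODEL)]:
        total, rate = pooled_rate(table)
        print(f"{name}: {total} samples, {rate:.1f}%")
```

The pooled values round to 99.5% (human correctly passed) and 99.8% (AI correctly flagged), matching the Pro column of the comparison table.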
Methodology
Corpus
8,719 labeled text samples: 4,234 known human and 4,485 known AI-generated. Human samples span creative fiction, academic papers and discussions, casual online writing, clinical abstracts, non-native English, and articles/essays. AI samples come from seven models (A through G in the table above) plus a mixed-model set, covering varied writing styles and topics.
Date filtering
All human samples were published before January 2023, so they predate widespread LLM usage. This gives us high confidence that the "human" labels are accurate.
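As a minimal sketch of this filter, assuming each candidate sample carries a publication date (the `published` field and `predates_cutoff` helper are hypothetical), a sample is kept only if it was published strictly before the cutoff:

```python
from datetime import date

# Cutoff from the text: keep only text published before January 2023
CUTOFF = date(2023, 1, 1)


def predates_cutoff(samples: list[dict]) -> list[dict]:
    """Keep samples whose (hypothetical) 'published' date is before CUTOFF."""
    return [s for s in samples if s["published"] < CUTOFF]


if __name__ == "__main__":
    candidates = [
        {"id": 1, "published": date(2021, 6, 1)},
        {"id": 2, "published": date(2023, 3, 15)},
    ]
    print([s["id"] for s in predates_cutoff(candidates)])
```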
Models
The Pro neural model is a fine-tuned transformer trained on a curated corpus of verified human and AI-generated text. The free heuristic scorer uses five additive signals: sentence length variance (burstiness), AI vocabulary density, perplexity proxy, structural patterns, and human texture markers.
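To make the first signal concrete, here is a minimal sketch of burstiness measured as the population variance of sentence lengths in words. The naive regex sentence splitter and the function names are assumptions for illustration; the free scorer's actual tokenization, normalization, and weighting are not described here.

```python
import re
from statistics import pvariance


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Population variance of sentence lengths; 0.0 for fewer than two sentences."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return float(pvariance(lengths))


if __name__ == "__main__":
    varied = "I ran. Then I stopped at the old corner store for milk. Done."
    uniform = "One two three. Four five six."
    print(burstiness(varied), burstiness(uniform))
```

Higher variance means more uneven sentence lengths, which is the pattern the scorer treats as a human signal.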
Scoring
Each sample is scored on a 0–100 scale (higher = more human). Samples scoring above 75 are classified as "Human," 46–75 as "Mixed," and 45 or below as "AI Detected." A human sample is "correctly passed" if it scores above 75. An AI sample is "correctly flagged" if it scores 75 or below, that is, if it is classified as either "Mixed" or "AI Detected."
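The rules above translate directly into code. Note the asymmetry: a "Mixed" score counts as a correct flag for an AI sample but as a miss for a human sample. This sketch assumes that any non-integer score strictly between 45 and 46 falls into "Mixed"; the function names are hypothetical.

```python
def classify(score: float) -> str:
    """Map a 0-100 score (higher = more human) to its published label."""
    if score > 75:
        return "Human"
    if score > 45:  # assumption: scores in (45, 46) are treated as Mixed
        return "Mixed"
    return "AI Detected"


def is_correct(score: float, is_human: bool) -> bool:
    """Human samples must score above 75; AI samples must score 75 or below."""
    return score > 75 if is_human else score <= 75


if __name__ == "__main__":
    for s in (90, 75, 46, 45):
        print(s, classify(s), is_correct(s, is_human=True), is_correct(s, is_human=False))
```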
Known limitations
The corpus skews toward English-language text. Adversarial attacks (zero-width characters, homoglyphs) can reduce detection accuracy. AI writing styles evolve; we re-run this benchmark periodically and retrain the model as needed.
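For readers unfamiliar with these two attack types, the sketch below shows what they look like at the character level: zero-width characters are invisible code points inserted between letters, and homoglyphs swap Latin letters for look-alikes from other scripts (for example, Cyrillic "а" for Latin "a"). It is an illustration under stated assumptions, not a description of our preprocessing; the zero-width set is a common but non-exhaustive selection, and flagging Cyrillic or Greek letters only makes sense for text expected to be in English.

```python
import unicodedata

# Common zero-width code points (not an exhaustive list)
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def strip_zero_width(text: str) -> str:
    """Remove invisible zero-width characters."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def suspicious_homoglyphs(text: str) -> list[str]:
    """Return Cyrillic or Greek letters, which often stand in for Latin ones."""
    hits = []
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith(("CYRILLIC", "GREEK")):
                hits.append(ch)
    return hits


if __name__ == "__main__":
    attacked = "p\u0430y\u200bpal"  # Cyrillic 'а' plus a zero-width space
    print(repr(strip_zero_width(attacked)), suspicious_homoglyphs(attacked))
```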
This data is generated automatically and updates when we retrain or expand our test corpus. We believe publishing accuracy data — including where we fall short — builds more trust than marketing claims.