April 12, 2026 · Guide · 7 min read

How do you know an AI detector actually works?

The RAID benchmark, why it matters, and where GPTypo lands on the public leaderboard.

Every AI detector claims to be accurate. Browse a few vendor sites and you'll see numbers like "99% accuracy" repeated as if they all mean the same thing. They don't. Most of those figures come from the vendor's own internal test set, a dataset they assembled, labeled, and scored themselves. It's the equivalent of a student grading their own exam.

There's a better way to evaluate AI detectors, and it's been available since 2024. It's called RAID, and if a detector hasn't been evaluated on it, you should ask why.

What RAID is

RAID (Robust AI Detection) is an academic benchmark published at ACL 2024 by researchers at the University of Pennsylvania. It's the largest and most comprehensive evaluation dataset for AI-generated text detection that exists, with over 10 million documents across eleven different language models, eleven genres, and twelve adversarial attack types.

A good benchmark has to test three things:

  • Coverage across models. A detector that only catches GPT-4 output is useless if your text was written with Claude or Gemini. RAID includes samples from eleven different LLMs, including open-source models like Llama and Mistral alongside commercial ones.
  • Coverage across domains. Formal academic prose, casual Reddit posts, news articles, creative writing, code, recipes, and more. A detector that works on essays but fails on emails isn't really working.
  • Adversarial robustness. RAID tests twelve attack types designed to fool detectors: paraphrasing, synonym swaps, homoglyph substitution, whitespace injection, and nine others. If a detector can be broken with a single paraphrase pass, it was never really reliable to begin with.
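To make the homoglyph attack concrete, here is a minimal sketch: swap a few Latin letters for visually identical Cyrillic ones. The text looks unchanged to a human reader, but the underlying characters differ, which can break detectors that key on exact token patterns. The mapping and function below are illustrative, not RAID's actual attack implementation.

```python
# Hypothetical homoglyph attack: replace Latin letters with
# visually identical Cyrillic lookalikes. Illustrative only.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"}

def homoglyph_attack(text: str) -> str:
    """Replace each mapped Latin character with its Cyrillic twin."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "The cat sat on the mat."
attacked = homoglyph_attack(original)

print(original == attacked)            # False: the bytes differ
print(len(original) == len(attacked))  # True: same length, same appearance
```

A robust detector has to normalize or otherwise survive this kind of substitution; a brittle one sees an entirely different character sequence and its signal collapses.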

The problem with most accuracy claims is that they don't account for false accusations. A detector can reach "99% accuracy" just by flagging almost everything as AI, including all the real human writing it sees. Accuracy only means something when it's measured alongside a rule about how often the detector is allowed to be wrong about humans.

RAID enforces the same rule on every detector it scores: no more than 5% of real human writing can be wrongly flagged as AI (a 5% false positive rate). The leaderboard score is the share of AI writing each detector correctly catches while staying inside that rule. Same constraint for everyone, measuring the thing that actually matters: catching AI without falsely accusing real people.
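The scoring rule above is easy to state in code. Given a detector's raw scores on known-human and known-AI documents, pick the threshold that keeps the false positive rate on human writing at or below 5%, then report the share of AI writing caught at that threshold. A minimal sketch (toy scores, not RAID's actual evaluation code):

```python
import numpy as np

def detection_rate_at_fpr(human_scores, ai_scores, max_fpr=0.05):
    """Share of AI documents caught at a threshold that wrongly flags
    at most max_fpr of human documents. Illustrative sketch only."""
    human = np.asarray(human_scores)
    # Threshold at the (1 - max_fpr) quantile of human scores:
    # at most max_fpr of human documents score above it.
    threshold = np.quantile(human, 1.0 - max_fpr)
    return float(np.mean(np.asarray(ai_scores) > threshold))

# Toy scores, higher = "more likely AI". Well-separated on purpose.
rng = np.random.default_rng(0)
human = rng.normal(0.2, 0.1, 1000)
ai = rng.normal(0.8, 0.1, 1000)
print(detection_rate_at_fpr(human, ai))  # close to 1.0 for these toy scores
```

Note what this buys you: a detector can no longer inflate its number by flagging everything, because the human-side constraint caps the threshold first.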

Why the public leaderboard matters

RAID maintains a public leaderboard where detector developers can submit predictions on the hidden test set and see their score ranked against every other submission. The key word is hidden: you can't train on the test data, you can't cherry-pick which samples to evaluate against, and you can't quietly revise your numbers after the fact. The benchmark authors score your submission and publish the result.

This is the single biggest difference between a vendor's marketing claim and a real accuracy measurement. Marketing claims can be cherry-picked. Leaderboard results can't.

A useful exercise: go to any AI detector's homepage, find their accuracy claim, then search the RAID leaderboard for their name. If they're not on the leaderboard, their accuracy claim is a number they produced themselves, without independent verification.

What most detectors don't want you to know

There's a reason most detectors haven't submitted to RAID. Detection is much harder than the marketing makes it sound, and the drop from internal benchmarks to RAID is often dramatic.

A detector trained on a small in-house corpus of GPT-4 essays can honestly report 99% accuracy on that corpus. Run the same detector on RAID, with its eleven models and twelve adversarial attack types, and the number can drop by twenty or thirty points. When the number drops, the benchmark is doing what it was designed to do: exposing what the detector actually generalizes to.

Publishing a result on RAID means accepting that gap publicly. Many detector vendors choose not to.

Where GPTypo lands

We submitted GPTypo's detection model to the RAID leaderboard in April 2026. The submission was accepted and merged into the official repository, and it's publicly viewable on the leaderboard today.

Our results:

  • 98.2% accuracy at 5% false positive rate on the non-adversarial subset
  • 91.6% accuracy across all conditions, including the twelve adversarial attack types
  • 98.4% AUROC across the full test set

For context on where that sits, GPTZero (one of the best-known commercial detectors on the market) scores 98.37% on the same benchmark. We're within 0.2 points. Grammarly leads the public board at 99.91%. Those are the names most readers recognize, and we're glad to be sitting next to them on a test none of us graded ourselves.

We think those comparisons are the honest way to present detection accuracy. No "99%" claim in a vacuum. Just the public result, next to every other public result, on the same test.

What the numbers mean for your writing

RAID results translate into something practical: how often the detector gets your specific piece of text right.

At 5% false positive rate, roughly 1 in 20 pieces of genuinely human writing will still get flagged as AI. That's not zero, and it's why GPTypo's workflow exists. Even with a strong detector, false positives happen. The response to a false positive shouldn't be "trust the detector," it should be "see what the detector is reacting to and decide whether to change it."

That's why GPTypo shows you why a sentence got flagged, not just that it got flagged. A sentence flagged for low burstiness and vocabulary overlap is a sentence you can adjust. A sentence flagged for no clear reason is a sentence you can keep.
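"Burstiness" is commonly measured as the variation in sentence length across a passage: human writing tends to mix short and long sentences, while generated text often keeps them uniform. A minimal sketch of that heuristic, using the coefficient of variation; this is an assumption-laden illustration of the general idea, not GPTypo's actual scoring model:

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths in words.
    Low values mean uniformly sized sentences, one common heuristic
    signal for AI-generated text. Illustrative only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = "This is a sentence. Here is another one. This one matches too."
varied = "Short. But this next sentence runs much longer than the first. Tiny."
print(burstiness(uniform) < burstiness(varied))  # True
```

A signal like this is adjustable in exactly the way the paragraph above describes: vary your sentence lengths and the score moves.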

What to ask any AI detector

If you're evaluating detectors (for a school, a publication, a content team, or your own writing), three questions separate the real products from the marketing-driven ones:

  1. Are you on RAID? If not, why not? The benchmark has been public since 2024 and submissions are free.
  2. What's your accuracy at 5% false positive rate? Overall accuracy without a fixed FPR isn't comparable across detectors. Any number quoted without its FPR context is a red flag.
  3. How do you perform against adversarial attacks? Most real-world evasion looks like a paraphrase pass. If a detector hasn't been tested against adversarial inputs, assume it breaks on them.

You don't have to take our word for any of this. The RAID leaderboard is public. Our submission is there. So are GPTZero's and Grammarly's, along with every other detector willing to be measured the same way.

Why we built it this way

When we started GPTypo, the AI detection market was full of unverifiable claims. Every site had a different 99% number and no way to compare them. Our goal from the beginning was to build a detector that worked in the way we said it worked, and to make that verifiable.

Submitting to RAID was the natural extension of that. The leaderboard result is a reference point, not a marketing campaign. The next time a detector tells you it's 99% accurate, you'll have somewhere to look it up.

If you want to try the detector that produced these numbers, paste some text into GPTypo and click Verify (the Pro-tier neural scan). That's the same model we submitted to RAID, running on the same infrastructure, producing the same scores the leaderboard measured. The free tier's heuristic scorer is useful for a first pass, but Verify is the one with the public benchmark behind it.