📊 Results Summary
| Truth | Failed Test (Flagged) | Passed Test (Not Flagged) | Total |
|---|---|---|---|
How This Works
This tool corrects for base rate neglect (also called base rate bias or the base rate fallacy) - a common cognitive bias where people ignore how the prevalence of a problem affects test outcomes.
The simulator applies Bayes' theorem to show you four outcome categories instead of just one accuracy number. It displays results in frequency format (actual counts of people) rather than percentages, because people reason about probabilities more accurately when they are presented as counts.
Even highly accurate tests can produce more false positives than true positives when screening for rare conditions. This has profound implications for mass screening programs in health, security, and education.
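The four-outcome arithmetic behind this can be sketched in a few lines. The function name and the example parameters below (a 1%-prevalence condition screened at 99% sensitivity and 95% specificity) are hypothetical illustrations, not figures from any of the cited studies; counts are rounded to whole people to match the tool's frequency format.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Split a screened population into the four Bayes outcome counts."""
    affected = population * prevalence           # people who truly have the condition
    unaffected = population - affected
    true_positives = affected * sensitivity      # correctly flagged
    false_negatives = affected - true_positives  # missed cases
    false_positives = unaffected * (1 - specificity)  # healthy but flagged
    true_negatives = unaffected - false_positives
    # Round to whole people, matching the frequency (count) presentation.
    return {
        "true_positives": round(true_positives),
        "false_negatives": round(false_negatives),
        "false_positives": round(false_positives),
        "true_negatives": round(true_negatives),
    }

# Hypothetical example: 10,000 people screened, 1% prevalence,
# 99% sensitivity, 95% specificity.
counts = screening_outcomes(10_000, 0.01, 0.99, 0.95)
# False positives (495) outnumber true positives (99), so only
# 99 / (99 + 495) ≈ 17% of flagged people actually have the condition.
ppv = counts["true_positives"] / (
    counts["true_positives"] + counts["false_positives"]
)
```

Even with a test that is right 95–99% of the time per person, most flagged individuals in this scenario are false alarms, which is exactly the base-rate effect the simulator visualizes.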
Real-World Examples
Mass screenings for low-prevalence problems appear across many domains:
- Stephen E. Fienberg (Chair), Committee to Review the Scientific Evidence on the Polygraph, National Research Council, "The Polygraph and Lie Detection" (2003) - Landmark analysis showing why mass polygraph screening of National Lab scientists would harm security rather than help it
- Harding Center for Risk Literacy, "Early Detection of Breast Cancer by Mammography Screening" (updated April 2021) - Clear presentation of trade-offs in population-wide breast cancer screening
- Harding Center for Risk Literacy, "Early Detection of Prostate Cancer with PSA Testing" (updated November 2020) - Example of screening trade-offs for Prostate-Specific Antigen testing
- Dashiell Young-Saver, "The Misleading Math of Prenatal Tests," New York Times (Feb. 10, 2022) - Analysis of false positives in prenatal genetic screening
Note: The mammography and PSA examples use parameters extrapolated from the Harding Center's long-term screening data to match their published false-alarm rates. The Harding fact boxes provide more comprehensive outcomes (e.g., mortality, overdiagnosis, quality-of-life effects), whereas this tool focuses solely on test-classification accuracy. The purpose is illustrative, not a substitute for full cohort data.
Important considerations: These outcome estimates assume the stated accuracy rates are correct. In reality, accuracy may be inflated (tests often perform better in the lab than in the field), false-negative rates may be understated (dedicated attackers game the system), and secondary screenings may themselves cause harm. This tool calculates hypothetical effect estimates for screenings of this structure along only one of four causal pathways: the classification pathway. It does not compute net effects, which would require accounting for all four pathways (classification, strategy, information, and resource reallocation).
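The first caveat (inflated accuracy) can be quantified directly with Bayes' theorem. The numbers below are hypothetical, chosen only to show how sharply the positive predictive value falls when field specificity is worse than the lab figure:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(has condition | flagged), computed via Bayes' theorem."""
    flagged_true = prevalence * sensitivity            # truly affected and flagged
    flagged_false = (1 - prevalence) * (1 - specificity)  # unaffected but flagged
    return flagged_true / (flagged_true + flagged_false)

# Hypothetical lab vs. field specificity, at 1% prevalence and 99% sensitivity:
lab_ppv = positive_predictive_value(0.01, 0.99, 0.99)    # about 0.50
field_ppv = positive_predictive_value(0.01, 0.99, 0.90)  # about 0.09
```

A drop in specificity from 99% to 90% cuts the share of correct flags from roughly half to under one in ten, which is why field performance, not lab performance, determines a screening program's real-world value.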