MultiFC is a large-scale, multi-domain fact-checking dataset aggregated from 26 professional fact-checking websites, providing one of the most diverse collections of real-world claim verdicts in NLP. It spans politics, health, science, and urban legends from sources such as PolitiFact, Snopes, and FactCheck.org.
What Is MultiFC?
- Scale: 34,918 claims scraped from 26 distinct fact-checking platforms.
- Sources: Snopes, PolitiFact, FactCheck.org, AFP Fact Check, Full Fact, Vishvas News, Africa Check, and 19 more.
- Labels: Not binary True/False. Each site uses its own label scheme: "Pants on Fire," "Mostly False," "Half True," "True" (PolitiFact); "False," "Mixture," "Mostly False," "Unproven" (Snopes). There are over 100 distinct labels across sources.
- Metadata: Each claim includes speaker, date, article URL, tags, and the full verdict article — rich context beyond just the claim text.
- Retrieved Evidence: each claim additionally comes with the top 10 search-engine snippets retrieved for it, alongside topic tags and publication metadata, supporting evidence-based verification.
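Put together, one claim record looks roughly like the sketch below. The field names are descriptive stand-ins, not the release's actual column names; the data ships as a tab-separated file, so the exact columns and order should be checked against the data card.

```python
from dataclasses import dataclass, field

# Illustrative shape of one MultiFC claim record.
# Field names here are assumptions for readability, not the official schema.
@dataclass
class ClaimRecord:
    claim_id: str   # site prefix plus running id
    claim: str      # the claim text itself
    label: str      # site-specific verdict string, e.g. "pants on fire!"
    source: str     # which of the 26 sites issued the verdict
    speaker: str    # who made the claim, if recorded
    date: str       # claim or publication date
    url: str        # link to the full verdict article
    tags: list[str] = field(default_factory=list)      # topic tags
    evidence: list[str] = field(default_factory=list)  # retrieved snippets
```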
The Label Normalization Challenge
The core technical difficulty of MultiFC is that different fact-checking sites use incompatible label vocabularies: a "Misleading" verdict on one site is not equivalent to "Misleading" on another, because the standards and definitions behind the label differ. Models must take one of three routes:
- Coarse-grain: Map all labels onto a 3-class (True/Mixed/False) or 2-class (True/False) taxonomy, losing nuance (see the mapping sketch after this list).
- Site-specific training: Train per-site classifiers that respect each site's internal label definitions.
- Zero-shot transfer: Train on some sites, generalize to unseen sites — testing cross-domain transferability.
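As a concrete illustration of the coarse-graining route, here is a minimal mapping sketch. The entries below cover only a handful of labels from two sites and are assumptions for illustration; a real mapping has to be built and validated for every label of every site in the corpus.

```python
# Collapse site-specific verdicts onto a shared True/Mixed/False taxonomy.
COARSE_MAP = {
    # PolitiFact
    "true": "true",
    "mostly true": "true",
    "half true": "mixed",
    "mostly false": "false",
    "false": "false",
    "pants on fire!": "false",
    # Snopes
    "mixture": "mixed",
    "unproven": "mixed",
}

def coarse_label(site_label: str) -> str:
    """Map a raw site label to the 3-class taxonomy; fail loudly on gaps."""
    key = site_label.strip().lower()
    if key not in COARSE_MAP:
        raise KeyError(f"no coarse mapping for label: {site_label!r}")
    return COARSE_MAP[key]
```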
Why MultiFC Matters
- Real-world Claims: Unlike FEVER (claims artificially written by mutating Wikipedia sentences) or the small-scale SemEval fact-checking tasks, MultiFC contains the actual falsehoods and misleading claims that circulate online.
- Domain Breadth: Claims span health misinformation ("vaccines cause autism"), political lying ("crime rates are the highest ever"), scientific denialism, economic falsehoods, and celebrity gossip.
- Metadata Value: Speaker identity is a strong signal — a politician during an election cycle, a conspiracy theorist's blog, or a peer-reviewed journal all carry different prior credibility.
- Label Distribution: Heavy class imbalance (far more claims rated False than True in political fact-checking) forces models to handle realistic data distributions; a standard weighting fix is sketched after this list.
- Cross-lingual Extension: MultiFC's claims are in English, but its multi-site collection methodology carries over directly to non-English fact-checkers, opening paths to multilingual misinformation research.
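One standard response to the class imbalance noted above is inverse-frequency weighting in the loss. The sketch below mirrors scikit-learn's "balanced" heuristic; the counts are made up for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Per-class weight n_samples / (n_classes * class_count),
    i.e. scikit-learn's "balanced" heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {lbl: n / (k * c) for lbl, c in counts.items()}

# Hypothetical skew typical of political fact-checking: mostly False claims.
weights = inverse_frequency_weights(
    ["false"] * 700 + ["mixed"] * 200 + ["true"] * 100
)
# -> {'false': 0.48, 'mixed': 1.67, 'true': 3.33}; pass these to the loss,
#    e.g. torch.nn.CrossEntropyLoss(weight=...)
```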
Model Approaches
Text-Only Baselines:
- Fine-tune BERT/RoBERTa on claim text alone.
- Performance: roughly 55-65% 3-class accuracy, revealing that claim text alone is often insufficient.
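A minimal sketch of such a baseline, assuming claims have already been coarse-grained to three classes; the model choice and hyperparameters are illustrative, not the benchmark's official setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Text-only baseline: fine-tune a BERT encoder on claim text -> 3 classes.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(claims: list[str], labels: torch.Tensor) -> float:
    """One gradient step on a batch of claim strings and integer labels."""
    batch = tok(claims, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=labels)  # cross-entropy computed internally
    out.loss.backward()
    optim.step()
    optim.zero_grad()
    return out.loss.item()
```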
Metadata-Enhanced Models:
- Add speaker embeddings, site-specific label embeddings, publication date features.
- Improvement: metadata typically adds 5-10 points of accuracy; one way to wire it in is sketched below.
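One plausible architecture, sketched here, concatenates learned speaker and source-site embeddings with the encoder's [CLS] vector before the classification head. This is an illustrative design, not the model from the MultiFC paper; embedding sizes are arbitrary choices.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MetadataClassifier(nn.Module):
    """Claim encoder + speaker/site embeddings -> class logits."""

    def __init__(self, n_speakers: int, n_sites: int, n_classes: int = 3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size         # 768 for bert-base
        self.speaker_emb = nn.Embedding(n_speakers, 64)  # learned per speaker
        self.site_emb = nn.Embedding(n_sites, 16)        # learned per site
        self.head = nn.Linear(hidden + 64 + 16, n_classes)

    def forward(self, input_ids, attention_mask, speaker_id, site_id):
        cls = self.encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]  # [CLS] token representation
        feats = torch.cat(
            [cls, self.speaker_emb(speaker_id), self.site_emb(site_id)], dim=-1
        )
        return self.head(feats)
```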
Evidence-Retrieval Models:
- Use the full fact-check article as evidence (unrealistic for deployment, since no verdict article exists yet for a new claim).
- Upper-bound performance: ~80%+ accuracy; an encoding sketch follows.
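In practice this amounts to encoding claim and evidence as a sentence pair so the encoder attends across both. A sketch with the Hugging Face tokenizer follows; the evidence source (retrieved snippets vs. the verdict article) and the truncation policy are choices, not prescriptions.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode_with_evidence(claim: str, evidence: str):
    """Encode claim (segment A) and evidence text (segment B) as one pair."""
    return tok(
        claim,
        evidence,
        truncation="only_second",  # trim the evidence, never the claim
        max_length=512,
        return_tensors="pt",
    )
```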
Comparison to Related Benchmarks
| Feature | FEVER | Climate-FEVER | MultiFC |
|---------|-------|---------------|---------|
| Claims | Artificial | Real (climate) | Real (multi-domain) |
| Labels | 3 standard | 4 | 100+ site-specific |
| Evidence | Wikipedia | Wikipedia | Search snippets + verdict articles |
| Metadata | None | None | Speaker, date, tags |
| Scale | 185k claims | 1.5k claims | ~35k claims |
Common Failure Modes
- Label Normalization Errors: A model trained on PolitiFact's "Mostly False" misapplies the label to Snopes claims, where the rating is defined differently.
- Domain Shift: Political fact-checking patterns do not transfer to health misinformation patterns.
- Memorization: Models can memorize speaker → label correlations without understanding the claim content; the probe sketched below makes this measurable.
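A speaker-only probe is one way to quantify that last failure mode: if a classifier that never sees the claim text rivals the full text model, much of the text model's accuracy is likely memorized speaker priors. This probe is a diagnostic suggestion, not part of the benchmark.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

def speaker_probe(speakers: list[str], labels: list[str]) -> float:
    """Mean 5-fold accuracy of a model that sees only speaker identity."""
    X = OneHotEncoder(handle_unknown="ignore").fit_transform(
        [[s] for s in speakers]
    )
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5).mean()

# Compare speaker_probe(...) against the text model's accuracy: the smaller
# the gap, the more the text model is leaning on speaker -> label shortcuts.
```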
Applications
- Social Media Moderation: Scale professional fact-checking by pre-screening viral claims.
- Journalist Tools: Assist reporters by surfacing prior fact-checks of similar claims.
- Platform Policy: Automated label assignment for content warning systems.
MultiFC is the professional fact-checker's dataset: tens of thousands of real expert verdicts for training models to recognize the patterns, contexts, and metadata signals that separate reliable information from misinformation.