MultiFC

Keywords: multifc, evaluation

MultiFC is a large-scale, multi-domain fact-checking dataset aggregated from 26 professional fact-checking websites, providing one of the most diverse collections of real-world claims and verdicts in NLP, spanning politics, health, science, and urban legends from sources such as PolitiFact, Snopes, and FactCheck.org.

What Is MultiFC?

- Scale: ~35,000 claims (34,918 in the original release) scraped from 26 distinct fact-checking platforms.
- Sources: PolitiFact, Snopes, FactCheck.org, Africa Check, and 22 other professional fact-checking sites.
- Labels: Not binary True/False. Each site uses its own verdict scheme: PolitiFact's six-point scale runs from "True" through "Half True" down to "Pants on Fire!", while Snopes uses ratings such as "False," "Mixture," and "Unproven." Across the 26 sources there are over 100 distinct labels.
- Metadata: Each claim includes the speaker, publication date, article URL, topic tags, and the full verdict article, rich context beyond the claim text alone (see the record sketch below).
- Contextual Signals: Speaker identity, topic tags, and publication metadata all carry credibility priors that the claim string by itself does not.
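
A minimal sketch of what one record looks like. The field names and the `MultiFCClaim` wrapper are illustrative assumptions, not the official schema; consult the dataset release for the exact column layout.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MultiFCClaim:
    """One claim record. Field names are illustrative, not the official schema."""
    claim_id: str            # e.g. a site prefix plus a numeric id
    claim: str               # the claim text as quoted by the fact-checker
    label: str               # the site's own verdict string, e.g. "pants on fire!"
    domain: str              # which of the 26 sites issued the verdict
    speaker: Optional[str]   # who made the claim, when known
    date: Optional[str]      # publication date of the fact-check article
    tags: List[str]          # topic tags assigned by the site
    url: str                 # link to the full verdict article
```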

The Label Normalization Challenge

The core technical difficulty of MultiFC is that different fact-checking sites use incompatible label vocabularies. A "Mostly False" on PolitiFact is not guaranteed to mean the same thing as "Mostly False" on Snopes; the standards and definitions behind the words differ. Models must pick one of three strategies (a coarse-graining sketch follows the list):

- Coarse-grain: Map all labels to a 3-class (True/Mixed/False) or 2-class (True/False) taxonomy, losing nuance.
- Site-specific training: Train per-site classifiers that respect each site's internal label definitions.
- Zero-shot transfer: Train on some sites, generalize to unseen sites — testing cross-domain transferability.
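
A minimal sketch of the first strategy. `COARSE_MAP` and `coarsen` are hypothetical names, and the mapping shown covers only a handful of example labels; a real mapping must enumerate all 100+ site-specific verdicts.

```python
# Coarse-graining sketch: collapse site-specific verdicts into 3 classes.
COARSE_MAP = {
    # PolitiFact-style labels
    "true": "true",
    "mostly true": "true",
    "half true": "mixed",
    "mostly false": "false",
    "false": "false",
    "pants on fire!": "false",
    # Snopes-style labels
    "mixture": "mixed",
    "unproven": "mixed",
}

def coarsen(site_label: str) -> str:
    """Map a site-specific verdict onto a 3-class taxonomy (true/mixed/false)."""
    label = site_label.strip().lower()
    if label not in COARSE_MAP:
        raise KeyError(f"unmapped label: {label!r} -- extend COARSE_MAP")
    return COARSE_MAP[label]
```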

Why MultiFC Matters

- Real-world Claims: Unlike FEVER (claims artificially generated by mutating Wikipedia sentences) or the SemEval fact-checking tasks (small scale), MultiFC contains the actual false and misleading claims that circulate on the internet.
- Domain Breadth: Claims span health misinformation ("vaccines cause autism"), political falsehoods ("crime rates are the highest ever"), scientific denialism, economic falsehoods, and celebrity gossip.
- Metadata Value: Speaker identity is a strong signal — a politician during an election cycle, a conspiracy theorist's blog, or a peer-reviewed journal all carry different prior credibility.
- Label Distribution: Heavy class imbalance (political fact-checkers rate far more claims False than True, since they select claims worth checking) forces models to handle realistic data distributions; a weighting sketch follows this list.
- Cross-lingual Extension: The dataset itself is English-language, but its scrape-and-normalize methodology extends naturally to non-English fact-checkers, opening paths to multilingual misinformation research.
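
As a concrete illustration of handling that imbalance, here is a small inverse-frequency weighting sketch; the `class_weights` helper and the example counts are illustrative, not dataset statistics.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights so the loss can't just predict 'false'."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Example skew after coarse-graining a batch of political claims:
weights = class_weights(["false"] * 700 + ["mixed"] * 200 + ["true"] * 100)
# -> {'false': ~0.48, 'mixed': ~1.67, 'true': ~3.33}
```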

Model Approaches

Text-Only Baselines:
- Fine-tune BERT/RoBERTa on claim text alone.
- Performance: roughly 55-65% 3-class accuracy, which suggests that claim text alone is often insufficient.
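
A minimal sketch of this text-only setup with Hugging Face Transformers. The checkpoint, the 3-class head, and the example claim are assumptions, and the fine-tuning loop itself is omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Text-only baseline: the model sees nothing but the claim string.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # coarse taxonomy: true / mixed / false
)

batch = tokenizer(
    ["Crime rates are the highest they have ever been."],  # illustrative claim
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    logits = model(**batch).logits  # shape (1, 3); meaningless until fine-tuned
pred = logits.argmax(dim=-1)
```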

Metadata-Enhanced Models:
- Add speaker embeddings, site-specific label embeddings, publication date features.
- Improvement: +5-10% accuracy from metadata.
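
One plausible way to wire the metadata in, sketched as a small PyTorch head. The embedding sizes, the assumption of a precomputed 768-d claim vector, and the class name are illustrative choices, not the published architecture.

```python
import torch
import torch.nn as nn

class MetadataClassifier(nn.Module):
    """Concatenate a claim encoding with learned speaker and site embeddings.

    `claim_vec` is assumed to be a precomputed sentence vector (e.g. a 768-d
    BERT [CLS] output); speaker/site vocabularies come from the training split.
    """
    def __init__(self, n_speakers: int, n_sites: int, claim_dim: int = 768):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, 32)
        self.site_emb = nn.Embedding(n_sites, 16)
        self.head = nn.Linear(claim_dim + 32 + 16, 3)  # true / mixed / false

    def forward(self, claim_vec, speaker_id, site_id):
        feats = torch.cat(
            [claim_vec, self.speaker_emb(speaker_id), self.site_emb(site_id)],
            dim=-1,
        )
        return self.head(feats)
```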

Evidence-Retrieval Models:
- Use the full fact-check article as evidence. This leaks the verdict, since the article is only written after a human has already checked the claim, so it is unrealistic for deployment.
- Upper bound performance: ~80%+ accuracy.
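
A sketch of this oracle-evidence setup, encoding the claim and the verdict article as a sentence pair; both strings are invented examples, and the leak is exactly why this yields only an upper bound.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

# Sentence-pair encoding: [CLS] claim [SEP] verdict-article excerpt [SEP].
# The article was written *after* a human judged the claim, so the answer
# leaks into the input; an upper bound, not a deployable system.
pair = tokenizer(
    "Crime rates are the highest they have ever been.",         # claim
    "FBI statistics show violent crime has declined overall.",  # illustrative excerpt
    truncation=True, max_length=512, return_tensors="pt",
)
logits = model(**pair).logits  # shape (1, 3)
```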

Comparison to Related Benchmarks

| Feature | FEVER | Climate-FEVER | MultiFC |
|---------|-------|---------------|---------|
| Claims | Artificial | Real (climate) | Real (multi-domain) |
| Labels | 3 standard | 4 | 100+ site-specific |
| Evidence | Wikipedia | Wikipedia | Full fact-check articles |
| Metadata | None | None | Speaker, date, tags |
| Scale | 185k | 1.5k | ~35k |

Common Failure Modes

- Label Normalization Errors: A model trained on PolitiFact's "Mostly False" carries that decision boundary over to Snopes, where the same words are applied under different standards.
- Domain Shift: Political fact-checking patterns do not transfer to health misinformation patterns.
- Memorization: Models can memorize speaker-to-label correlations without understanding the claim content; the baseline sketch below makes this shortcut measurable.
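
The memorization shortcut can be quantified with a claim-blind baseline. `speaker_prior_baseline` is a hypothetical helper that assumes each example is a dict with "speaker" and "label" keys.

```python
from collections import Counter, defaultdict

def speaker_prior_baseline(train, test):
    """Predict each test claim's label from its speaker's majority label in
    training, ignoring the claim text entirely. If this scores close to a
    trained model, that model may just be memorizing speaker -> label priors.
    """
    by_speaker = defaultdict(Counter)
    for ex in train:
        by_speaker[ex["speaker"]][ex["label"]] += 1
    global_majority = Counter(ex["label"] for ex in train).most_common(1)[0][0]

    hits = 0
    for ex in test:
        counts = by_speaker.get(ex["speaker"])
        pred = counts.most_common(1)[0][0] if counts else global_majority
        hits += pred == ex["label"]
    return hits / len(test)
```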

Applications

- Social Media Moderation: Scale professional fact-checking by pre-screening viral claims.
- Journalist Tools: Assist reporters by surfacing prior fact-checks of similar claims.
- Platform Policy: Automated label assignment for content warning systems.

MultiFC is the professional fact-checker's dataset: tens of thousands of real expert verdicts for training AI to recognize the patterns, contexts, and metadata signals that separate reliable information from misinformation.
