SuperGLUE

SuperGLUE is a benchmark for natural language understanding, introduced by Wang et al. in 2019 as a harder successor to GLUE after models surpassed the human baseline on the original benchmark. Its tasks require more sophisticated reasoning, world knowledge, and nuanced language understanding, and were chosen to leave a substantial gap between the strongest models of the time and human performance, so that the benchmark discriminates better between language model capabilities.

SuperGLUE includes eight tasks: BoolQ (Boolean Questions: yes/no questions about short passages requiring inferential reasoning); CB (CommitmentBank: three-class textual entailment on naturally occurring discourse); COPA (Choice of Plausible Alternatives: causal reasoning by selecting the more plausible cause or effect); MultiRC (Multi-Sentence Reading Comprehension: questions requiring reasoning over multiple sentences); ReCoRD (Reading Comprehension with Commonsense Reasoning Dataset: cloze-style questions requiring commonsense knowledge); RTE (Recognizing Textual Entailment: carried over from GLUE); WiC (Word-in-Context: determining whether a polysemous word is used with the same sense in two sentences); and WSC (Winograd Schema Challenge: pronoun coreference resolution requiring world knowledge).

The overall SuperGLUE score is an unweighted average across the eight tasks; for tasks that report more than one metric, the metrics are averaged within the task first. The human baseline is approximately 89.8.

Key differences from GLUE include: tasks selected to be above BERT's capability level at the time, more diverse reasoning requirements (causal, commonsense, multi-sentence), small training sets for some tasks (which tests transfer and low-data learning), and more carefully constructed evaluation sets with higher inter-annotator agreement.

SuperGLUE drove continued progress in language models: T5 came within about half a point of the human baseline in 2019, and DeBERTa and a T5-based system surpassed it in early 2021, demonstrating that even this harder benchmark could be addressed through scale and improved pre-training techniques.
SuperGLUE established that benchmarks have finite useful lifetimes and must evolve with model capabilities.
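
To make the score averaging concrete, here is a minimal Python sketch of how an overall SuperGLUE-style score could be computed from per-task results. It assumes the usual convention that tasks reporting two metrics (CB, MultiRC, ReCoRD) are averaged within the task before the unweighted average across tasks. The function names, metric keys, and all numbers are hypothetical illustrations, not real leaderboard results or an official scoring script.

```python
from statistics import mean

# Hypothetical per-task results on a 0-100 scale.
# These numbers are invented for illustration only.
results = {
    "BoolQ":   {"accuracy": 80.0},
    "CB":      {"accuracy": 90.0, "f1": 80.0},
    "COPA":    {"accuracy": 70.0},
    "MultiRC": {"f1a": 70.0, "em": 30.0},
    "ReCoRD":  {"f1": 75.0, "em": 73.0},
    "RTE":     {"accuracy": 72.0},
    "WiC":     {"accuracy": 68.0},
    "WSC":     {"accuracy": 64.0},
}


def task_score(metrics):
    """Average the metrics reported for a single task."""
    return mean(metrics.values())


def superglue_score(task_results):
    """Unweighted average of the per-task scores."""
    return mean(task_score(m) for m in task_results.values())


if __name__ == "__main__":
    for name, metrics in results.items():
        print(f"{name:8s} {task_score(metrics):6.2f}")
    print(f"Overall  {superglue_score(results):6.2f}")
```

Because every task carries equal weight, a small task such as COPA or CB moves the overall score as much as a large one such as ReCoRD, which is one reason a few points on a single hard task can separate systems near the human baseline.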
