gSCAN (grounded SCAN)

gSCAN (grounded SCAN) is a benchmark for systematically testing compositional generalization in visually grounded instruction following. It places an agent in a grid world where it must execute commands such as "walk to the small red circle." Its test splits are designed so that novel concept combinations (e.g., "yellow circle" when yellow objects and circles appeared only separately in training) expose whether a model understands each concept independently or merely memorizes training pairs.
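To make the task concrete, here is a toy sketch of what resolving and executing "walk to the small red circle" could look like. It is an illustration only, not the gSCAN codebase: the `GridObject` class, the `resolve_target` and `walk_to` helpers, the (row, col) coordinate convention, and the reading of "small" as "smallest matching object" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridObject:
    # Hypothetical object record; row 0 is the top (north) edge of the grid.
    shape: str
    color: str
    size: int
    row: int
    col: int


def resolve_target(objects, shape, color=None, size_word=None):
    """Pick the object a command refers to (assumed semantics, for illustration)."""
    candidates = [
        o for o in objects
        if o.shape == shape and (color is None or o.color == color)
    ]
    if not candidates:
        raise ValueError("no object matches the description")
    if size_word == "small":
        return min(candidates, key=lambda o: o.size)
    if size_word == "big":
        return max(candidates, key=lambda o: o.size)
    return candidates[0]


def walk_to(agent_pos, target):
    """Naive plan: move along columns first, then along rows."""
    row, col = agent_pos
    horizontal = "walk east" if target.col > col else "walk west"
    vertical = "walk south" if target.row > row else "walk north"
    return [horizontal] * abs(target.col - col) + [vertical] * abs(target.row - row)


objects = [
    GridObject("circle", "red", size=1, row=0, col=3),
    GridObject("circle", "red", size=3, row=2, col=2),
    GridObject("square", "red", size=1, row=1, col=1),
]
target = resolve_target(objects, shape="circle", color="red", size_word="small")
plan = walk_to(agent_pos=(2, 0), target=target)
print(plan)
```

The point of the sketch is that shape, color, and size are resolved independently of one another, which is exactly the behavior the generalization splits are meant to probe in learned models.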

What Is gSCAN?

The 6 Generalization Splits

Split A (Random): Standard random train/test split. Establishes the in-distribution baseline against which the other splits are compared.

Split B (Yellow Circles): Yellow objects and circles appear in training, but never together. The test set requires "yellow circle" instructions, which tests attribute composition.

Split C (Red Squares): Same setup as Split B, but the held-out combination is "red square."

Split D (Novel Direction): The agent always starts facing south in training. At test time it faces north, east, or west, which tests direction invariance.

Split E (Relative Clause): Commands with relative clauses ("push the circle to the right of the square") are held out from training.

Split F (Class Label Consistency): Objects of a specific class consistently appear on one side of the grid in training. The test checks whether models exploit positional shortcuts rather than object identity.
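The held-out-combination idea behind the yellow-circle and red-square splits can be sketched as a simple filter over examples. This is a minimal illustration under assumed data structures: the dictionary layout (`"command"`, `"target"`) and the helper names `is_held_out` and `composition_split` are hypothetical and do not reflect the benchmark's actual file format.

```python
def is_held_out(example, color, shape):
    """True when the target object combines the held-out color and shape."""
    target = example["target"]
    return target["color"] == color and target["shape"] == shape


def composition_split(examples, color="yellow", shape="circle"):
    """Send every held-out (color, shape) example to test; keep the rest for training."""
    train = [e for e in examples if not is_held_out(e, color, shape)]
    test = [e for e in examples if is_held_out(e, color, shape)]
    return train, test


# Hypothetical toy examples in an assumed format.
examples = [
    {"command": "walk to the yellow square", "target": {"color": "yellow", "shape": "square"}},
    {"command": "push the red circle", "target": {"color": "red", "shape": "circle"}},
    {"command": "walk to the yellow circle", "target": {"color": "yellow", "shape": "circle"}},
    {"command": "pull the red square", "target": {"color": "red", "shape": "square"}},
]

train, test = composition_split(examples, color="yellow", shape="circle")

# Each concept is still seen on its own in training; only the combination is new.
train_colors = {e["target"]["color"] for e in train}
train_shapes = {e["target"]["shape"] for e in train}
print(len(train), len(test), "yellow" in train_colors, "circle" in train_shapes)
```

Calling `composition_split(examples, color="red", shape="square")` would give the analogous red-square split under the same assumptions.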

gSCAN Results Across Models

| Model | Split A | Split B (yellow circle) | Split D |
| Seq2Seq + attention | ~98% | ~15% | ~15% |
| Compositional Model | ~98% | ~83% | ~91% |
| GPT-4 (zero-shot) | ~75% | ~52% | ~63% |

The Seq2Seq baseline's collapse from ~98% on Split A to ~15% on Split B (yellow circle), a combination humans understand trivially, is gSCAN's central finding.
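The size of that failure can be expressed as a generalization gap: accuracy on the random split minus accuracy on a held-out split. The sketch below assumes accuracy is scored by exact match of the full predicted action sequence; the function names are illustrative, and the inputs are the approximate figures from the table above rather than newly measured results.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of examples whose predicted action sequence equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


def generalization_gap(in_distribution_acc, held_out_acc):
    """Accuracy lost when moving from the random split to a held-out split."""
    return in_distribution_acc - held_out_acc


# Toy check of the metric: one of two sequences matches exactly.
toy_accuracy = exact_match_accuracy(
    predictions=[["walk", "walk"], ["turn left", "walk"]],
    references=[["walk", "walk"], ["turn right", "walk"]],
)

# Approximate Split A and Split B accuracies taken from the table above.
table = {
    "Seq2Seq + attention": (0.98, 0.15),
    "Compositional Model": (0.98, 0.83),
    "GPT-4 (zero-shot)": (0.75, 0.52),
}
gaps = {name: round(generalization_gap(a, b), 2) for name, (a, b) in table.items()}
print(toy_accuracy, gaps)
```

Under these figures the Seq2Seq baseline loses roughly 83 points on Split B, while the compositional model loses about 15.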

Why gSCAN Matters

Comparison to SCAN and COGS

| Benchmark | Grounded | Vision | Instruction Type | Size |
| SCAN | No | No | Action sequences | 20k |
| gSCAN | Yes | Grid world | Navigation + manipulation | 867k |
| COGS | No | No | Semantic parsing (logical forms) | 24k |

gSCAN is the held-out combination test for embodied AI. It measures whether an agent that has learned "yellow objects" and "circles" separately can immediately follow instructions involving "yellow circles," directly probing the compositional generalization gap that separates human-like concept formation from statistical pattern matching in grounded neural agents.
