Entity Prediction

Entity Prediction is a pre-training or auxiliary training task in which the model must identify, classify, or link named entities in text. It explicitly supervises entity-level understanding beyond the general masked language modeling (MLM) objective, producing representations that encode the identity and type of the real-world objects named in text rather than only distributional word co-occurrence statistics.

What Constitutes a Named Entity

Named entities are real-world objects with consistent proper names that can be referenced across documents, such as people, organizations, and locations.

Standard language model pre-training treats these entities identically to common words — the token "Obama" receives the same training signal as "quickly" or "the." Entity prediction tasks force the model to develop specialized representations for real-world referents with consistent global identities.

Task Formulations

Named Entity Recognition (NER) as Pre-training Objective: At each position, the model predicts a BIO entity tag (B-PER, I-PER, B-ORG, I-ORG, O) in addition to or instead of the masked token. This trains the model to identify entity spans and types before any fine-tuning on a downstream NER dataset, which can improve zero-shot and low-resource NER transfer.
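The BIO labels used as per-token targets can be derived from entity spans. The following is a minimal sketch, assuming spans are given as (start, end-exclusive, type) tuples over a token list; the helper name `spans_to_bio` and the example sentence are illustrative and not taken from any particular library.

```python
from typing import List, Tuple

def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, type) into per-token BIO labels."""
    labels = ["O"] * len(tokens)          # every token starts outside any entity
    for start, end, ent_type in spans:
        labels[start] = f"B-{ent_type}"   # first token of the span
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"   # remaining tokens inside the span
    return labels

tokens = ["Barack", "Obama", "visited", "Microsoft", "yesterday"]
spans = [(0, 2, "PER"), (3, 4, "ORG")]
labels = spans_to_bio(tokens, spans)
print(list(zip(tokens, labels)))
```

In a pre-training setup, these labels would serve as the classification target at each token position, alongside or in place of the masked-token target.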

Entity Typing: Given an identified entity mention span, predict its fine-grained type from a large type ontology. Hierarchical ontologies express types as paths (e.g., /person/politician/president, /organization/company/tech_company, /location/city/capital), while Ultra-Fine Entity Typing (UFET) uses an open vocabulary of roughly ten thousand free-form type phrases. Fine-grained typing requires integrating context and world knowledge.
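With hierarchical type paths such as /person/politician/president, typing is commonly framed as multi-label prediction in which a mention labeled with a fine type also carries its coarser ancestors. The sketch below shows that expansion under this assumption; `expand_type_path` is a hypothetical helper name.

```python
from typing import List

def expand_type_path(path: str) -> List[str]:
    """Return the type path together with all of its ancestor types, coarsest first."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

targets = expand_type_path("/person/politician/president")
print(targets)
```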

Entity Linking / Disambiguation: Given the text "Apple released a new product," link "Apple" to either the company (Wikidata Q312) or the fruit (Q89) based on context. Entity linking requires understanding both the linguistic context and the knowledge graph structure of the candidate entities. For ambiguous surface forms, the model must choose among many candidate entities that share the same name.
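The disambiguation step can be illustrated with a toy scorer. This is a sketch only: it replaces the learned context and knowledge-graph scoring described above with simple keyword overlap, and the keyword sets attached to Q312 and Q89 are made-up illustrations rather than Wikidata content.

```python
import re
from typing import Dict, Set

# Hypothetical keyword profiles for the two candidates sharing the surface form "Apple".
CANDIDATES: Dict[str, Set[str]] = {
    "Q312": {"company", "product", "released", "iphone", "technology"},  # the company
    "Q89": {"fruit", "tree", "sweet", "orchard", "eat"},                 # the fruit
}

def link_entity(context: str, candidates: Dict[str, Set[str]]) -> str:
    """Pick the candidate whose keyword profile overlaps most with the context words."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(candidates, key=lambda qid: len(words & candidates[qid]))

company = link_entity("Apple released a new product", CANDIDATES)
fruit = link_entity("She picked a sweet apple in the orchard", CANDIDATES)
print(company, fruit)
```

A trained entity linker would score candidates with learned mention and entity representations instead of word overlap, but the overall shape (generate candidates, score against context, take the best) is the same.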

Entity Slot Filling (LAMA Probing): Given a template such as "Barack Obama was born in [MASK]," the model predicts the entity that fills the slot. This tests factual recall encoded in model parameters, that is, knowledge acquired during pre-training rather than provided in context. The LAMA benchmark uses such templates to assess how much structured world knowledge language models implicitly store.
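Template-based probing of this kind is usually scored by checking whether the top prediction matches the gold entity. The sketch below computes precision at 1 under stated assumptions: the `[X]` subject placeholder, the function names, and the `stub_predict` lookup (a stand-in for a real masked language model, with deliberately imperfect answers) are all hypothetical.

```python
from typing import Callable, List, Tuple

def precision_at_1(
    facts: List[Tuple[str, str]],
    template: str,
    predict: Callable[[str], str],
) -> float:
    """Fraction of facts whose top-1 prediction for the [MASK] slot equals the gold entity."""
    hits = 0
    for subject, gold in facts:
        query = template.replace("[X]", subject)  # fill in the subject entity
        if predict(query) == gold:
            hits += 1
    return hits / len(facts)

# Stand-in for a masked LM: maps a filled template to its top-1 slot prediction.
STUB_OUTPUTS = {
    "Barack Obama was born in [MASK].": "Honolulu",
    "Albert Einstein was born in [MASK].": "Berlin",  # wrong on purpose; gold is Ulm
}

def stub_predict(query: str) -> str:
    return STUB_OUTPUTS.get(query, "")

facts = [("Barack Obama", "Honolulu"), ("Albert Einstein", "Ulm")]
score = precision_at_1(facts, "[X] was born in [MASK].", stub_predict)
print(score)
```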

LUKE — The Entity-Centric Architecture

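LUKE (Language Understanding with Knowledge-based Embeddings, 2020) is a widely cited implementation of entity prediction as pre-training. It treats words and entities as separate input tokens, uses an entity-aware self-attention mechanism, and is pre-trained to predict masked entities in Wikipedia text.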

At the time of publication, LUKE achieved state-of-the-art results on entity-centric tasks including NER, relation classification, entity typing, and question answering (cloze-style and extractive reading comprehension), demonstrating that explicit entity supervision can substantially improve entity-centric downstream performance.

ERNIE (Tsinghua) — Knowledge Graph Integration

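ERNIE from Tsinghua University (distinct from Baidu's ERNIE) integrates entity knowledge through a knowledge fusion architecture. It aligns entity mentions in text with knowledge graph entity embeddings pre-trained with TransE, fuses token and entity representations in a dedicated knowledge encoder, and adds a pre-training objective that masks some token-entity alignments and asks the model to predict the correct entities.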

Benefits Across Downstream Tasks

Task | How Entity Prediction Helps
Named Entity Recognition | Model already encodes entity spans and type categories
Relation Extraction | Entity embeddings encode relational context from KG
Entity Linking | Pre-trained disambiguation reduces fine-tuning data needs
Open-Domain QA | Factual entities are directly recalled from parameters
Coreference Resolution | Entity identity is explicitly represented across mentions
Slot Filling | Template-based entity recall is strengthened
Information Extraction | Structured fact extraction benefits from entity awareness

Complementarity with MLM

MLM and entity prediction are complementary objectives. MLM teaches syntactic structure, function word usage, and local distributional semantics. Entity prediction teaches that specific spans refer to real-world objects with consistent identities across documents and across time. Together, they produce models that understand both language structure and world knowledge — the combination essential for knowledge-intensive NLP tasks where factual accuracy matters.
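One common way to combine two such objectives is a weighted sum of their losses. The formulation below is a sketch under that assumption, not a description of any specific model: M denotes the masked token positions, E the supervised entity mentions, w_i the original token, e_j the gold entity label, and λ a hypothetical weighting hyperparameter.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{MLM}}(\theta) + \lambda \, \mathcal{L}_{\text{ent}}(\theta),
\qquad
\mathcal{L}_{\text{MLM}}(\theta) = -\sum_{i \in M} \log p_\theta\!\left(w_i \mid x_{\setminus M}\right),
\qquad
\mathcal{L}_{\text{ent}}(\theta) = -\sum_{j \in E} \log p_\theta\!\left(e_j \mid x_{\setminus M}\right)
```

Both terms are cross-entropy losses computed on the same corrupted input, so the shared encoder receives gradients for language structure and for entity identity in the same update.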

Entity Prediction is teaching the model who's who — explicitly supervising the model to identify, classify, and link the real-world objects named in text, building the factual knowledge base that pure distributional learning from token co-occurrence statistics cannot provide.
