Entity Extraction and NER

What is Named Entity Recognition?

Named Entity Recognition (NER) identifies named entities in text and classifies them into predefined categories such as person, organization, location, and date.

Common Entity Types

Entity     Examples
PERSON     Elon Musk, Marie Curie
ORG        Google, United Nations
LOCATION   Paris, Mount Everest
DATE       January 1st, 2024
MONEY      $100, 50 million euros
PRODUCT    iPhone 15, Model S

Approaches

Traditional NER (spaCy)

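import spacy

# Requires the model to be installed first: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")
doc = nlp("Apple CEO Tim Cook announced new products in Cupertino.")

# doc.ents holds the recognized entity spans; label_ is the label as a string.
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
# Typical output (may vary by model version):
# Apple: ORG
# Tim Cook: PERSON
# Cupertino: GPE   (spaCy labels countries, cities, and states as GPE)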
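
The loop above prints each entity, but downstream code usually wants them grouped by label. A minimal sketch, assuming only that each entity exposes `.text` and `.label_` the way spaCy spans do; the helper name `group_by_label` is hypothetical, and the stand-in entities are there so the example runs without a model download:

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(ents):
    """Group entity texts by label, keeping first-seen order and dropping duplicates."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-ins for spaCy spans; with spaCy installed, pass doc.ents instead.
sample_ents = [
    SimpleNamespace(text="Apple", label_="ORG"),
    SimpleNamespace(text="Tim Cook", label_="PERSON"),
    SimpleNamespace(text="Cupertino", label_="GPE"),
    SimpleNamespace(text="Apple", label_="ORG"),
]
grouped = group_by_label(sample_ents)
print(grouped)
```

The grouped dictionary has the same shape as the JSON requested in the LLM-based approach, which makes the two approaches easier to compare.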

LLM-Based Extraction

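import json

# `llm` is assumed to be a client object whose generate() returns the model's reply as a string.
def extract_entities(text: str) -> dict:
    result = llm.generate(f"""
Extract entities from this text. Respond with only a JSON object in this format:
{{
    "persons": [],
    "organizations": [],
    "locations": [],
    "dates": []
}}

Text: {text}
    """)
    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(result)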
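
Model replies are not guaranteed to be bare JSON: they may be wrapped in a Markdown code fence, surrounded by prose, or missing keys. The following is a defensive parsing sketch under those assumptions; `parse_entity_json` and `EXPECTED_KEYS` are hypothetical names, and the sample reply is made up for illustration:

```python
import json
import re

EXPECTED_KEYS = ("persons", "organizations", "locations", "dates")


def parse_entity_json(raw: str) -> dict:
    """Parse a model reply into a dict that contains every expected key.

    Assumes the reply holds one JSON object, possibly wrapped in a
    Markdown code fence or surrounded by extra prose.
    """
    # Take the outermost {...} span, ignoring anything around it.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key, [])
        # Missing keys and non-list values become empty lists;
        # non-string items inside a list are dropped.
        if isinstance(value, list):
            result[key] = [item for item in value if isinstance(item, str)]
        else:
            result[key] = []
    return result


fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
reply = (
    "Here are the entities:\n"
    + fence + "json\n"
    + '{"persons": ["Tim Cook"], "organizations": ["Apple"], "locations": ["Cupertino"]}\n'
    + fence
)
parsed = parse_entity_json(reply)
print(parsed)
```

Returning empty lists for missing keys keeps callers from having to check for each field, at the cost of hiding the difference between "no dates found" and "the model omitted the key".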

Structured Extraction (Instructor)

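from openai import OpenAI
from pydantic import BaseModel
import instructor

# Schema the model's reply must satisfy.
class Entities(BaseModel):
    persons: list[str]
    organizations: list[str]
    locations: list[str]
    products: list[str]

text = "Apple CEO Tim Cook announced new products in Cupertino."

# instructor wraps the OpenAI client so the reply is parsed and validated as an Entities instance.
client = instructor.from_openai(OpenAI())
entities = client.chat.completions.create(
    model="gpt-4o",
    response_model=Entities,
    messages=[{"role": "user", "content": f"Extract entities: {text}"}]
)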

Domain-Specific NER

Custom Entity Types

# Medical
medical_entities = ["DRUG", "DISEASE", "SYMPTOM", "TREATMENT"]

# Legal
legal_entities = ["CASE", "STATUTE", "COURT", "PARTY"]

# Financial
financial_entities = ["TICKER", "COMPANY", "METRIC", "CURRENCY"]
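
Before any model is trained, a custom label set can be prototyped with a plain dictionary lookup (a gazetteer). This is a swapped-in baseline for illustration, not a trained NER model; the term lists and the helper `tag_with_gazetteer` are assumptions made up for this sketch:

```python
import re

# Hypothetical term lists; a real gazetteer would come from a curated vocabulary.
MEDICAL_GAZETTEER = {
    "DRUG": ["aspirin", "ibuprofen"],
    "DISEASE": ["influenza", "diabetes"],
    "SYMPTOM": ["fever", "headache"],
}


def tag_with_gazetteer(text, gazetteer):
    """Return sorted (start, end, label) spans for whole-word, case-insensitive matches."""
    spans = []
    for label, terms in gazetteer.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((match.start(), match.end(), label))
    return sorted(spans)


sentence = "Ibuprofen can reduce fever caused by influenza."
spans = tag_with_gazetteer(sentence, MEDICAL_GAZETTEER)
for start, end, label in spans:
    print(sentence[start:end], label)
```

A lookup like this cannot handle unseen terms or ambiguity, but its output can serve as a first pass that human annotators then correct.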

Fine-Tuning

Train on domain-specific data:

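# Training data format: (text, {"entities": [(start_char, end_char, label), ...]})
# Offsets are character positions; end_char is exclusive.
TRAIN_DATA = [
    ("Aspirin reduces cold symptoms.", {"entities": [(0, 7, "DRUG"), (16, 29, "SYMPTOM")]}),
    ...
]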
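
Hand-counted character offsets are easy to get wrong, so it can help to derive them from the annotated substrings and read them back as a check. A small sketch under the assumption that each mention appears once in the text; `make_example` and `check_example` are hypothetical helpers, not part of any library:

```python
def make_example(text, mentions):
    """Build a (text, {"entities": [...]}) tuple from (substring, label) pairs.

    Each substring is located with str.find, so only its first occurrence
    is annotated; a ValueError flags mentions that are not in the text.
    """
    spans = []
    for phrase, label in mentions:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in {text!r}")
        spans.append((start, start + len(phrase), label))
    return (text, {"entities": spans})


def check_example(example):
    """Return the annotated substrings so the offsets can be inspected or asserted."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


example = make_example(
    "Ibuprofen relieves headache.",
    [("Ibuprofen", "DRUG"), ("headache", "SYMPTOM")],
)
print(example)
print(check_example(example))
```

Running `check_example` over an existing dataset is a cheap way to spot spans that cut a phrase short or run past it.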

Use Cases

Use Case                 Application
RAG preprocessing        Extract entities for search
Knowledge graph          Build entity-relation triples
Content indexing         Categorize documents
Information extraction   Structured data from text
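
For the search and indexing use cases, one simple option is an inverted index from entity string to the documents that mention it. The sketch below assumes entities have already been extracted per document; the document IDs, the sample data, and `build_entity_index` are hypothetical:

```python
from collections import defaultdict


def build_entity_index(doc_entities):
    """Map each lowercased entity string to the sorted IDs of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity.lower()].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}


# Hypothetical extraction results keyed by document ID.
doc_entities = {
    "doc1": ["Apple", "Tim Cook"],
    "doc2": ["Apple", "Cupertino"],
}
index = build_entity_index(doc_entities)
print(index["apple"])
```

Lowercasing is a crude form of normalization; it merges "Apple" and "apple" but does nothing for aliases such as "Apple Inc.".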

Best Practices
