Entity Extraction and NER | ChipFoundryServices

Entity Extraction and NER

What is Named Entity Recognition?

Named Entity Recognition (NER) identifies named entities in text and classifies them into predefined categories such as person, organization, location, and date.

Common Entity Types

Entity	Examples
PERSON	Elon Musk, Marie Curie
ORG	Google, United Nations
LOCATION	Paris, Mount Everest
DATE	January 1st, 2024
MONEY	$100, 50 million euros
PRODUCT	iPhone 15, Model S

Approaches

Traditional NER (spaCy)

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Apple CEO Tim Cook announced new products in Cupertino.")

for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
# Typical output (may vary by model version):
# Apple: ORG
# Tim Cook: PERSON
# Cupertino: GPE  (spaCy's label for countries, cities, and states)

LLM-Based Extraction

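import json

def extract_entities(text: str) -> dict:
    # `llm` is a placeholder for your model client; it is assumed to
    # expose generate(prompt) -> str.
    prompt = f"""
Extract entities from this text. Respond with JSON only, in this format:
{{
    "persons": [],
    "organizations": [],
    "locations": [],
    "dates": []
}}

Text: {text}
    """
    result = llm.generate(prompt)
    # Raises json.JSONDecodeError if the response is not valid JSON.
    return json.loads(result)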
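
Model output is not guaranteed to be bare JSON: responses are sometimes wrapped in code fences or surrounded by commentary, and keys can be missing or hold the wrong type. A minimal sketch of a more defensive parser is shown here. The names `parse_entity_json` and `EXPECTED_KEYS` are hypothetical, introduced only for illustration, and the key set mirrors the prompt above.

```python
import json
import re

# Keys the prompt asks for; a hypothetical constant for this sketch.
EXPECTED_KEYS = ("persons", "organizations", "locations", "dates")


def parse_entity_json(raw: str) -> dict:
    """Extract the JSON object from a model response and coerce it to the expected shape."""
    # Take everything from the first "{" to the last "}" so code fences
    # or surrounding commentary are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))  # may raise json.JSONDecodeError

    cleaned = {}
    for key in EXPECTED_KEYS:
        value = data.get(key, [])
        if not isinstance(value, list):
            value = []  # wrong type: treat as missing
        # Keep only non-empty strings, trimmed of surrounding whitespace.
        cleaned[key] = [v.strip() for v in value if isinstance(v, str) and v.strip()]
    return cleaned
```

Under these assumptions, `extract_entities` could return `parse_entity_json(result)` in place of `json.loads(result)`, so that callers always receive all four keys as lists of strings.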

Structured Extraction (Instructor)

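from openai import OpenAI
from pydantic import BaseModel
import instructor

class Entities(BaseModel):
    persons: list[str]
    organizations: list[str]
    locations: list[str]
    products: list[str]

text = "Apple CEO Tim Cook announced new products in Cupertino."

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
# Instructor wraps the client so the response is validated against Entities.
client = instructor.from_openai(OpenAI())
entities = client.chat.completions.create(
    model="gpt-4o",
    response_model=Entities,
    messages=[{"role": "user", "content": f"Extract entities: {text}"}]
)
# `entities` is an Entities instance; access fields such as entities.persons.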

Domain-Specific NER

Custom Entity Types

# Medical
medical_labels = ["DRUG", "DISEASE", "SYMPTOM", "TREATMENT"]

# Legal
legal_labels = ["CASE", "STATUTE", "COURT", "PARTY"]

# Financial
financial_labels = ["TICKER", "COMPANY", "METRIC", "CURRENCY"]

Fine-Tuning

Train on domain-specific data:

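# Training data format: (text, {"entities": [(start, end, label), ...]})
# Offsets are character indices; the end index is exclusive.
[
    ("Aspirin reduces cold symptoms.", {"entities": [(0, 7, "DRUG"), (16, 29, "SYMPTOM")]}),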
    ...
]
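
Character offsets in this format are easy to get wrong, and misaligned spans can quietly hurt training quality. The sketch below checks one example before it is used. `validate_example` is a hypothetical helper, not part of any library, and it assumes the end offset is exclusive, as in the `(0, 7, "DRUG")` span for "Aspirin".

```python
def validate_example(text: str, annotations: dict) -> list[str]:
    """Return a list of problems found in one (text, annotations) training example."""
    problems = []
    previous_end = 0
    for start, end, label in sorted(annotations["entities"]):
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label}: offsets ({start}, {end}) fall outside the text")
            continue
        span = text[start:end]
        # A span boundary that sits between two alphanumeric characters cuts a word.
        cuts_left = start > 0 and text[start - 1].isalnum() and text[start].isalnum()
        cuts_right = end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
        if cuts_left or cuts_right:
            problems.append(f"{label}: span {span!r} cuts through a word")
        if span != span.strip():
            problems.append(f"{label}: span {span!r} has leading or trailing whitespace")
        if start < previous_end:
            problems.append(f"{label}: span {span!r} overlaps the previous entity")
        previous_end = max(previous_end, end)
    return problems


text = "Aspirin reduces cold symptoms."
print(validate_example(text, {"entities": [(0, 7, "DRUG")]}))  # []
print(validate_example(text, {"entities": [(0, 6, "DRUG")]}))  # one problem: cuts through a word
```

Running a check like this over the whole dataset before training surfaces off-by-one offsets that would otherwise only show up as poor model accuracy.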

Use Cases

Use Case	Application
RAG preprocessing	Extract entities for search
Knowledge graph	Build entity-relation triples
Content indexing	Categorize documents
Information extraction	Structured data from text
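
For the RAG preprocessing and content indexing rows, one common pattern is an inverted index from entity to document IDs, which can narrow the candidate set before retrieval. The sketch below assumes entities have already been extracted as (text, label) pairs per document. `build_entity_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_entity_index(doc_entities: dict) -> dict:
    """Map each (lowercased entity text, label) pair to the set of document IDs mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, label in entities:
            index[(text.lower(), label)].add(doc_id)
    return dict(index)


# Hypothetical extraction results keyed by document ID.
doc_entities = {
    "doc1": [("Apple", "ORG"), ("Tim Cook", "PERSON")],
    "doc2": [("Apple", "ORG"), ("Cupertino", "GPE")],
}
index = build_entity_index(doc_entities)
print(sorted(index[("apple", "ORG")]))  # ['doc1', 'doc2']
```

Keying on the label as well as the text keeps an ORG named "Apple" separate from any other entity type with the same surface form.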

Best Practices

Use traditional NER for speed on common entity types
Use an LLM for complex or domain-specific extraction
Validate and normalize extracted entities
Handle entity linking (for example, resolve "Apple" to the specific company)
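
The last two practices can be sketched with a small alias table: normalize each surface form, then map it to a canonical ID. Everything here is illustrative. `ALIASES`, the ID strings, `normalize`, and `link_entity` are hypothetical, and real entity linking usually needs surrounding context to disambiguate.

```python
import re
from typing import Optional

# Hypothetical alias table: (normalized surface form, label) -> canonical ID.
ALIASES = {
    ("apple", "ORG"): "org:apple-inc",
    ("apple inc", "ORG"): "org:apple-inc",
    ("tim cook", "PERSON"): "person:tim-cook",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def link_entity(text: str, label: str) -> Optional[str]:
    """Return the canonical ID for an extracted entity, or None if it is unknown."""
    return ALIASES.get((normalize(text), label))


print(link_entity("Apple Inc.", "ORG"))  # org:apple-inc
print(link_entity("Apple", "PRODUCT"))   # None
```

Returning `None` for unknown entities leaves the caller free to decide whether to drop them, keep the raw text, or queue them for review.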
