
Patent Similarity is the NLP task of computing semantic similarity between patent documents. It supports prior art search, patent clustering, portfolio analysis, and infringement detection by measuring how closely two patents cover the same technological concept, regardless of differences in claim language, inventor vocabulary, or jurisdiction-specific drafting conventions.

What Is Patent Similarity?

Why Patent Similarity Is Hard

Deliberate Claim Language Variation: Patent attorneys intentionally use different vocabulary for the same concept to achieve claim differentiation or breadth. "A system for processing data" and "an apparatus for information manipulation" may cover identical technology — surface similarity is insufficient.
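To make the point about surface similarity concrete, here is a minimal sketch that measures word-level Jaccard overlap between the two example phrases. The helper name `token_jaccard` and the whitespace tokenization are assumptions made for illustration.

```python
def token_jaccard(a, b):
    """Jaccard overlap between the lowercase word sets of two phrases."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

phrase_a = "A system for processing data"
phrase_b = "an apparatus for information manipulation"

# The only shared word is "for", out of nine distinct words overall.
score = token_jaccard(phrase_a, phrase_b)
print(f"{score:.2f}")  # prints 0.11
```

Two phrases that may describe the same technology share a single function word, which is why purely lexical measures can miss this kind of paraphrase.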

Hierarchical Claim Structure: Claim 1 (broad, independent) may be similar to another patent's Claim 1 at a high level, but the dependent claims narrow the scope differently. True similarity requires analyzing the claim hierarchy.
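One way to start analyzing the claim hierarchy is to recover which claim each dependent claim refers back to. The sketch below assumes dependencies are phrased as "of claim N", "according to claim N", or "as claimed in claim N"; the claim texts are invented for illustration, and real claim sets use more varied wording (including multiple dependencies) that this pattern would not catch.

```python
import re

# Hypothetical claim texts, invented for illustration only.
claims = {
    1: "A system for processing data, comprising a processor and a memory.",
    2: "The system of claim 1, wherein the processor is a GPU.",
    3: "The system of claim 2, wherein the memory is high-bandwidth memory.",
    4: "A method for processing data, comprising receiving and transforming the data.",
}

# Assumed phrasing of a back-reference to a parent claim.
CLAIM_REF = re.compile(
    r"\b(?:of|according to|as claimed in)\s+claim\s+(\d+)", re.IGNORECASE
)

def parent_of(text):
    """Return the referenced claim number, or None for an independent claim."""
    match = CLAIM_REF.search(text)
    return int(match.group(1)) if match else None

parents = {num: parent_of(text) for num, text in claims.items()}
independent = [num for num, parent in parents.items() if parent is None]

def chain_to_root(num):
    """Follow parent links from a claim up to its independent claim."""
    chain = [num]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain

print(independent)       # [1, 4]
print(chain_to_root(3))  # [3, 2, 1]
```

With the chain available, a comparison can weigh the broad independent claim separately from the narrowing limitations added further down.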

Cross-Language Patents: The same invention is often patented in English, German, Japanese, Chinese, and Korean — similarity across languages requires multilingual embeddings.

Technical vs. Legal Similarity: Two patents may use the same technical concept (transformer neural networks) with entirely different claim scope — one covering a specific hardware implementation, another a training algorithm. Technical similarity ≠ legal overlap.

Figures and Formulas: Chemical patents encode the core invention in SMILES strings and structural formulas; mechanical patents encode it in technical drawings. Full similarity therefore requires multi-modal comparison.

Similarity Computation Approaches

Lexical Overlap (BM25 / TF-IDF):
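As a rough illustration, a pure-Python sketch of Okapi BM25 scoring might look like the following. The whitespace tokenizer, the `k1` and `b` defaults, the smoothed IDF variant, and the sample texts are all assumptions chosen for illustration, not settings from any particular patent search system.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays positive for common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = [
    "a system for processing data using a neural network",
    "an apparatus for information manipulation using a neural network",
    "a chemical compound for treating inflammation",
]
query = "system for processing data"
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best)  # 0
```

In this toy corpus the paraphrased second document matches the query only on the word "for", the same as the unrelated chemical document, which reflects the vocabulary-variation problem described earlier.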

Bi-Encoder Dense Retrieval (PatentBERT, AugPatentBERT):
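The general shape of bi-encoder retrieval can be sketched as follows: each document is encoded independently into a vector ahead of time, and a query is compared against the stored vectors by cosine similarity. The `toy_encode` function here is a hashed bag-of-words stand-in so the example runs without dependencies; it is not PatentBERT or any trained model, and in practice a learned encoder would be passed in its place. All function names are hypothetical.

```python
import hashlib
import math

def toy_encode(text, dim=64):
    """Stand-in encoder: hashed bag-of-words, L2-normalized. NOT a trained model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(u, v):
    """Dot product; equals cosine similarity because vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))

def build_index(docs, encode=toy_encode):
    """Encode every document once, independently of any query."""
    return [encode(d) for d in docs]

def search(query, index, encode=toy_encode, top_k=2):
    """Return (score, doc_index) pairs for the top_k most similar documents."""
    q = encode(query)
    scored = [(cosine(q, v), i) for i, v in enumerate(index)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]

docs = [
    "a system for processing data using a neural network",
    "a chemical compound for treating inflammation",
]
index = build_index(docs)
hits = search(docs[0], index, top_k=2)
print(hits[0][1])  # 0
```

Because documents are encoded without seeing the query, the index can be built once and reused, which is what makes this stage suitable for searching a large corpus.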

Cross-Encoder Reranking:
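A reranking stage can be sketched as re-scoring the top candidates from a first-stage retriever with a function that sees the query and document together. In the sketch below, `toy_pair_score` is a simple token-overlap stand-in for a cross-encoder model, and the function names, candidate texts, and `top_k` value are assumptions for illustration.

```python
def rerank(query, candidates, score_pair, top_k=10):
    """Re-score the first top_k candidates jointly with the query and reorder them.

    Candidates beyond top_k keep their first-stage order.
    """
    head, tail = candidates[:top_k], candidates[top_k:]
    rescored = sorted(head, key=lambda doc: score_pair(query, doc), reverse=True)
    return rescored + tail

def toy_pair_score(query, doc):
    """Stand-in for a cross-encoder: fraction of query tokens found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

# Hypothetical first-stage ranking, best first.
candidates = [
    "a chemical compound for treating inflammation",
    "a system for processing data on a GPU",
    "a method for processing images",
]
query = "system for processing data"
reranked = rerank(query, candidates, toy_pair_score, top_k=2)
print(reranked[0])  # a system for processing data on a GPU
```

Only the head of the list is re-scored because pairwise scoring costs one model call per candidate, so it is typically applied to a short list rather than the whole corpus.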

Claim Decomposition + Matching:
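One possible reading of this approach is to split each claim into its elements and match elements pairwise. The sketch below assumes elements are separated by semicolons after the word "comprising" and uses token Jaccard overlap as a placeholder matcher; the claim texts, the 0.5 threshold, and the helper names are invented for illustration, and a real system would likely use a semantic matcher instead.

```python
import re

def split_elements(claim):
    """Split a claim into elements at semicolons after the 'comprising' transition."""
    _, _, body = claim.partition("comprising")
    body = body or claim  # fall back to the whole claim if no transition word
    parts = re.split(r";\s*(?:and\s+)?", body)
    return [p.strip(" :.,") for p in parts if p.strip(" :.,")]

def jaccard(a, b):
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b) if set_a | set_b else 0.0

def match_elements(claim_a, claim_b):
    """For each element of claim_a, find its best-matching element in claim_b."""
    elems_b = split_elements(claim_b)
    matches = []
    for elem in split_elements(claim_a):
        best = max(elems_b, key=lambda other: jaccard(elem, other))
        matches.append((elem, best, jaccard(elem, best)))
    return matches

# Hypothetical claims, invented for illustration only.
claim_a = ("A system comprising: a processor configured to encode text; "
           "a memory storing patent embeddings; and a ranking module that sorts results.")
claim_b = ("An apparatus comprising: a memory storing document embeddings; "
           "and a processor configured to encode text.")

matches = match_elements(claim_a, claim_b)
threshold = 0.5  # assumed cut-off for counting an element as matched
coverage = sum(score >= threshold for _, _, score in matches) / len(matches)
print(f"{coverage:.2f}")  # prints 0.67
```

Here two of the three elements of the first claim find a counterpart in the second, while the ranking module has none, which is the kind of element-level difference a whole-document score would hide.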

Performance Results (CLEF-IP Prior Art Retrieval)

| System | MAP@10 | Recall@100 |
|---|---|---|
| TF-IDF baseline | 0.31 | 0.54 |
| BM25 | 0.35 | 0.61 |
| PatentBERT bi-encoder | 0.44 | 0.71 |
| Cross-encoder reranking | 0.52 | 0.74 |
| GPT-4 reranker (top-10) | 0.55 | |
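For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for a single query. The normalization of average precision by min(k, number of relevant documents) is an assumption; evaluation campaigns define the cut-off variant differently, and the exact CLEF-IP protocol may not match this. Document identifiers are invented.

```python
def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values at each relevant hit within the top k."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / min(k, len(relevant))

def recall_at_k(ranked, relevant, k=100):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mean_metric(metric, runs, k):
    """Average a per-query metric over (ranked, relevant) pairs, e.g. MAP@k."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Hypothetical run for one query: two of three relevant patents retrieved.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2", "p9"}

ap = average_precision_at_k(ranked, relevant, k=10)
rec = recall_at_k(ranked, relevant, k=100)
print(f"{ap:.3f} {rec:.3f}")  # prints 0.333 0.667
```

MAP@10 rewards placing relevant prior art near the top of the list, while Recall@100 only asks whether it appears anywhere in the first hundred results, so a reranker can raise the former without changing the latter.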

Commercial Patent Similarity Tools

Why Patent Similarity Matters

Patent Similarity is the semantic prior art compass — enabling precise navigation of the 110-million patent corpus to identify the documents that define, overlap, or anticipate any given patented invention, grounding every IP strategy decision in comprehensive knowledge of the existing intellectual property landscape.
