Metapath2vec is a graph embedding algorithm specifically designed for heterogeneous information networks (HINs) — graphs with multiple types of nodes and edges — that constrains random walks to follow predefined meta-paths (semantic schemas specifying the sequence of node types to traverse), ensuring that the learned embeddings capture meaningful domain-specific relationships rather than random structural proximity.
What Is Metapath2vec?
- Definition: Metapath2vec (Dong et al., 2017) extends the DeepWalk/Node2Vec paradigm to heterogeneous graphs by replacing uniform random walks with meta-path-guided walks. A meta-path is a sequence of node types that defines a valid relational path — for example, in an academic network, "Author → Paper → Venue → Paper → Author" (APVPA) defines co-authors who publish in the same venue. The random walker must follow this type sequence, ensuring that the walk captures the specified semantic relationship.
- Meta-Path Schema: The meta-path $mathcal{P} = (A_1 o A_2 o ... o A_l)$ specifies the required sequence of node types. At each step, the walker can only move to a neighbor of the prescribed type. For APVPA, starting from Author A, the walker must go to a Paper, then a Venue, then another Paper, then another Author — capturing the "co-venue authorship" relationship. Different meta-paths encode different semantic relationships.
- Metapath2vec++: The enhanced version uses a heterogeneous skip-gram that conditions the context prediction on the node type — predicting "which Author appears in this context?" separately from "which Paper appears?" — preventing embeddings from being confused by type-mixing in the training objective.
Why Metapath2vec Matters
- Semantic Specificity: In heterogeneous graphs, not all connections are equally meaningful. In a biomedical network with genes, diseases, drugs, and proteins, the path "Gene → Protein → Disease" captures a completely different relationship than "Gene → Gene → Gene." Meta-paths enable domain experts to specify which relationships the embedding should capture, producing task-relevant representations rather than generic structural proximity.
- Heterogeneous Graph Learning: Standard graph embedding methods (DeepWalk, Node2Vec, LINE) treat all nodes and edges as homogeneous, ignoring the rich type information in heterogeneous networks. An academic network where "Author → Paper" edges and "Paper → Venue" edges are treated identically produces embeddings that mix incomparable relationships. Metapath2vec preserves type semantics by constraining walks to meaningful type sequences.
- Knowledge Graph Embeddings: Knowledge graphs (Freebase, YAGO, Wikidata) are inherently heterogeneous — entities have types (Person, Organization, Location) and relations have types (born_in, works_at, located_in). Meta-path-guided walks enable embeddings that capture specific relational patterns rather than generic graph proximity.
- Recommendation Systems: In e-commerce graphs with users, products, brands, and categories, different meta-paths capture different recommendation signals — "User → Product → Brand → Product" for brand loyalty, "User → Product → Category → Product" for category exploration. Metapath2vec enables embedding-based recommendation that follows specific user behavior patterns.
Meta-Path Examples
| Domain | Meta-Path | Semantic Meaning |
|--------|-----------|-----------------|
| Academic | Author → Paper → Author | Co-authorship |
| Academic | Author → Paper → Venue → Paper → Author | Co-venue collaboration |
| Biomedical | Drug → Gene → Disease | Drug-gene-disease pathway |
| E-commerce | User → Product → Brand → Product → User | Brand-based user similarity |
| Social | User → Post → Hashtag → Post → User | Topic-based user similarity |
Metapath2vec is semantic walking — constraining random exploration to follow domain-expert-designed relational trails through heterogeneous networks, ensuring that learned embeddings capture the specific meaningful relationships rather than treating all graph connections as interchangeable.