- DBLP (Google Drive): 601.4 MB
- SLAP (Google Drive): 295.8 MB
- ACM (Google Drive): 752.1 MB
- IMDB (Google Drive): 94.3 MB

Dataset | # Nodes | Node types | Meta-paths | # Meta-path instances | # Labels | # Features |
---|---|---|---|---|---|---|
DBLP | 14475(A) | Author(A)<br>Paper(P)<br>Conference(C) | APA<br>APCPA | 40269<br>19445349 | 4 | 5000+ |
SLAP | 20419(G) | Gene(G)<br>Gene Ontology(O)<br>Pathway(P)<br>Compound(C)<br>Tissue(T)<br>Gene Family(F)<br>Disease(D) | GTG<br>GFG<br>GDG<br>GPG<br>GOG<br>GG<br>GDCDG | 303487<br>582741<br>7494<br>416462<br>3185779<br>172248<br>18095 | 15 | 2695 |
ACM | 12499(P) | Paper(P)<br>Author(A)<br>Proceeding(O)<br>Institute(I)<br>Conference(C) | PAP<br>PAIAP<br>POP<br>POCOP<br>PP | 91662<br>13303015<br>700386<br>7849967<br>30621 | 11 | 8000 |
IMDB\* | 18352(M) | Movie(M)<br>Actor(A)<br>Actress(E)<br>Director(D) | MAM?<br>MDM?<br>MEM? | 63659?<br>1085810?<br>565443? | 9 | 1000 |

- \* Multi-label dataset.
- ? It is not certain which meta-path corresponds to which meta-path instance count.
- \+ Stop words are removed using `nltk.corpus.stopwords`, and the bag-of-words representation is then extracted.
- For DBLP, SLAP and ACM, please refer to the paper *Meta Path-Based Collective Classification in Heterogeneous Information Networks*.
- For IMDB, please refer to the paper *Column Networks for Collective Classification*.
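
To make the meta-path columns concrete, here is a minimal Python sketch (not the dataset authors' code) that enumerates the node pairs connected by a meta-path such as `APA` or `APCPA` by composing relations hop by hop. It assumes each relation is stored as a dict from node ID to a set of neighbor IDs, and it assumes an "instance" is a distinct (start, end) pair; this README does not say whether the counts are pairs or full path instances, or whether self-pairs are included. `invert`, `compose`, `metapath_pairs` and the toy graph are hypothetical.

```python
from collections import defaultdict


def invert(rel):
    """Reverse a relation, e.g. turn Author->Paper into Paper->Author."""
    inv = defaultdict(set)
    for src, dsts in rel.items():
        for dst in dsts:
            inv[dst].add(src)
    return dict(inv)


def compose(rel_a, rel_b):
    """Follow rel_a, then rel_b: start node -> set of reachable end nodes."""
    out = defaultdict(set)
    for src, mids in rel_a.items():
        for mid in mids:
            out[src] |= rel_b.get(mid, set())
    return dict(out)


def metapath_pairs(relations, metapath, drop_self=True):
    """Distinct (start, end) pairs connected by a meta-path such as "APCPA".

    relations[(s, t)] maps a node of type s to its neighbors of type t.
    A missing (s, t) relation is derived by inverting the stored (t, s) one.
    Whether self-pairs count toward the README's numbers is an assumption,
    hence the drop_self flag.
    """
    def rel(s, t):
        if (s, t) in relations:
            return relations[(s, t)]
        return invert(relations[(t, s)])

    reach = rel(metapath[0], metapath[1])
    for s, t in zip(metapath[1:], metapath[2:]):
        reach = compose(reach, rel(s, t))
    return {(u, v) for u, vs in reach.items() for v in vs
            if not (drop_self and u == v)}


# Toy Author-Paper-Conference graph with made-up IDs (not taken from DBLP).
TOY_RELATIONS = {
    ("A", "P"): {"a1": {"p1"}, "a2": {"p1", "p2"}, "a3": {"p2"}},
    ("P", "C"): {"p1": {"c1"}, "p2": {"c1"}},
}

if __name__ == "__main__":
    for mp in ("APA", "APCPA"):
        print(mp, len(metapath_pairs(TOY_RELATIONS, mp)))
```

On the toy graph this prints `APA 4` and `APCPA 6`: every pair of authors shares conference `c1`, but `a1` and `a3` share no paper. At the scale of the full DBLP graph, sparse matrix products would be a faster way to build the same pair sets.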
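
The bag-of-words features marked with `+` can be illustrated with a small stdlib-only Python sketch. It is an assumption-laden stand-in, not the original preprocessing: it uses a tiny hard-coded stop-word set in place of `nltk.corpus.stopwords` (with NLTK installed, `set(stopwords.words("english"))` after `nltk.download("stopwords")` would replace it), a simple regex tokenizer, raw term counts rather than binary indicators, and an optional top-k vocabulary cap. The README specifies none of these choices. `tokenize`, `build_vocab`, `bag_of_words` and the toy documents are hypothetical.

```python
import re
from collections import Counter

# Stand-in stop-word list; the README uses nltk.corpus.stopwords instead.
STOP_WORDS = {"a", "and", "for", "in", "of", "on", "the"}


def tokenize(text):
    """Lowercase and keep alphabetic runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs, stop_words, max_features=None):
    """Map each non-stop word to a column index, most frequent first.

    Capping at max_features is an assumption about how the feature size
    was chosen; ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter()
    for doc in docs:
        counts.update(tok for tok in tokenize(doc) if tok not in stop_words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_features is not None:
        ranked = ranked[:max_features]
    return {word: idx for idx, (word, _) in enumerate(ranked)}


def bag_of_words(doc, vocab):
    """Term-count vector for one document over a fixed vocabulary."""
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        idx = vocab.get(tok)
        if idx is not None:  # stop words and out-of-vocabulary words are skipped
            vec[idx] += 1
    return vec


DOCS = [
    "Deep learning for graphs",
    "Learning on heterogeneous graphs",
    "The theory of graphs",
]

if __name__ == "__main__":
    vocab = build_vocab(DOCS, STOP_WORDS)
    print(vocab)
    print(bag_of_words(DOCS[0], vocab))
```

Here the vocabulary is `{'graphs': 0, 'learning': 1, 'deep': 2, 'heterogeneous': 3, 'theory': 4}` and the first document maps to `[1, 1, 1, 0, 0]`; words such as "for", "the" and "of" never receive a column because they are filtered as stop words.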