This is the codebase for reproducing the results of the TKDE paper "Weakly Supervised Concept Map Generation through Task-Guided Graph Translation". (arXiv link)
python==3.7.9
For library requirements, please refer to ./requirements.txt. (You may replace PyTorch and DGL with their CPU versions.)
Pre-processed Graphs
The initial concept maps derived by the NLP pipeline are available at NYT link, AMiner link, YELP link.
Put the downloaded archive under the project root directory and decompress it. Three *.pickle.gz files will then reside under ./data/. (There is no need to further decompress the *.pickle.gz files themselves.)
The expected ./data folder after downloading the necessary resources:
./data
|-- dblp.txt # dblp refers to the AMiner corpus used in the paper
|-- dblp.win5.pickle.gz
|-- nyt.txt
|-- nyt.win5.pickle.gz
|-- yelp.txt
|-- yelp.sentiment_centric.win5.pickle.gz
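Since the graph files stay compressed on disk, they can be read directly with Python's standard gzip and pickle modules. The sketch below is illustrative only (the function name is ours, and nothing is assumed about the pickled object's internal structure):

```python
import gzip
import pickle

def load_graph_pickle(path):
    """Load a *.pickle.gz file directly, without decompressing it on disk."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

# e.g. graphs = load_graph_pickle("./data/nyt.win5.pickle.gz")
```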
Pre-trained Word Embeddings
GT-D2G relies on several pre-trained word embeddings. By default, the scripts read pre-trained embeddings from the ./.vector_cache folder.
- GloVe for NYT, AMiner: Download glove.840B.300d from https://nlp.stanford.edu/projects/glove/.
- Custom embeddings for Yelp: For the Yelp dataset, we get the best performance using a hybrid of GloVe and a restaurant embedding, which can be downloaded from link.
The expected ./.vector_cache folder:
./.vector_cache
|--glove.840B.300d.txt
|--glove.840B.restaurant.400d.vec
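Both files are plain-text embeddings: each line holds a token followed by its vector components. A minimal loader sketch is shown below; the function name and `dim` default are our own illustration, not the repo's API, and the multi-word handling accounts for the few glove.840B.300d tokens that contain spaces:

```python
def load_text_embeddings(path, dim=300):
    """Read a GloVe-style text file: token(s), then `dim` floats per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) <= dim:
                continue  # skip headers or malformed lines
            word = " ".join(parts[:-dim])  # some GloVe tokens contain spaces
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors

# e.g. glove = load_text_embeddings("./.vector_cache/glove.840B.300d.txt")
#      yelp = load_text_embeddings("./.vector_cache/glove.840B.restaurant.400d.vec", dim=400)
```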
Checkpoints
GT-D2G-path: https://figshare.com/articles/dataset/GT-D2G_Data/16415802?file=30419121
GT-D2G-neigh: https://figshare.com/articles/dataset/GT-D2G_Data/16415802?file=30419181
GT-D2G-var: https://figshare.com/articles/dataset/GT-D2G_Data/16415802?file=30419157
Please download the gzipped checkpoint files from the URLs above, and decompress them under the ./checkpoints folder.
Example of running GT-D2G for reproducibility: sh run_test.sh
You can train your own GT-D2G by modifying the provided example run_train.sh.