
metapath randomwalk

kent edited this page Nov 28, 2019 · 2 revisions

Introduction

This algorithm performs random walks guided by meta-paths, a technique commonly used for representation learning on heterogeneous networks. A meta-path is a path defined on the network schema that connects two object types through a sequence of relations. It describes the semantic relationship between objects, and mining this relationship is the cornerstone of downstream tasks. If a meta-path is designed so that its head and tail have the same node type, walks can repeat it continuously to generate node sequences. These sequences can then be used for Skip-Gram training to obtain vector representations of the nodes. For the definition of meta-path-based random walks and how they are used in representation learning, refer to the metapath2vec paper.
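A single meta-path-guided walk can be sketched in C++, the language of the linked Plato example. This is a minimal sketch, not Plato's implementation. It assumes an in-memory adjacency map and picks uniformly among out-neighbors whose type matches the next slot of a symmetric meta-path such as {0, 1, 0}, which follows the uniform transition rule described in metapath2vec. The names HeteroGraph and metapath_walk are hypothetical. Edge weights are ignored, and stopping early at a dead end is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory heterogeneous graph (not Plato's data structures).
struct HeteroGraph {
  std::unordered_map<uint32_t, std::vector<uint32_t>> adj;  // node -> out-neighbors
  std::unordered_map<uint32_t, uint32_t> type;              // node -> node type
};

// Generates one walk of at most `length` nodes from `start`. The node at
// position i must have type metapath[i % (metapath.size() - 1)]; because the
// head and tail types are equal, the pattern can repeat indefinitely.
std::vector<uint32_t> metapath_walk(const HeteroGraph& g,
                                    const std::vector<uint32_t>& metapath,
                                    uint32_t start, size_t length,
                                    std::mt19937& rng) {
  assert(metapath.size() >= 2 && metapath.front() == metapath.back());
  const size_t period = metapath.size() - 1;
  std::vector<uint32_t> walk{start};

  // The start node must match the head type of the meta-path.
  auto st = g.type.find(start);
  if (st == g.type.end() || st->second != metapath[0]) return walk;

  while (walk.size() < length) {
    const uint32_t want = metapath[walk.size() % period];
    auto it = g.adj.find(walk.back());
    if (it == g.adj.end()) break;  // dead end: no out-edges (assumed: stop)

    // Keep only neighbors whose type matches the next meta-path slot.
    std::vector<uint32_t> candidates;
    for (uint32_t v : it->second) {
      auto tv = g.type.find(v);
      if (tv != g.type.end() && tv->second == want) candidates.push_back(v);
    }
    if (candidates.empty()) break;  // no neighbor of the required type

    // Uniform choice among type-matching neighbors.
    std::uniform_int_distribution<size_t> pick(0, candidates.size() - 1);
    walk.push_back(candidates[pick(rng)]);
  }
  return walk;
}
```

Repeating this from many start nodes yields the sequences that are then fed to Skip-Gram.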

Parameters

Use the --help parameter to view detailed help information.

Input Format

The input consists of two kinds of data: edge data and node type data.

Input files of edge data should be formatted as follows:

Each line of the input file must have the format <src>,<dst> or <src>,<dst>,<weight>, describing one edge: <src> and <dst> are the IDs of its head and tail nodes, of type uint32_t, and <weight>, of type float, is the weight of the edge.

Input example of edge data (the following numbers are synthetic and for demonstration purposes only):

123,856
856,123
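One way to validate edge lines in this format is sketched below (C++17). This is an illustrative sketch, not Plato's loader: Edge, parse_u32 and parse_edge are hypothetical names, and defaulting the weight to 1.0 when the third column is omitted is an assumption the page does not state.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical edge record; weight 1.0 when omitted is an assumption.
struct Edge {
  uint32_t src = 0;
  uint32_t dst = 0;
  float weight = 1.0f;
};

// Parses a whole field as uint32_t; rejects signs, trailing junk and overflow.
bool parse_u32(std::string_view s, uint32_t& out) {
  auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), out);
  return ec == std::errc() && ptr == s.data() + s.size();
}

// Parses "<src>,<dst>" or "<src>,<dst>,<weight>"; nullopt on malformed input.
std::optional<Edge> parse_edge(std::string_view line) {
  if (!line.empty() && line.back() == '\r') line.remove_suffix(1);  // tolerate CRLF

  // Split on commas.
  std::vector<std::string_view> fields;
  size_t begin = 0;
  while (true) {
    size_t comma = line.find(',', begin);
    fields.push_back(line.substr(begin, comma - begin));
    if (comma == std::string_view::npos) break;
    begin = comma + 1;
  }
  if (fields.size() != 2 && fields.size() != 3) return std::nullopt;

  Edge e;
  if (!parse_u32(fields[0], e.src) || !parse_u32(fields[1], e.dst)) {
    return std::nullopt;
  }
  if (fields.size() == 3) {
    try {
      std::string w(fields[2]);
      size_t pos = 0;
      e.weight = std::stof(w, &pos);
      if (pos != w.size()) return std::nullopt;  // trailing junk after the number
    } catch (const std::exception&) {  // std::invalid_argument / std::out_of_range
      return std::nullopt;
    }
  }
  return e;
}
```

For example, parse_edge("856,123,0.5") yields src 856, dst 123 and weight 0.5, while "123,-1" is rejected because node IDs are unsigned.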

Input files of node type data should be formatted as follows:

Each line of the input file must have the format <id>,<type>, where <id> and <type> are numbers of type uint32_t representing the node ID and the node type.

Input example of node type data (the following numbers are synthetic and for demonstration purposes only):

123,0
246,1
856,1
666,2

Output Format

Output files are formatted as follows:

The output is a multi-line text file in which each line is one walk generated from a start node: a sequence of node IDs separated by spaces, in the format <node1> <node2> <node3> <node4> …

Output example (the following numbers are synthetic and for demonstration purposes only):

1 2 3 4
1 3 8 2
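Writing walks in this format, and reading them back as a corpus for Skip-Gram training, can be sketched as follows. write_walks and read_walks are hypothetical helpers, not Plato functions; skipping blank lines on input is an assumption.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes one walk per line, node IDs separated by single spaces.
void write_walks(std::ostream& out,
                 const std::vector<std::vector<uint32_t>>& walks) {
  for (const auto& walk : walks) {
    for (size_t i = 0; i < walk.size(); ++i) {
      if (i) out << ' ';
      out << walk[i];
    }
    out << '\n';
  }
}

// Reads the same format back; blank lines are skipped.
std::vector<std::vector<uint32_t>> read_walks(std::istream& in) {
  std::vector<std::vector<uint32_t>> walks;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::vector<uint32_t> walk;
    uint32_t id;
    while (fields >> id) walk.push_back(id);
    if (!walk.empty()) walks.push_back(std::move(walk));
  }
  return walks;
}
```

Writing the two example walks above produces exactly the lines "1 2 3 4" and "1 3 8 2".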

Code

https://github.com/Tencent/plato/blob/master/example/metapath_randomwalk.cc

Algorithms to be open-sourced:

  • Network Embedding
    • LINE
    • Word2Vec
    • GraphVite
  • GNN
    • GCN
    • GraphSage