knnMeetsConnectedComponents

Publications

Alessandro Lulli, Thibault Debatty, Laura Ricci, Matteo Dell’Amico, and Pietro Michiardi, Scalable k-NN based text clustering, accepted at IEEE BigData 2015

How to build

The project is built with Maven. From the main directory, run: mvn package

How to run

The main class is: util.KnnMeetsConnectedComponents

The job can be executed in two ways: submit it to your Spark environment, or use the script run/runKnnMeetsConnectedComponents.sh

The Spark library must also be provided in the lib folder. A pre-built Spark library can be downloaded from: https://www.dropbox.com/s/xnfqs0ht4nqv5lc/spark-assembly-1.2.0-hadoop2.2.0.jar?dl=0

Configuration

The application requires a configuration file. An example configuration file is: run/config_knnMeetsCC

Dataset format

The application requires each line of the input dataset to have the following format:

vertexIdentifier<separator>stringValue

The separator can be configured using the edgelistSeparator configuration variable. An example dataset is: run/subjectSmall
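To make the record layout concrete, here is a minimal Python sketch that writes and reads lines in this format. It is not part of the project: the function names parse_records and format_record are hypothetical, the tab separator is only an assumed default (the real value is whatever edgelistSeparator is set to in your configuration file), and splitting on the first occurrence of the separator is an assumption about how values containing the separator should be treated.

```python
from typing import Iterable, Iterator, Tuple


def format_record(vertex_id: str, value: str, separator: str = "\t") -> str:
    """Build one dataset line: identifier, separator, string value.

    The tab default is an assumption; use the value of edgelistSeparator.
    """
    return f"{vertex_id}{separator}{value}"


def parse_records(
    lines: Iterable[str], separator: str = "\t"
) -> Iterator[Tuple[str, str]]:
    """Yield (vertex_id, value) pairs from dataset lines.

    Assumption: the line is split on the FIRST separator only, so the
    string value may itself contain the separator. Blank lines are skipped.
    """
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        vertex_id, found, value = line.partition(separator)
        if not found:
            raise ValueError(f"line {line_number}: separator not found")
        yield vertex_id, value


if __name__ == "__main__":
    # Illustrative records only; these are not taken from run/subjectSmall.
    sample = [
        format_record("1", "free tickets available now"),
        format_record("2", "meeting moved to friday"),
    ]
    for vertex_id, value in parse_records(sample):
        print(vertex_id, "->", value)
```

A quick check like this can be useful before submitting a job, since a line without the configured separator is reported with its line number.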

Contact

For issues, suggestions, or further details, please contact: lulli@di.unipi.it http://www.di.unipi.it/~lulli
