pagerank-mr

MapReduce implementation of the PageRank algorithm.

Usage: Generate some data with DataGenerator. The size of the generated file is controlled by constants in the code; edit them to change it.

If there is a request for it, I could make it read these values from a conf file or from args. As this is not the point of the demo code, I left them as constants in the code.

Once the data is generated, run PagerankMRDriver, making sure that the value of INPUT_SIZE matches the size of the input previously generated. Running another MR job just to figure out N seems like overkill here (and it's Java MR, not Pig).
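To illustrate why N has to be known before the job runs, here is a minimal in-memory sketch of one PageRank iteration in plain Java. It is not the code from this repository: the class name `PageRankSketch`, the method `iterate`, the damping factor of 0.85, and the way dangling pages are handled are all assumptions made for the example. The point is that N appears in the update rule itself, which is why INPUT_SIZE must match the generated input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and structure are not taken from the repository.
public class PageRankSketch {

    /** One PageRank iteration over an in-memory graph with pages numbered 0..n-1. */
    public static double[] iterate(Map<Integer, List<Integer>> outLinks,
                                   double[] ranks, double damping) {
        int n = ranks.length; // N: total number of pages, must be known up front
        double[] contributions = new double[n];
        double danglingMass = 0.0;

        // "Map" phase: every page sends rank / outDegree to each page it links to.
        for (int page = 0; page < n; page++) {
            List<Integer> targets = outLinks.getOrDefault(page, List.of());
            if (targets.isEmpty()) {
                // Assumption: pages without out-links spread their rank evenly.
                danglingMass += ranks[page];
                continue;
            }
            double share = ranks[page] / targets.size();
            for (int target : targets) {
                contributions[target] += share;
            }
        }

        // "Reduce" phase: sum the incoming shares per page and apply damping.
        double[] next = new double[n];
        for (int page = 0; page < n; page++) {
            double incoming = contributions[page] + danglingMass / n;
            next[page] = (1.0 - damping) / n + damping * incoming;
        }
        return next;
    }

    public static void main(String[] args) {
        // Tiny three-page graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}.
        Map<Integer, List<Integer>> outLinks = Map.of(
                0, List.of(1, 2),
                1, List.of(2),
                2, List.of(0));
        int n = 3;
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n); // uniform starting ranks

        for (int i = 0; i < 20; i++) {
            ranks = iterate(outLinks, ranks, 0.85);
        }
        System.out.println(Arrays.toString(ranks));
    }
}
```

If the value used for N is wrong, the `(1.0 - damping) / n` term is off for every page and the ranks no longer sum to 1.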

With the current default values, it should run as-is.

The generator is also cluster-aware and can be set to generate a very large file.

I added a utility, ShowData, that prints the content of a sequence file to the console. For a very large sequence file, this may take a while.

dumoulma/pagerank-mr
