dumoulma/pagerank-mr

pagerank-mr

MapReduce implementation of the PageRank algorithm

Usage: Generate some data with DataGenerator. To change the size of the generated file, edit the constants in its source.
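As a rough illustration of what constant-driven generation can look like, here is a minimal plain-Java sketch. It is not the repository's DataGenerator: the class name, the constants NUM_PAGES, MAX_OUT_LINKS and SEED, and the tab-separated adjacency-list layout are all assumptions made for this example. The number of pages it produces is the figure that INPUT_SIZE would have to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the repository's DataGenerator.
final class GraphGeneratorSketch {
    // Assumed constants; edit these to change the size of the generated graph.
    static final int NUM_PAGES = 8;
    static final int MAX_OUT_LINKS = 3;
    static final long SEED = 42L;

    // Returns one line per page: the page id followed by tab-separated link targets.
    static List<String> generate(int numPages, int maxOutLinks, long seed) {
        Random random = new Random(seed); // fixed seed keeps runs reproducible
        List<String> lines = new ArrayList<>(numPages);
        for (int page = 0; page < numPages; page++) {
            int outDegree = random.nextInt(maxOutLinks + 1); // 0..maxOutLinks links
            StringBuilder line = new StringBuilder().append(page);
            for (int i = 0; i < outDegree; i++) {
                line.append('\t').append(random.nextInt(numPages));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : generate(NUM_PAGES, MAX_OUT_LINKS, SEED)) {
            System.out.println(line);
        }
    }
}
```

Pages with no outgoing links are possible in this sketch, and self-links and duplicate targets are not filtered out.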

If there is demand, I could make it read these values from a configuration file or from command-line arguments. Since that is not the point of this demo code, the values are simply hard-coded.

Once the data is generated, run PagerankMRDriver, making sure that the value of INPUT_SIZE matches the size of the input generated previously. Running another MR job just to work out N seems like overkill here (and this is Java MR, not Pig).
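To see why N has to be known up front, here is a minimal in-memory sketch of one PageRank iteration in plain Java, with no Hadoop dependency. It is not the code in PagerankMRDriver: the class and method names, the damping factor of 0.85, and the way dangling pages are handled are assumptions made for this example. The "map" loop hands each page's rank to its link targets, and the "reduce" loop sums the contributions and applies the teleport term (1 - d) / N, which is where the input size enters.

```java
import java.util.Arrays;

// Hypothetical sketch of the rank update, not the repository's implementation.
final class PageRankSketch {
    static final double DAMPING = 0.85; // assumed value; the README does not state one

    // outLinks[p] lists the pages that page p links to.
    static double[] iterate(int[][] outLinks, double[] ranks, double damping) {
        int n = outLinks.length; // plays the role of INPUT_SIZE
        double[] sums = new double[n];
        double dangling = 0.0;

        // "Map" phase: each page splits its rank evenly among its targets.
        for (int page = 0; page < n; page++) {
            int[] targets = outLinks[page];
            if (targets.length == 0) {
                dangling += ranks[page]; // assumption: spread dangling rank over all pages
                continue;
            }
            double share = ranks[page] / targets.length;
            for (int target : targets) {
                sums[target] += share;
            }
        }

        // "Reduce" phase: combine contributions with the teleport term, which needs N.
        double[] next = new double[n];
        for (int page = 0; page < n; page++) {
            next[page] = (1.0 - damping) / n + damping * (sums[page] + dangling / n);
        }
        return next;
    }

    static double[] uniformRanks(int n) {
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n);
        return ranks;
    }

    public static void main(String[] args) {
        int[][] graph = { {1, 2}, {2}, {0} };
        double[] ranks = uniformRanks(graph.length);
        for (int i = 0; i < 20; i++) {
            ranks = iterate(graph, ranks, DAMPING);
        }
        System.out.println(Arrays.toString(ranks));
    }
}
```

If n does not match the real number of pages, the teleport term is scaled wrongly and the ranks no longer sum to 1, which is why INPUT_SIZE has to agree with the generated input.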

With the current default values, it should run as is.

The generator is also cluster aware, so it could be set to generate a very large file.

I added a utility, ShowData, that prints the content of a sequence file to the console. For a very large sequence file, viewing it may take a while.
