MapReduce implementation of the PageRank algorithm
Usage: Generate some data with DataGenerator. The size of the generated file is controlled by the constants in the code.
If there is demand, I could make it read these values from a conf file or from command-line arguments. Since that is not the point of this demo code, I left them hard-coded.
Once the data is generated, run PagerankMRDriver, making sure that INPUT_SIZE matches the size of the input you generated. Running a separate MR job just to compute N seems like overkill here (this is Java MR, not Pig).
With the current default values, everything should run as is.
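To illustrate why N (INPUT_SIZE) has to be known before the job starts, here is a minimal single-machine sketch of one PageRank iteration. It is not the mapper/reducer code from this project: the class name PageRankSketch, the int[][] adjacency-list input, the damping factor of 0.85, and the handling of pages without outgoing links are all assumptions made for the example. It only shows that both the initial rank (1/N) and the teleport term ((1 - d)/N) depend on N.

```java
import java.util.Arrays;

public class PageRankSketch {
    // Assumed damping factor; 0.85 is the conventional choice, not necessarily this project's.
    static final double DAMPING = 0.85;

    /** One iteration. links[p] lists the pages that p links to; n = links.length stands in for INPUT_SIZE. */
    static double[] iterate(int[][] links, double[] ranks) {
        int n = links.length;
        double[] next = new double[n];
        double dangling = 0.0;

        // "Map" step: each page sends rank / outDegree to every page it links to.
        for (int page = 0; page < n; page++) {
            if (links[page].length == 0) {
                // Assumption: rank of pages with no outgoing links is spread evenly over all pages.
                dangling += ranks[page];
                continue;
            }
            double share = ranks[page] / links[page].length;
            for (int target : links[page]) {
                next[target] += share;
            }
        }

        // "Reduce" step: sum of contributions plus the teleport term, which needs N.
        for (int page = 0; page < n; page++) {
            next[page] = (1 - DAMPING) / n + DAMPING * (next[page] + dangling / n);
        }
        return next;
    }

    /** Starts every page at rank 1/N and applies a fixed number of iterations. */
    static double[] run(int[][] links, int iterations) {
        int n = links.length;
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n);
        for (int i = 0; i < iterations; i++) {
            ranks = iterate(links, ranks);
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        int[][] links = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(run(links, 20)));
    }
}
```

In a MapReduce setting, the first loop corresponds roughly to what a mapper would emit and the second to what a reducer would compute per page. If N is wrong, the ranks no longer sum to 1.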
The generator is also cluster-aware, so it can be configured to generate a very large file.
I also added a utility, ShowData, that prints the contents of a sequence file to the console. For a very large sequence file, this may take a while.