...
- Very simple word count example that works (tar ball)
- PageRank computation (tar ball) for the medium set of wiki pages from the Harvard class by Hanspeter Pfister. The code still has problems in deployment: it crashes in the reduce step if the full 11M lines are processed on a 10-node EC2 cluster
- here is the tar ball of the final version of my code
- to upload it more often I used the command
scp balewski@deltag5.lns.mit.edu:"0x/mySetup.sh" .
./mySetup.sh -f l -v11 -D
it contains
training  495 2009-11-11 20:25 abcd-pages
training  290 2009-11-11 20:25 cleanup.py
training 1374 2009-11-14 19:47 mappPR.py
training 2302 2009-11-14 18:30 pageRankCommon.py
training 2648 2009-11-14 18:31 pageRankCommon.pyc
training 1034 2009-11-14 19:25 reduPR.py
training 7251 2009-11-14 11:33 runPageRank.sh
training 1806 2009-11-14 18:34 wiki2mappPR.py
- to upload the data set to Hadoop HDFS by hand I did
hadoop fs -put wL-pages-iter0 wL-pages-iter0
- to execute the full map/reduce job with 3 iterations:
clean up all, write raw file, use 4 map + 2 reduce, init map, 3 x M/R, final sort
/runPageRank.sh -X -w -D 4.2 -m -I 0.3 -i -f
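For orientation, here is a minimal in-memory sketch of what one PageRank map/reduce iteration computes. It is not the code from mappPR.py or reduPR.py: the function names, the damping factor 0.85 and the toy graph are all assumptions made for illustration, and the shuffle step that Hadoop performs between map and reduce is imitated with a dictionary.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the notes above do not state one


def map_page(page, rank, links):
    """Emit the page's out-links plus an equal share of its rank per out-link."""
    yield page, ("links", links)
    for target in links:
        yield target, ("rank", rank / len(links))


def reduce_page(page, values, n_pages):
    """Sum the incoming rank shares for one page and apply the damping formula."""
    incoming = 0.0
    for kind, payload in values:
        if kind == "rank":
            incoming += payload
    return (1 - DAMPING) / n_pages + DAMPING * incoming


def iterate(graph, ranks):
    """One map / shuffle / reduce pass over the whole graph, done in memory."""
    shuffled = defaultdict(list)  # stands in for Hadoop's shuffle-and-sort
    for page, links in graph.items():
        for key, value in map_page(page, ranks[page], links):
            shuffled[key].append(value)
    return {page: reduce_page(page, shuffled[page], len(graph)) for page in graph}


def page_rank(graph, iterations=3):
    """Run a fixed number of iterations, then sort pages by descending rank."""
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = iterate(graph, ranks)
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical data
    for page, rank in page_rank(toy_graph, iterations=3):
        print(page, round(rank, 4))
```

The sketch ignores pages without out-links (their rank is simply lost) and links to pages missing from the graph, both of which a run over real wiki data would have to handle.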
- to upload it more often I used the command
...