
  1. A very simple word count example that works: TAR BALL
  2. PageRank computation (TAR BALL) for the medium set of wiki pages from the Harvard class by Hanspeter Pfister. The code still has problems when deployed: it crashes in the reduce phase if the full 11M lines are processed on a 10-node EC2 cluster.
  3. Here is the tarball of the final version of my code
    • To upload it more often I used the commands:
      scp balewski@deltag5.lns.mit.edu:"0x/mySetup.sh" .
      ./mySetup.sh -f l -v11 -D
      The tarball contains:
       training  495 2009-11-11 20:25 abcd-pages
       training  290 2009-11-11 20:25 cleanup.py
       training 1374 2009-11-14 19:47 mappPR.py
       training 2302 2009-11-14 18:30 pageRankCommon.py
       training 2648 2009-11-14 18:31 pageRankCommon.pyc
       training 1034 2009-11-14 19:25 reduPR.py
       training 7251 2009-11-14 11:33 runPageRank.sh
       training 1806 2009-11-14 18:34 wiki2mappPR.py
      
    • To upload the data set to Hadoop HDFS by hand I ran:
      hadoop fs -put wL-pages-iter0 wL-pages-iter0
    • To execute the full map/reduce job with 3 iterations:
      clean up everything, write the raw file, use 4 mappers + 2 reducers, run the init map, 3 x M/R, final sort
      ./runPageRank.sh -X -w -D 4.2 -m -I 0.3 -i -f
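
The contents of mappPR.py and reduPR.py are not shown on this page, so the following is only a minimal sketch of what one PageRank map/reduce iteration might look like, run in-process rather than on Hadoop. The function names, the damping factor of 0.85, and the toy graph are assumptions made for illustration, not taken from the tarball. The sketch also assumes every linked page appears as a key in the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the value used in the original code is not stated


def map_page(page, rank, links):
    """Map step: pass the link list through and give each target an equal share of rank."""
    yield page, ("links", links)
    for target in links:
        yield target, ("rank", rank / len(links))


def reduce_page(values, n_pages):
    """Reduce step: sum the incoming rank shares and recover the page's link list."""
    links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            incoming += payload
    return (1.0 - DAMPING) / n_pages + DAMPING * incoming, links


def iterate(pages):
    """One map / shuffle / reduce pass over {page: (rank, links)}."""
    grouped = defaultdict(list)  # stands in for the Hadoop shuffle-by-key
    for page, (rank, links) in pages.items():
        for key, value in map_page(page, rank, links):
            grouped[key].append(value)
    return {page: reduce_page(values, len(pages)) for page, values in grouped.items()}


def page_rank(graph, iterations=3):
    """Init map, `iterations` M/R passes, then a final sort by descending rank."""
    pages = {p: (1.0 / len(graph), list(links)) for p, links in graph.items()}
    for _ in range(iterations):
        pages = iterate(pages)
    return sorted(((rank, p) for p, (rank, _) in pages.items()), reverse=True)


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical 3-page graph
    for rank, page in page_rank(toy_graph):
        print(f"{page}\t{rank:.4f}")
```

Passing the link list through the mapper alongside the rank shares lets each iteration's output be fed directly into the next one, which is one common way to chain the "3 x M/R" passes.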
