...
- Python Mapper: reads lines of text from stdin, writes "<word> 1" pairs to stdout (mapp1.py):

```python
#!/usr/bin/env python
# my 1st mapper: reads text from stdin, writes "<word> 1" to stdout
import sys

for ln in sys.stdin:
    for key in ln.split():
        if len(key) > 1:  # skip single-character tokens
            print key, 1
```

- Python Reducer: reads a stream of "<word> <value>" pairs from stdin, sums the values of consecutive identical keys, writes "<word> <count>" to stdout (redu1.py):

```python
#!/usr/bin/env python
# my 1st reducer: reads "<word> <value>" pairs from stdin, sums the
# values of consecutive identical keys, writes "<word> <sum>" to stdout
import sys

myKey = ""
myVal = 0
for ln in sys.stdin:
    L = ln.split()
    nw = len(L) / 2  # pairs per line
    for i in range(nw):
        key = L[2 * i]
        val = int(L[2 * i + 1])
        if myKey == key:
            myVal = myVal + val
        else:
            if len(myKey) > 0:
                print myKey, myVal
            myKey = key
            myVal = val
if len(myKey) > 0:
    print myKey, myVal
```
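The reducer above only works because Hadoop streaming sorts the mapper output by key before the reduce step, so equal keys arrive consecutively. A minimal Python 3 sketch of that contract (hypothetical helper names, not the scripts above):

```python
# Sketch (Python 3): simulate the streaming pipeline in-process.
# map emits ("<word>", 1) pairs, sorting plays the role of the
# shuffle, and the reduce step sums runs of identical keys.
def map_words(lines):
    for ln in lines:
        for key in ln.split():
            if len(key) > 1:          # same filter as mapp1.py
                yield (key, 1)

def reduce_sorted(pairs):
    my_key, my_val = "", 0
    for key, val in pairs:            # pairs must be key-sorted
        if key == my_key:
            my_val += val
        else:
            if my_key:
                yield (my_key, my_val)
            my_key, my_val = key, val
    if my_key:
        yield (my_key, my_val)

text = ["to be or not to be"]
pairs = sorted(map_words(text))       # stands in for the shuffle/sort
counts = dict(reduce_sorted(pairs))
print(counts)
```

Without the sort, two occurrences of the same word separated by other keys would be emitted as two partial counts.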
- Execute:

```shell
hadoop jar $SJAR \
    -mapper $(pwd)/mapp1.py \
    -reducer $(pwd)/redu1.py \
    -input inputShak \
    -output outputShak3
```
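Before submitting to the cluster, the map → sort → reduce chain can be smoke-tested locally; here `tr`/`awk` are hypothetical stand-ins for the two Python scripts, reproducing the same logic in a single pipeline:

```shell
# simulate the streaming job: map -> sort (shuffle) -> reduce
printf 'to be or not to be\n' \
    | tr -s ' ' '\n' | awk 'length($0)>1 {print $0, 1}' \
    | sort \
    | awk '{sum[$1]+=$2} END {for (k in sum) print k, sum[k]}' \
    | sort
```

The same pattern works with the real scripts: `cat file | ./mapp1.py | sort | ./redu1.py`.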
- Python Mapper
...
- compute PageRank for the medium-size set of wiki pages from the Harvard class by Hanspeter Pfister; the code still has problems when deployed on an EC2 cluster
- here is a tar-ball of the final version of my code
- to upload it I most often used the commands:

```shell
scp balewski@deltag5.lns.mit.edu:"0x/mySetup.sh" .
./mySetup.sh -f l -v11 -D
```
- it contains:

```
training  495 2009-11-11 20:25 abcd-pages
training  290 2009-11-11 20:25 cleanup.py
training 1374 2009-11-14 19:47 mappPR.py
training 2302 2009-11-14 18:30 pageRankCommon.py
training 2648 2009-11-14 18:31 pageRankCommon.pyc
training 1034 2009-11-14 19:25 reduPR.py
training 7251 2009-11-14 11:33 runPageRank.sh
training 1806 2009-11-14 18:34 wiki2mappPR.py
```
- to upload the data set to the Hadoop HDFS by hand I did:

```shell
hadoop fs -put wL-pages-iter0 wL-pages-iter0
```

- to execute the full map/reduce job with 3 iterations:
clean up all, write the raw file, use 4 mappers + 2 reducers, init map, 3 x M/R, final sort:

```shell
./runPageRank.sh -X -w -D 4.2 -m -I 0.3 -i -f
```
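The PageRank mapper/reducer sources (mappPR.py/reduPR.py) are not reproduced here; as a rough sketch of what each M/R iteration computes (Python 3, hypothetical function and variable names, damping factor d=0.85), the map step distributes each page's rank over its out-links and the reduce step sums the incoming shares:

```python
# Sketch (Python 3) of one PageRank map/reduce iteration; the actual
# mappPR.py/reduPR.py may differ in format and details.
def pagerank_iteration(ranks, links, d=0.85):
    # "map": every page sends rank/len(outlinks) to each out-link
    contrib = {}
    for page, out in links.items():
        share = ranks[page] / len(out)
        for dest in out:
            contrib[dest] = contrib.get(dest, 0.0) + share
    # "reduce": sum contributions per page, apply the damping factor
    n = len(ranks)
    return {p: (1 - d) / n + d * contrib.get(p, 0.0) for p in ranks}

# toy 3-page link graph, uniform initial ranks
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {p: 1.0 / len(links) for p in links}
for _ in range(3):                 # the "3 x M/R" iterations
    ranks = pagerank_iteration(ranks, links)
print(ranks)
```

Since this toy graph has no dangling pages, total rank stays 1.0 across iterations, which is a quick sanity check for the real job's output too.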
...