This was a 4-hour workshop, which I attended remotely.
Instructions for the hands-on exercises are here:
[http://tinyurl.com/nerschadoopoct]
The Hadoop admin page is:
[http://maghdp01.nersc.gov:50030/jobtracker.jsp]
*My Notes*
* My shell was not bash; I changed it by typing {{bash -l}}, then checked with {{echo $SHELL}}
* {{module load tig hadoop}}
* Generic hadoop command syntax: *hadoop command \[genericOptions\] \[commandOptions\]*
* Create your Hadoop FS home directory: {{hadoop fs -mkdir /user/balewski}}
* List its content (should be nothing now, but no error): {{hadoop fs -ls}}
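Collected into one session, the setup steps above look like this (a sketch; {{/user/balewski}} is my own HDFS home directory, substitute your own):
{code}
$ bash -l                           # switch to bash (my login shell was not bash)
$ echo $SHELL                       # confirm the shell changed
$ module load tig hadoop            # load the NERSC Hadoop module
$ hadoop fs -mkdir /user/balewski   # create your HDFS home directory
$ hadoop fs -ls                     # list it: empty now, but no error
{code}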
*Exercise 1: create, load, and read back a text file in HDFS*
{code}
$ vi testfile1
This is file 1
This is to test HDFS
$ vi testfile2
This is file 2
This is to test HDFS again
$ hadoop fs -mkdir input
$ hadoop fs -put testfile* input/
$ hadoop fs -cat input/testfile1
$ hadoop fs -cat input/testfile*
$ hadoop fs -get input input
$ ls input/
{code}
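For reference, the read-back steps should simply echo the file contents, e.g.:
{code}
$ hadoop fs -cat input/testfile1
This is file 1
This is to test HDFS
$ ls input/
testfile1  testfile2
{code}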
*Exercise 2: run a Hadoop job from the examples package*
{code}
$ hadoop fs -mkdir wordcount-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
$ hadoop fs -ls wordcount-op
$ hadoop fs -cat wordcount-op/p* | grep Darcy
{code}
Monitor its progress from the URLs:
[http://maghdp01.nersc.gov:50030/]
[http://maghdp01.nersc.gov:50070/]
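You can also poll job status from the command line; a sketch, assuming the standard Hadoop 0.20 job CLI (the job ID below is hypothetical, take the real one from the jobtracker page or the submission output):
{code}
$ hadoop job -list                           # running jobs and their IDs
$ hadoop job -status job_201010040000_0001   # hypothetical job ID
{code}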
To re-run a job you must first clean up the old output files:
{{hadoop dfs -rmr wordcount-op}}
Next, run Hadoop with 4 reducers:
{code}
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op
{code}
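With 4 reducers you should see 4 output part files (a quick check; the exact part-file naming depends on the Hadoop version):
{code}
$ hadoop fs -ls wordcount-op    # expect 4 part files, e.g. part-r-00000 ... part-r-00003
{code}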
A suggestion: change user permissions to allow me to read the Hadoop output, because Hadoop owns everything by default on the scratch disk.
Or use the provided script:
{{fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount-gpfs/}}
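If the script is not at hand, a rough equivalent, assuming it merely opens read access recursively (a guess at what fixperms.sh does, not its actual contents):
{code}
# assumption: recursively grant world read; capital X keeps directories traversable
$ chmod -R a+rX /global/scratch/sd/balewski/hadoop/wordcount-gpfs/
{code}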