This was a 4-hour workshop; I attended remotely.
Instructions for the hands-on exercises are here:
http://tinyurl.com/nerschadoopoct*
The Hadoop admin page is:
http://maghdp01.nersc.gov:50030/jobtracker.jsp
My notes:
- My shell was not bash; I changed it by typing: bash -l (verify with: echo $SHELL)
- Load the Hadoop environment: module load tig hadoop
Generic Hadoop command syntax: hadoop command [genericOptions] [commandOptions] (see the example after this list)
- Create my HDFS home directory: hadoop fs -mkdir /user/balewski
- List its contents (should be empty now, but no error): hadoop fs -ls
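Here is the genericOptions/commandOptions example promised above. It is a minimal sketch of the syntax only; the -D property (dfs.replication) and the file name are my illustrative assumptions, not something from the workshop:
$ # the genericOption -D sets a config property for this one command;
$ # here it asks HDFS to keep 2 replicas of the uploaded file
$ # (somefile is a placeholder for any local file)
$ hadoop fs -D dfs.replication=2 -put somefile /user/balewski/
The replication column of hadoop fs -ls should then show 2 for that file.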
Exercise 1: create, load, and read back text files in HDFS
$ vi testfile1
  This is file 1
  This is to test HDFS
$ vi testfile2
  This is file 2
  This is to test HDFS again
$ hadoop fs -mkdir input
$ hadoop fs -put testfile* input/
$ hadoop fs -cat input/testfile1
$ hadoop fs -cat input/testfile*
$ hadoop fs -get input input
$ ls input/
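A quick sanity check that the round trip through HDFS came back intact; the diff line is my addition, not part of the exercise:
$ # compare the original local file with the copy fetched back from HDFS
$ diff testfile1 input/testfile1 && echo OK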
Exercise 2: run a Hadoop job from the examples package
$ hadoop fs -mkdir wordcount-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
$ hadoop fs -ls wordcount-op
$ hadoop fs -cat wordcount-op/p* | grep Darcy
Monitor its progress from these URLs: http://maghdp01.nersc.gov:50030/ and http://maghdp01.nersc.gov:50070/
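The same can be checked from the command line; a minimal sketch, assuming the 0.20-era job CLI (the job ID below is a hypothetical placeholder, use the one printed by hadoop job -list):
$ # list running jobs with their IDs
$ hadoop job -list
$ # query one job's map/reduce completion status
$ hadoop job -status job_201010010000_0001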
To re-run a job you must first clean up the old output files: hadoop dfs -rmr wordcount-op
Next, run Hadoop with 4 reducers: hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op
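With 4 reducers the output directory should contain one part file per reducer (part-00000 through part-00003, following the standard Hadoop naming convention), so the earlier grep still works over all of them:
$ hadoop fs -ls wordcount-op
$ # one part file per reducer; cat them all and grep as before
$ hadoop fs -cat wordcount-op/part-* | grep Darcy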
A suggestion: change the file permissions so that I can read the Hadoop output, because Hadoop owns everything by default on the scratch disk.
Or use the provided script: fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount-gpfs/
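If the script is not at hand, an ordinary chmod on the GPFS side should achieve the same; a sketch under the assumption that the output landed in the directory above (the mode a+rX is my choice, not from the workshop):
$ # recursively make files readable and directories traversable for everyone
$ chmod -R a+rX /global/scratch/sd/balewski/hadoop/wordcount-gpfs/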