This was a 4-hour workshop, which I attended remotely.
Instructions for the hands-on exercises are here: http://tinyurl.com/nerschadoopoct
The Hadoop admin page is: http://maghdp01.nersc.gov:50030/jobtracker.jsp
My notes:
- my shell was not bash; I changed it by typing bash -l (check with echo $SHELL)
- module load tig hadoop
- generic hadoop command form: hadoop command [genericOptions] [commandOptions]
- create my Hadoop FS home directory: hadoop fs -mkdir /user/balewski
- list its contents (should be empty now, but no error): hadoop fs -ls

Exercise 1: create, load, and read back text files to HDFS
{code}
$ vi testfile1
This is file 1
This is to test HDFS

$ vi testfile2
This is file 2
This is to test HDFS again

$ hadoop fs -mkdir input
$ hadoop fs -put testfile* input/
$ hadoop fs -cat input/testfile1
$ hadoop fs -cat input/testfile*
$ hadoop fs -get input input
$ ls input/
{code}
Exercise 2: run a Hadoop job from the examples package
{code}
$ hadoop fs -mkdir wordcount-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
$ hadoop fs -ls wordcount-op
$ hadoop fs -cat wordcount-op/p* | grep Darcy
{code}
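The job itself needs the cluster, but the counts the wordcount example produces can be approximated locally with standard shell tools. This is only a sketch, not the Hadoop job (the real example tokenizes in Java, so edge cases differ); the local wordcount-in/ directory and sample.txt file here are illustrative stand-ins, not the HDFS paths above:

```shell
# Local approximation of what the wordcount example computes:
# split input on whitespace, then count occurrences of each word.
mkdir -p wordcount-in
printf 'a b a\nb a\n' > wordcount-in/sample.txt   # tiny stand-in input
cat wordcount-in/* | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn
```

The `sort | uniq -c` pair plays the role of the shuffle and reduce phases: identical words are brought together, then counted.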
Monitor its progress from these URLs:
http://maghdp01.nersc.gov:50030/
http://maghdp01.nersc.gov:50070/
To re-run a job you must first CLEAN UP the old output files: hadoop dfs -rmr wordcount-op
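The reason for the cleanup is that Hadoop refuses to write into an existing output directory. The same remove-then-rerun pattern, sketched with plain local directories standing in for the HDFS paths (an analogue, not the HDFS commands themselves):

```shell
# Local analogue of the re-run cleanup: the output directory must not
# exist before the job runs, so remove it first.
outdir=wordcount-op
rm -rf "$outdir"        # stands in for: hadoop dfs -rmr wordcount-op
mkdir "$outdir"         # the next job run recreates its output here
```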
Next run Hadoop with 4 reducers: hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op
A suggestion: change the user permissions to allow me to read the Hadoop output, because Hadoop owns everything by default on the scratch disk.
Or use the provided script: fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount
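I did not see the source of fixperms.sh, so this is only a guess at its core: a recursive chmod that makes a Hadoop-owned output tree world-readable. The demo-out/ tree below is fabricated so the snippet is self-contained; the actual script, paths, and options may differ:

```shell
# Hypothetical sketch of what fixperms.sh presumably does.
# chmod's a+rX grants everyone read, plus execute (directory search)
# only on directories and files that are already executable.
mkdir -p demo-out/sub                       # stand-in output tree
echo 'Darcy 42' > demo-out/sub/part-00000
chmod 700 demo-out/sub                      # simulate Hadoop-owned perms
chmod 600 demo-out/sub/part-00000
chmod -R a+rX demo-out                      # the presumed core of fixperms.sh
```

The capital X (rather than x) matters: it avoids marking plain data files as executable while still letting other users descend into the directories.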