z3d's blog: Hadoop WordCount tutorial

Thursday, April 16, 2020

Hadoop WordCount tutorial

In this post we will write a word count program for Hadoop, which is the equivalent of a Hello World program in other languages. This tutorial assumes that you already have Hadoop set up and running. Begin by copying the code below and saving the file as WordCount.java
Note that the name is important: it must match the class name exactly. Before compiling the code, set the environment variables as follows

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

Compile WordCount.java and create a jar:

hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class

Create two input files for the MapReduce job in a local input folder as follows

mkdir input
echo "Hello World Bye World" > input/file01
echo "Hello Hadoop Goodbye Hadoop" > input/file02

We also need to create the input folder on HDFS

hadoop fs -mkdir -p /user/$USER/input

Now we need to copy these files into HDFS

hadoop fs -copyFromLocal input/ /user/$USER/

Verify that the files have been copied

hadoop fs -ls /user/$USER/input

It should show two files as follows (in this example the user name is zaid)

Found 2 items
-rw-r--r--   1 zaid supergroup         22 2020-04-17 09:55 /user/zaid/input/file01
-rw-r--r--   1 zaid supergroup         28 2020-04-17 09:55 /user/zaid/input/file02

Now let's run the application

hadoop jar wc.jar WordCount /user/$USER/input /user/$USER/output

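The job should run and print a lot of log output and, hopefully, no errors. Note that the output folder must not already exist on HDFS, otherwise the job will fail. Once the job is complete you can check the output as follows

hadoop fs -cat /user/$USER/output/part-r-00000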
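
To get a rough idea of what counts to expect for the two sample files, the map and reduce steps can be mimicked in plain Java without Hadoop. This is only a sketch under assumptions: the class name LocalWordCount and the method countWords are hypothetical, and it assumes the job splits lines on whitespace and treats words as case-sensitive, as the usual WordCount example does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Hadoop job (no Hadoop classes used).
public class LocalWordCount {

    // Mimics map + reduce in one pass: every whitespace-separated token
    // contributes 1, and the contributions are summed per word.
    // A TreeMap keeps the words sorted by key.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same contents as input/file01 and input/file02
        List<String> lines = Arrays.asList(
            "Hello World Bye World",
            "Hello Hadoop Goodbye Hadoop");
        // Print one "word<TAB>count" pair per line
        countWords(lines).forEach(
            (word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Under those assumptions the sketch prints Bye 1, Goodbye 1, Hadoop 2, Hello 2 and World 2, one pair per line, and the part-r-00000 file should contain the same counts.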
