Monday, February 16, 2015

MapReduce Program using Eclipse and Hadoop 2.6.0


- Single-node Hadoop should be up and running

Please follow my previous blog on single-node Hadoop setup if it is not ready yet:

Hadoop Single Node Setup

- Download and install Eclipse from the location below if it is not already installed


bhupendra@ubuntu:/home/hduser/eclipse$ ./eclipse
Step 1:
Start Eclipse and create a new Java project as below

Step 2: Write the WordCount driver class

Step 3: Replace the auto-generated class with the code below

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

      public int run(String[] args) throws Exception {
            //creating a JobConf object and assigning a job name for identification purposes
            JobConf conf = new JobConf(getConf(), WordCount.class);
            conf.setJobName("WordCount");

            //setting the configuration object with the data type of the output key and value
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            //providing the mapper and reducer class names
            conf.setMapperClass(WordCountMapper.class);
            conf.setReducerClass(WordCountReducer.class);

            //we will give 2 arguments at run time: one is the input path and the other is the output path
            Path inp = new Path(args[0]);
            Path out = new Path(args[1]);
            //the hdfs input and output directory to be fetched from the command line
            FileInputFormat.addInputPath(conf, inp);
            FileOutputFormat.setOutputPath(conf, out);

            JobClient.runJob(conf);
            return 0;
      }

      public static void main(String[] args) throws Exception {
            // this main function will call the run method defined above
            int res = ToolRunner.run(new Configuration(), new WordCount(), args);
            System.exit(res);
      }
}

Step 4: Create a new WordCountMapper class

Replace it with the code below

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      //hadoop supported data types
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      //map method that performs the tokenizer job and frames the initial key-value pairs;
      //after all lines are converted into key-value pairs, the reducer is called
      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            //taking one line at a time from the input file and tokenizing it
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            //iterating through all the words available in that line and forming the key-value pairs
            while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  //sending to the output collector, which in turn passes the same to the reducer
                  output.collect(word, one);
            }
      }
}

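To see what the mapper emits, the following plain-JDK sketch (not part of the job; the class name and list-of-strings representation are illustrative only) mimics the tokenize-and-collect loop, gathering the (word, 1) pairs into a list instead of an OutputCollector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-JDK sketch of WordCountMapper's output for one input line.
public class MapperSketch {
    public static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // mimics output.collect(word, one): emit a (word, 1) pair per token
            pairs.add(tokenizer.nextToken() + ",1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("the quick brown fox"));
    }
}
```

Each token produces its own pair; duplicate words in a line are emitted once per occurrence, and the framework later groups them by key for the reducer.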
Step 5: Create the WordCountReducer class
And replace it with the code below

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      //reduce method accepts the key-value pairs from the mappers, aggregates the values by key and produces the final output
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            /*iterates through all the values available for a key, adds them together and gives
            the final result as the key and the sum of its values*/
            while (values.hasNext()) {
                  sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
      }
}

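The reducer's aggregation is just a sum over an iterator of counts. The plain-JDK sketch below (illustrative only; `ReducerSketch` is a hypothetical name, and plain `Integer` stands in for `IntWritable`) mirrors the `sum += values.next().get()` loop:

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-JDK sketch of the reducer's per-key aggregation.
public class ReducerSketch {
    public static int reduce(Iterator<Integer> values) {
        int sum = 0;
        // mimics: while (values.hasNext()) sum += values.next().get();
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // the framework hands the reducer all values collected for one key,
        // e.g. the three 1s emitted for a word that appeared three times
        System.out.println(reduce(Arrays.asList(1, 1, 1).iterator())); // prints 3
    }
}
```

Hadoop calls `reduce` once per distinct key, so the final output contains one (word, total) line per word.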
Step 6: If the above Java classes have no syntax errors, the corresponding class files will be generated automatically as follows:

Step 7:
Note that before Step 6 can succeed, we have to add the Hadoop dependencies as external libraries.
Follow the screenshots below and add the external jars from the path
(in my case /usr/local/hadoop-2.6.0/share/hadoop/common and /usr/local/hadoop-2.6.0/share/hadoop/mapreduce)


Step 8:
Now click on the Run tab and select Run Configurations. Click the New Configuration button and fill in the Name, Project Name and Main Class as per the screenshots.

Step 9:
Now right-click on the project and select Export. Under Java, select Runnable JAR file.
In Launch configuration, select the config you created in Step 8 (WordCountConfig).
Select an export destination (let's say the desktop).
Under Library handling, select "Extract required libraries into generated JAR" and click Finish.

Step 10:

- Switch to hduser: $ sudo su hduser

- Remove the temp files generated earlier so that all the required Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager and Secondary NameNode) start gracefully.

The temp file location is based on the tmp directory defined in the Hadoop configuration file core-site.xml.

Step 11: Format the name node using the command below

# hadoop namenode -format (the output will look something like below)

Step 12: Start the processes and check whether the required daemons have started gracefully. Refer to the screen and commands below.

Please note: if the temp files are present and not removed, some of the daemons will not start properly.

Step 13:
Make an HDFS directory (note: these directories are not listed when ls is used in the terminal, and they are also not visible in the File Browser):
hadoop dfs -mkdir -p /usr/local/hadoop-2.6.0/input
Copy the sample input text file into this HDFS directory:
hadoop dfs -copyFromLocal /home/bhupendra/workspace/sample1.txt /usr/local/hadoop-2.6.0/input
Change directory and run the example WordCount program using the jar file. NOTE: don't create the output folder out1 yourself; it will be created automatically, and every time you run the example, give a new output directory. These directories are not visible with the ls command in the terminal.
hadoop jar wordcount.jar /usr/local/hadoop/input /usr/local/hadoop/output

<< I will fix the above issue later, as it prevents running the hadoop commands to create the directory and copy the file from local to the HDFS directory "input".
To run the programme, I created the input directory using the usual mkdir command and copied the file using the cp command. >>

hadoop fs -ls
15/01/30 17:03:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
The problem `ls: '.': No such file or directory` arises because there is no home directory on HDFS for your current user. Try:
hadoop fs -mkdir -p /user/[current login user]
Then you will be able to run hadoop fs -ls
Go to the Hadoop conf path, open the environment file in vi and add the following lines:

export HADOOP_PREFIX=/usr/local/hadoop-2.6.0
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

< Please note that environment variables override the variables inside the .bashrc file. Hence it is mandatory to add the above lines to the file. >

Step 14:
Run the job using the command below:
hadoop jar WordCount.jar /usr/local/hadoop-2.6.0/input /usr/local/hadoop-2.6.0/output

Step 15:
Browse the Hadoop GUI


Step 16:
Browse the output file

Step 17:
Stop all daemons once you are done with the job

hduser@ubuntu:/usr/local/hadoop-2.6.0/etc/hadoop$ hadoop fs -ls hdfs://localhost:54310
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2015-07-17 10:42 hdfs://localhost:54310/user/hduser/input

