Monday, February 16, 2015

Hadoop Single Node Setup

System requirements:

  • A working Java installation (Oracle JDK 7 is used in this guide)
  • An SSH server and client (the server is installed in step 6)

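Before starting, a quick sanity check (a minimal sketch; the exact version strings depend on your installed packages):

    # verify Java is installed and on the PATH
    $ java -version
    # verify the ssh client is present (the server is installed in step 6)
    $ which ssh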
1. cd /usr/local

2. wget http://mirror.metrocast.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

3. tar -xzvf hadoop-2.6.0.tar.gz    (this unpacks into /usr/local/hadoop-2.6.0)

4. cd /usr/local/hadoop-2.6.0

5. Add a new user

     $ groupadd hadoop
     $ useradd -m -g hadoop hduser

   To change a user's primary group:
     usermod -g primarygrpname username
   To change a user's secondary group:
     usermod -G secondarygrpname username
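   To confirm the user and group came out as intended, a quick check (illustrative output; your uid/gid numbers will differ):

     $ id hduser
     uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)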

6. Install ssh-server
  $ apt-get install openssh-server

7. Generate an ssh key
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh hduser@localhost
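If that last ssh still prompts for a password, the usual culprit is permissions on the .ssh directory; a hedged fix, assuming the default home directory layout:

$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ ssh hduser@localhost    # should now log in without a password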






  • Disabling IPv6
  • Open the config file: sudo gedit /etc/sysctl.conf
  • Add these 3 lines at the end of the file:

        # disable ipv6
        net.ipv6.conf.all.disable_ipv6 = 1
        net.ipv6.conf.default.disable_ipv6 = 1
        net.ipv6.conf.lo.disable_ipv6 = 1

  • After adding these lines, reload the settings with: sudo sysctl -p
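  • To verify that IPv6 is really off (1 means disabled):

        $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
        1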

    • Configuring the Hadoop configuration files
            Change directory using cd /usr/local/hadoop-2.6.0/etc/hadoop

        $ vi hdfs-site.xml
         
    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
    </property>
    </configuration>
    • Create the required directories (see the mkdir commands later in this post)
    • Format the file system
    • cd /usr/local/hadoop-2.6.0
    • ./bin/hdfs namenode -format

    • Go to sbin and start all daemons
    • cd /usr/local/hadoop-2.6.0/sbin
    • $ ./start-all.sh
      (start-all.sh is deprecated in Hadoop 2.x; ./start-dfs.sh followed by ./start-yarn.sh is the current equivalent)
    • To check whether all daemons are running:
    • $ jps
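    On a healthy single-node setup, jps should list roughly these processes (illustrative output; PIDs will differ):

    4866 NameNode
    5012 DataNode
    5230 SecondaryNameNode
    5405 ResourceManager
    5530 NodeManager
    5861 Jps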

    If any daemon doesn't start, start it manually:
        hadoop-daemon.sh start namenode
        hadoop-daemon.sh start datanode
        yarn-daemon.sh start resourcemanager
        yarn-daemon.sh start nodemanager
        mr-jobhistory-daemon.sh start historyserver

        Hadoop Web Interfaces
            NameNode - http://localhost:50070/
            Secondary NameNode - http://localhost:50090/
            ResourceManager - http://localhost:8088/
            If a page doesn't load, use jps to check which daemons are actually running.
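        A quick way to probe the NameNode UI without a browser (a small curl sketch; 200 means the web server is up):
            $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
            200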


    Run these as root so hduser owns the installation before the daemons are started:
    $ chown -R hduser:hadoop /usr/local/hadoop-2.6.0
    $ chmod -R +x /usr/local/hadoop-2.6.0
    Setting Global Variables
    $ vi /home/hduser/.bashrc
    export HADOOP_PREFIX=/usr/local/hadoop-2.6.0
    export HADOOP_HOME=/usr/local/hadoop-2.6.0
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    # Native Path
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
    #Java path
    export JAVA_HOME='/usr/lib/jvm/java-7-oracle'
    # Add Hadoop bin/ directory to PATH

    export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin

    $ vi /home/hduser/.profile
    Add the same export lines as in .bashrc above (HADOOP_PREFIX through PATH), then reload both files:
    $ source ~/.bashrc
    $ source ~/.profile

    $ vi /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
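    With the environment in place, confirm the shell can find Hadoop (the first line of output should name the version from the tarball above):
    $ hadoop version
    Hadoop 2.6.0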



    $ vi yarn-site.xml
    <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    </configuration>

    $ vi core-site.xml
    <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/hadoop-2.6.0/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system.  A URI whose
      scheme and authority determine the FileSystem implementation.  The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class.  The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>

    </configuration>
    (Note: fs.default.name still works but is deprecated in Hadoop 2.x; the current property name is fs.defaultFS.)

    $ cp mapred-site.xml.template mapred-site.xml
    $ vi mapred-site.xml
    <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <description>The runtime framework for executing MapReduce jobs.
      This setup starts the YARN daemons, so jobs should run on YARN;
      the old MR1 property mapred.job.tracker is ignored under YARN.
      </description>
    </property>

    </configuration>
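    Before formatting HDFS, it is worth checking that the edited XML files are well-formed. A sketch using xmllint (part of libxml2; it may need to be installed separately):

    $ cd /usr/local/hadoop-2.6.0/etc/hadoop
    $ xmllint --noout core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml && echo "all config files OK"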

    Create the local directories that HDFS will use for NameNode and DataNode storage:
    $ mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode
    $ mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode
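    If you want HDFS to use these directories rather than the hadoop.tmp.dir default, the two standard properties below can be added to hdfs-site.xml (a sketch; the paths assume the mkdir commands above):

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop-2.6.0/yarn_data/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop-2.6.0/yarn_data/hdfs/datanode</value>
    </property>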

    Offline Image Viewer Guide

    The Offline Image Viewer (oiv) inspects an HDFS fsimage file without needing a running NameNode. The fsimage files live in the NameNode's current directory; listing it shows something like:

    -rw-r--r-- 1 hduser hadoop  100722 Oct  7 20:49 fsimage_0000000000000008804
    -rw-r--r-- 1 hduser hadoop      62 Oct  7 20:49 fsimage_0000000000000008804.md5
    drwxrwxr-x 3 hduser hduser    4096 Oct  8 22:49 ..
    -rw-r--r-- 1 hduser hadoop  100722 Oct  8 22:49 fsimage_0000000000000008805
    -rw-r--r-- 1 hduser hadoop      62 Oct  8 22:49 fsimage_0000000000000008805.md5
    -rw-rw-r-- 1 hduser hduser     202 Oct  8 22:49 VERSION
    -rw-r--r-- 1 hduser hadoop       5 Oct  8 22:49 seen_txid
    -rw-r--r-- 1 hduser hadoop 1048576 Oct  8 22:49 edits_inprogress_0000000000000008806
    drwxrwxr-x 2 hduser hduser   12288 Oct  8 22:49 .
    hduser@ubuntu:/usr/local/hadoop-2.6.0/tmp/dfs/name/current$ cat fsimage_0000000000000008805.md5
    929bde84fb1432baba3228dc78b3b6d8 *fsimage_0000000000000008805
    hduser@ubuntu:/usr/local/hadoop-2.6.0/tmp/dfs/name/current$ hdfs oiv -i fsimage_0000000000000008805
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loading 2 strings
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loading 1273 inodes.
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loading inode references
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode references
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loading inode directory section
    15/10/08 23:02:31 INFO offlineImageViewer.FSImageHandler: Loaded 164 directories
    15/10/08 23:02:31 INFO offlineImageViewer.WebImageViewer: WebImageViewer started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
    15/10/08 23:04:27 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=GETFILESTATUS target=/user/hduser
    15/10/08 23:04:27 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser
    15/10/08 23:04:51 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=GETFILESTATUS target=/user/hduser
    15/10/08 23:04:51 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser
    15/10/08 23:05:41 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=GETFILESTATUS target=/user/hduser
    15/10/08 23:05:42 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser
    15/10/08 23:05:42 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser/input
    15/10/08 23:05:42 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser/input1
    15/10/08 23:05:42 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser/input2
    15/10/08 23:05:42 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/user/hduser/input3
    15/10/08 23:06:47 INFO offlineImageViewer.FSImageHandler: 200 method=GET op=LISTSTATUS target=/
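    While the viewer is running, a second shell can browse the image over the WebHDFS API; the LISTSTATUS lines above correspond to requests like this one:

    $ hdfs dfs -ls webhdfs://127.0.0.1:5978/user/hduser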


