Monday, 17 August 2015

Hadoop Installation (Pseudo Distributed Mode)

Steps for Installation of Hadoop
If you are on a Windows environment, follow the steps below
Step 1: Install VMware Workstation
                Download Product (http://www.vmware.com/products/workstation/workstation-evaluation)
Step 2: Download any flavor of Linux (Ex: Red Hat Linux or Ubuntu; if you have a low-configuration machine, use Lubuntu)
Step 3: Start VMware and create a new VM.
Step 4: Install Linux (any flavor).
Step 5: Open the Linux command prompt (terminal).
Step 6: Log in as root so that you can create a new group
Step 7: Add a new group
Command  :  sudo addgroup hadoop
It will ask for the root password; enter it.
sudo is used when you want to run any command as the super user.
Step 8: Add a new user for Hadoop in the group "hadoop"
Command : sudo adduser --ingroup hadoop hduser
It will ask for the password that you want to set for hduser.
Step 9: Now add hduser to the list of sudoers, so that you can run any command as hduser
Command :  sudo adduser hduser sudo
Step 10: Now log out of root and log in as hduser
Step 11: Open terminal
Step 12: Hadoop is developed in Java, so Java must be installed on your machine before we start using Hadoop.
                We need Java 1.5+ (meaning 1.5 or a later version)
Command :         sudo apt-get install openjdk-6-jdk

apt-get is the package manager of Ubuntu; it helps you install software.
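Once the installation finishes, you can quickly verify the JDK:
Command :         java -version
It should print an OpenJDK 1.6.x version string.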
Step 13: Install the SSH (Secure Shell) server
Command :         sudo apt-get install openssh-server
Step 14: Once ssh is installed, we can log in to a remote machine using the following command
Command :         ssh <ipaddress>
If you try ssh localhost, you will notice that it prompts you for a password. Now we want to make this login password-less. One way of doing it is to use keys; we can generate keys using the following command.
Command :         ssh-keygen -t rsa -P ""
This command will generate two keys at the "/home/hduser/.ssh/" path: id_rsa and id_rsa.pub.
id_rsa is the private key.
id_rsa.pub is the public key.
Command :         ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@localhost
Give the password for hduser when prompted; this copies the public key into hduser's authorized_keys, so future ssh logins to localhost will not ask for a password.
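Now test that the password-less login works:
Command :         ssh localhost
This time it should log you in without asking for a password; type exit to come back to your own shell.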
Step 15: Download Hadoop from the Apache website
Step 16: Extract Hadoop and put it in the folder "/home/hduser/hadoop"
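For example, assuming the Hadoop 1.2.1 release (any Hadoop 1.x tarball works the same way; just change the version number):
Command :         wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Command :         tar -xzf hadoop-1.2.1.tar.gz
Command :         mv hadoop-1.2.1 /home/hduser/hadoop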

Step 17: Now we need to make changes in the Hadoop configuration files. You will find these files in the "/home/hduser/hadoop/conf" folder.

Step 18: There are 4 important files in this folder:

     a) hadoop-env.sh
     b) hdfs-site.xml
     c) mapred-site.xml
     d) core-site.xml
a) hadoop-env.sh is a file that contains Hadoop's environment-related properties.
Here we can set the Java home:
                export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
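The exact JDK path can differ from machine to machine; on 64-bit Ubuntu, for example, it is usually /usr/lib/jvm/java-6-openjdk-amd64. You can list the installed JVMs to find the right value:
Command :         ls /usr/lib/jvm/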

b) hdfs-site.xml is a file which contains properties related to HDFS.
Here we need to set the replication factor.
By default the replication factor is 3, but since we are installing Hadoop on a single machine, we will set it to 1.

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
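Note: in each of these XML files, the <property> blocks must be placed inside the file's single top-level <configuration> element, otherwise Hadoop will not pick them up. For example, hdfs-site.xml as a whole looks like this:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>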

c) mapred-site.xml is a file that contains properties related to MapReduce.
Here we set the host (IP address) and port of the machine on which the JobTracker is running:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

d) core-site.xml is a property file which contains properties that are common to, or used by, both MapReduce and HDFS.
Here we set the host (IP address) and port number of the machine on which the NameNode will be running.
The other property tells Hadoop where it should store files such as the fsimage, blocks, etc.:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop_tmp_files</value>
  <description>A base for other temporary directories.</description>
</property>
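Hadoop will not necessarily create this directory on its own, so it is safest to create it as hduser before the first start (assuming the path used above):
Command :         mkdir -p /home/hduser/hadoop_tmp_files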

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
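With all four files configured, the usual next steps for a Hadoop 1.x pseudo-distributed setup are to format the NameNode (only once) and start the daemons; both commands live under /home/hduser/hadoop/bin:
Command :         /home/hduser/hadoop/bin/hadoop namenode -format
Command :         /home/hduser/hadoop/bin/start-all.sh
Command :         jps
If everything came up, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.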
