Monday, 17 August 2015

Hadoop Installation (Pseudo Distributed Mode)

Steps for Installation of Hadoop
If you are on a Windows environment, follow the steps below
Step 1: Install VMware Workstation
                Download Product (http://www.vmware.com/products/workstation/workstation-evaluation)
Step 2: Download any flavor of Linux (Ex: Red Hat Linux or Ubuntu; if you have a low-configuration machine, use Lubuntu)
Step 3: Start VMware and create a new VM.
Step 4: Install Linux (any flavor).
Step 5: Open the Linux command prompt (terminal).
Step 6: Log in as root so that you can create a new group
Step 7: Add a new group
Command  :  sudo addgroup hadoop
It will ask for the root password; enter it.
sudo is used when you want to run any command as the super user.
Step 8: Add a new user for Hadoop in the group "hadoop"
Command : sudo adduser --ingroup hadoop hduser
It will ask for the password that you want to set for hduser.
Step 9: Now add hduser to the list of sudoers, so that you can run any command as hduser
Command :  sudo adduser hduser sudo
Step 10: Now log out of root and log in as hduser
Step 11: Open terminal
Step 12: Hadoop is developed in Java, so Java must be installed on your machine before we start using Hadoop.
                We need Java 1.5+ (meaning 1.5 or a later version)
Command :         sudo apt-get install openjdk-6-jdk

apt-get is the package manager of Ubuntu; it helps you install software.
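Once the installation finishes, you can quickly verify the JDK:
Command :         java -version
It should print an OpenJDK 1.6.x version string.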
Step 13: Install the SSH (Secure Shell) server
Command :         sudo apt-get install openssh-server
Step 14: Once ssh is installed, we can log in to a remote machine using the following command
Command :         ssh <ipaddress>
If you try ssh localhost, you will notice that it prompts you for a password. Now we want to make this login password-less. One way of doing it is to use keys; we can generate keys using the following command.
Command :         ssh-keygen -t rsa -P ""
This command will generate two keys at the "/home/hduser/.ssh/" path: id_rsa and id_rsa.pub.
id_rsa is the private key.
id_rsa.pub is the public key.
Command :         ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@localhost
Give the password for hduser when prompted; this copies the public key into hduser's authorized_keys, so future ssh logins to localhost will not ask for a password.
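Now test that the password-less login works:
Command :         ssh localhost
This time it should log you in without asking for a password; type exit to come back to your own shell.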
Step 15: Download Hadoop from the Apache website
Step 16: Extract Hadoop and put it in the folder "/home/hduser/hadoop"
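For example, assuming the Hadoop 1.2.1 release (any Hadoop 1.x tarball works the same way; just change the version number):
Command :         wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Command :         tar -xzf hadoop-1.2.1.tar.gz
Command :         mv hadoop-1.2.1 /home/hduser/hadoop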

Step 17: Now we need to make changes in the Hadoop configuration files. You will find these files in the "/home/hduser/hadoop/conf" folder.

Step 18: There are 4 important files in this folder:

     a) hadoop-env.sh
     b) hdfs-site.xml
     c) mapred-site.xml
     d) core-site.xml
a) hadoop-env.sh is a file that contains Hadoop's environment-related properties.
Here we can set the Java home:
                export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
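The exact JDK path can differ from machine to machine; on 64-bit Ubuntu, for example, it is usually /usr/lib/jvm/java-6-openjdk-amd64. You can list the installed JVMs to find the right value:
Command :         ls /usr/lib/jvm/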

b) hdfs-site.xml is a file which contains properties related to HDFS.
Here we need to set the replication factor.
By default the replication factor is 3, but since we are installing Hadoop on a single machine, we will set it to 1.

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
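Note: in each of these XML files, the <property> blocks must be placed inside the file's single top-level <configuration> element, otherwise Hadoop will not pick them up. For example, hdfs-site.xml as a whole looks like this:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>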

c) mapred-site.xml is a file that contains properties related to MapReduce.
Here we set the host (IP address) and port of the machine on which the JobTracker is running:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

d) core-site.xml is a property file which contains properties that are common to, or used by, both MapReduce and HDFS.
Here we set the host (IP address) and port number of the machine on which the NameNode will be running.
The other property tells Hadoop where it should store files such as the fsimage, blocks, etc.:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop_tmp_files</value>
  <description>A base for other temporary directories.</description>
</property>
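Hadoop will not necessarily create this directory on its own, so it is safest to create it as hduser before the first start (assuming the path used above):
Command :         mkdir -p /home/hduser/hadoop_tmp_files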

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
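With all four files configured, the usual next steps for a Hadoop 1.x pseudo-distributed setup are to format the NameNode (only once) and start the daemons; both commands live under /home/hduser/hadoop/bin:
Command :         /home/hduser/hadoop/bin/hadoop namenode -format
Command :         /home/hduser/hadoop/bin/start-all.sh
Command :         jps
If everything came up, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.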
