Steps for Installation of Hadoop
If you are on a Windows environment, follow the steps below.
Step 1: Install VMware Workstation
Download the product from http://www.vmware.com/products/workstation/workstation-evaluation
Step 2: Download any flavor of Linux
(e.g. Red Hat Linux or Ubuntu; if you have a low-configuration machine, use Lubuntu)
Step 3: Start VMware and create a new VM.
Step 4: Install Linux (any flavor).
Step 5: Start the Linux command prompt (terminal).
Step 6: Log in as root so that you can create a new group.
Step 7: Add a new group
Command : sudo addgroup hadoop
It will ask for the root password; enter it.
sudo is used when you want to run a command as the superuser.
Step 8: Add a new user for Hadoop in the group "hadoop"
Command : sudo adduser --ingroup hadoop hduser
It will ask for the password that you want to set.
Step 9: Now add hduser to the list of sudoers, so that you can run any command as hduser.
Command : sudo adduser hduser sudo
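(An optional check, added here and not part of the original steps: you can confirm that hduser is now in the sudo group with the following command.)
Command : groups hduser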
Step 10: Now log out of root and log in as hduser.
Step 11: Open a terminal.
Step 12: Hadoop is developed in Java, so Java should be installed on your machine before we start using Hadoop.
We need Java 1.5+ (meaning 1.5 or a later version).
Command : sudo apt-get install openjdk-6-jdk
apt-get is Ubuntu's package manager; it helps you install software.
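Once the installation finishes, you can verify that Java is available (a quick optional check, added here) with:
Command : java -version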
Step 13: Install an SSH server (SSH stands for Secure Shell)
Command : sudo apt-get install openssh-server
Step 14: Once SSH is installed, we can log in to a remote machine using the following command.
Command : ssh <ipaddress>
If you try ssh localhost, you will notice that it prompts you for a password. Now we want to make this login password-less. One way of doing this is to use keys. We can generate keys using the following command.
Command : ssh-keygen -t rsa -P ""
This command will generate two keys at the "/home/hduser/.ssh/" path: id_rsa and id_rsa.pub.
id_rsa is the private key; id_rsa.pub is the public key.
Now copy the public key to localhost so that key-based login works:
Command : ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@localhost
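After copying the key, you can test the password-less login (an optional check, added here) with:
Command : ssh localhost
It should log you in without asking for a password; type exit to return to your own shell.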
Step 15: Download Hadoop from the Apache website.
Step 16: Extract Hadoop and put it in the folder "/home/hduser/hadoop".
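As an illustration only (the version number and mirror URL below are assumptions; use whichever Hadoop 1.x release you actually downloaded), the download and extraction could look like this:
Command : wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Command : tar -xzf hadoop-1.2.1.tar.gz
Command : mv hadoop-1.2.1 /home/hduser/hadoop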
Step 17: Now we need to make changes in the Hadoop configuration files. You will find these files in the "/home/hduser/hadoop/conf" folder.
Step 18: There are 4 important files in this folder:
a) hadoop-env.sh
b) hdfs-site.xml
c) mapred-site.xml
d) core-site.xml
a) hadoop-env.sh is a file that contains Hadoop environment-related properties.
Here we can set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
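If you are not sure which path to use on your machine, one way to find it (assuming the java command is on your PATH) is:
Command : readlink -f $(which java)
The output shows the full path to the java binary; JAVA_HOME is the JDK directory above its bin (or jre/bin) folder.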
b) hdfs-site.xml is a file that contains properties related to HDFS.
Here we need to set the replication factor. By default the replication factor is 3, but since we are installing Hadoop on a single machine, we will set it to 1.
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
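Note (a clarification added here, not in the original post): in each of these XML files, the <property> entries go inside the <configuration> element that is already present in the file, for example:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>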
c) mapred-site.xml is a file that contains properties related to MapReduce.
Here we will set the host and port of the machine on which the JobTracker is running.
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
d) core-site.xml is a property file that contains properties which are common to, or used by, both MapReduce and HDFS.
Here we will set the host and port number of the machine on which the NameNode will be running.
The other property tells Hadoop where it should store files such as the fsimage, blocks, etc.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop_tmp_files</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
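Since hadoop.tmp.dir points to /home/hduser/hadoop_tmp_files, it is a good idea (a suggested extra step, not from the original post) to create that directory up front so that hduser owns it:
Command : mkdir -p /home/hduser/hadoop_tmp_files
Command : chmod 750 /home/hduser/hadoop_tmp_files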