Introduction
- Data warehousing tool built on top of Hadoop
- Provides an SQL-like interface
- Offers an SQL-like language (HiveQL) to analyze data stored on HDFS
- Can be used by anyone who knows SQL
- Not all traditional SQL capabilities are supported
- Under the hood, Hive queries are executed as MapReduce jobs
- No extra work is required from the user for this (see the example below)
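As a quick illustration (the table and column names below are made up for this example), a HiveQL query reads like ordinary SQL, and the aggregation is compiled into a MapReduce job automatically:
-- hypothetical table, used only to illustrate the syntax
CREATE TABLE page_views (user_id INT, url STRING, view_time STRING);
-- this aggregation is compiled and executed as a MapReduce job under the hood
SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url;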
Hive Components
- MetaStore
- It is a database consisting of table definitions and other metadata
- By default it is stored on the local machine in an embedded Derby database
- It can be kept in a relational database on a shared machine if multiple users are using Hive
- Query Engine
- Hive provides HiveQL, which gives SQL-like queries
- Internally, Hive queries are run as MapReduce jobs (see the EXPLAIN example below)
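To see this translation for yourself, HiveQL's EXPLAIN statement prints the execution plan for a query, including the MapReduce stages Hive will submit (page_views is the hypothetical table from the earlier example):
hive> EXPLAIN SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url;
-- the printed plan lists the stages, including a Map Reduce stage, that Hive will run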
Hive Data Models
- Hive forms, or layers, table definitions on top of data residing on HDFS (see the sketch after this list)
Databases
- A namespace that separates tables and other units, avoiding naming conflicts
Table
- A homogeneous unit of data in which all rows share the same schema
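As a rough sketch of this layering (the database name, table name, columns, and HDFS path are all hypothetical), an external table simply attaches a schema to files that already sit in an HDFS directory:
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;
-- the table definition below is only metadata kept in the metastore;
-- the data itself stays in the HDFS directory given by LOCATION
CREATE EXTERNAL TABLE orders (order_id INT, amount DOUBLE, order_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser/orders';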
Hive Installation
Before you start installing Hive, you should already have Hadoop installed.
Step 1: Download Hive from the Apache website (check compatibility with your Hadoop version)
Step 2: Go to Downloads and extract hive.tar.gz
Step 3: Copy the extracted folder to /home/hduser/hive.
Step 4: Edit /etc/bash.bashrc
$sudo gedit /etc/bash.bashrc
(or, using a different editor)
$sudo leafpad /etc/bash.bashrc
Step 5: Set Home Path
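# append the following lines to the end of /etc/bash.bashrc (the paths assume Hive was copied to /home/hduser/hive in Step 3)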
export HADOOP_HOME=/home/hduser/hadoop
export HIVE_HOME=/home/hduser/hive
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin
Step 6: Save and Close
Step 7: Open a new terminal and run hive
You should now be inside the Hive shell.
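As a quick sanity check (output will vary with your setup), a few simple HiveQL statements confirm that the shell and the metastore are working:
hive> SHOW DATABASES;
-- a fresh installation lists only the "default" database
hive> CREATE TABLE test_hive (id INT, name STRING);
hive> SHOW TABLES;
-- test_hive should now appear in the list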