Monday 18 August 2014

Working with Apache Hive on Ubuntu

tar -xvf apache-hive-0.13.1-bin.tar.gz

mv apache-hive-0.13.1-bin hive0131


Editing the .bashrc file

hduser2@bala:~$ sudo gedit .bashrc
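The post doesn't show which lines to add; a minimal sketch, assuming Hive was unpacked to /home/hduser2/hive0131 and Hadoop lives in /home/hduser2/hadoop111 as in the other commands on this page:

```shell
# Hypothetical .bashrc additions -- adjust the paths to your own layout
export HIVE_HOME=/home/hduser2/hive0131
export HADOOP_HOME=/home/hduser2/hadoop111
export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin
```

Run `source ~/.bashrc` (or open a new terminal) so the hive command is picked up.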

Creating a warehouse folder in HDFS

hduser2@bala:~$ hadoop111/bin/hadoop fs -mkdir /home/hduser2/tmp/hive/warehouse

Giving the group write permission on the warehouse folder

hduser2@bala:~$ hadoop111/bin/hadoop fs -chmod g+w /home/hduser2/tmp/hive/warehouse

Adding the Hadoop path to the Hive config file

hduser2@bala:~$ sudo gedit hive0131/bin/hive-config.sh

# Allow alternate conf dir location.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"

export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH
export HADOOP_HOME=/home/hduser2/hadoop111
# Default to use 256MB
export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}

Launching Hive

hduser2@bala:~$ hive

Logging initialized using configuration in jar:file:/home/hduser2/hive0131/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show tables;
OK
Time taken: 0.233 seconds
hive> exit;
hduser2@bala:~$


Hive Commands:

Creating a table


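The original post leaves this section empty; an illustrative HiveQL statement (the table and column names here are hypothetical):

```sql
-- Managed table; fields in the source file are separated by commas
CREATE TABLE employees (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```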

Loading data
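A sketch using the hypothetical employees table above; the file path is an example:

```sql
-- Copy a local file into the table's warehouse directory
LOAD DATA LOCAL INPATH '/home/hduser2/employees.txt'
OVERWRITE INTO TABLE employees;
```

Omit LOCAL to move a file that is already in HDFS; OVERWRITE replaces any existing contents of the table.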

Inserting data into the table
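Hive 0.13 has no INSERT ... VALUES syntax (that arrived in 0.14), so new rows are inserted from a query; employees_staging below is a hypothetical source table:

```sql
-- Append the results of a query to an existing table
INSERT INTO TABLE employees
SELECT id, name, salary FROM employees_staging;
```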

Dropping the table
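Continuing with the hypothetical employees table:

```sql
-- For a managed table this also deletes the data in the warehouse directory
DROP TABLE IF EXISTS employees;
```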

Listing the tables
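A few illustrative statements:

```sql
SHOW TABLES;
-- Optionally filter by a name pattern
SHOW TABLES LIKE 'emp*';
-- Show the columns of one table
DESCRIBE employees;
```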


Updating the table data

Deleting the data from the table

UPDATE and DELETE statements aren't supported in Hive (as of version 0.13), but INSERT INTO is.
A snippet from Hadoop: The Definitive Guide (3rd edition):
Updates, transactions, and indexes are mainstays of traditional databases. Yet, until recently, these features have not been considered a part of Hive's feature set. This is because Hive was built to operate over HDFS data using MapReduce, where full-table scans are the norm and a table update is achieved by transforming the data into a new table. For a data warehousing application that runs over large portions of the dataset, this works well.
Hive doesn't support updates (or deletes), but it does support INSERT INTO, so it is possible to add new rows to an existing table.
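As the quote says, an update in this version of Hive is done by transforming the data into a new table; a sketch using the hypothetical employees table from above:

```sql
-- "Update" by recomputing the whole table: give everyone a 10% raise
INSERT OVERWRITE TABLE employees
SELECT id, name, salary * 1.1 FROM employees;

-- "Delete" by rewriting the table without the unwanted rows
INSERT OVERWRITE TABLE employees
SELECT id, name, salary FROM employees WHERE salary IS NOT NULL;
```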






