Sunday 6 December 2015

HIVE FACTS

HIVE is a widely used big data tool. It's also widely misunderstood.
Here are some facts about HIVE:
1. HIVE is a data warehouse:
HIVE is a data warehouse, not a database. It gives you the capability to put your data (your big data, usually historical data) into files and then analyze it, slice and dice it, drill down, roll up; basically anything except modifying individual records in place.


2. HIVE works in batch mode:
HIVE queries look like SQL, but they are processed very differently. HIVE queries get converted into MapReduce jobs. That means a query takes at least as long as it takes to create and launch a job (unless it's a simple select *, which HIVE can answer by reading the files directly).
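
To make the fixed startup cost concrete, here is a toy latency model in Python. It is only a sketch: the 20-second startup figure and the function name are assumptions made up for illustration, not measurements of HIVE.

```python
# Toy latency model for a batch-mode query engine. All numbers are
# made-up assumptions for illustration, not measurements of HIVE.

JOB_STARTUP_SECONDS = 20.0  # assumed fixed cost to create and launch one job


def estimated_query_seconds(num_jobs, processing_seconds):
    """Return job startup overhead plus the actual processing time."""
    return num_jobs * JOB_STARTUP_SECONDS + processing_seconds


# A tiny query that needs one job still pays the full startup cost.
tiny_query = estimated_query_seconds(num_jobs=1, processing_seconds=0.5)

# A simple "select *" needs no job in this model, so only read time counts.
select_star = estimated_query_seconds(num_jobs=0, processing_seconds=0.5)

print(tiny_query)   # 20.5
print(select_star)  # 0.5
```

However small the data, the query that needs a job can never finish faster than the startup cost in this model.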


3. HIVE is not an RDBMS:
HIVE has a query language very similar to SQL, but that doesn't mean it behaves like an RDBMS. It lacks many of the important features of an RDBMS, like:
  • Transactions: HIVE doesn't give you RDBMS-style transactions by default. (Limited ACID support arrived in HIVE 0.13 and 0.14, but it has to be enabled explicitly and comes with restrictions.) Without it, there is no guarantee that your queries will be atomic, so a query can fail at any point and leave your data in an inconsistent state.
  • No locks: By default there are no locks on tables (locking is off unless concurrency support is enabled). That means two queries can modify the same table or partition at the same time and give you wrong results.
WARNING: Never use HIVE for transactional purposes.
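
To see why a non-atomic write is dangerous, here is a small Python sketch. It does not show how HIVE is implemented; the "table" is just a dict of hypothetical part files, and `write_parts` is a made-up helper that fails partway through a write.

```python
# Toy illustration of a non-atomic multi-file write. This is not how HIVE
# works internally; the "table" is just a dict of hypothetical part files.

def write_parts(table, parts, fail_after=None):
    """Write part files one by one, optionally crashing partway through."""
    for i, (name, rows) in enumerate(parts):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("job failed mid-write")
        table[name] = rows  # each file is visible immediately, no rollback


table = {}
new_parts = [("part-0", [1, 2]), ("part-1", [3, 4]), ("part-2", [5, 6])]

try:
    write_parts(table, new_parts, fail_after=2)
except RuntimeError:
    pass  # nothing undoes the files that were already written

print(sorted(table))  # ['part-0', 'part-1']
```

The job failed, but two of the three files are still there. Any query that reads the table now sees partial data, which is exactly what a transaction would have prevented.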


4. HIVE is not SQL:
  • Amount of I/O: HIVE doesn't have tables in the RDBMS sense; a table is just a set of files in a directory. On a plain, unpartitioned table it can't read just a part of the table; it always reads the whole files. It doesn't matter whether you want one record or a hundred, the amount of data read is the same.
  • Processing time: If you read the whole "table", processing time is almost none. All HIVE does is read the files and return the contents according to the table's schema. But the processing time keeps increasing with each clause you add to the query.
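
The I/O point can be sketched with a toy full-scan reader in Python. The file layout (an id and a name per line) and the `scan` function are assumptions for illustration only; the idea is simply that without an index, every line gets read no matter how few rows match.

```python
import io

# Toy full-scan reader. The file layout and names are hypothetical.
data = "".join(f"{i},user{i}\n" for i in range(1000))


def scan(text, wanted_ids):
    """Return the matching rows and the number of characters read."""
    chars_read = 0
    matches = []
    for line in io.StringIO(text):
        chars_read += len(line)  # every line is read, match or not
        row_id = int(line.split(",")[0])
        if row_id in wanted_ids:
            matches.append(line.strip())
    return matches, chars_read


one_row, read_for_one = scan(data, {7})
hundred_rows, read_for_hundred = scan(data, set(range(100)))

print(len(one_row), len(hundred_rows))   # 1 100
print(read_for_one == read_for_hundred)  # True
print(read_for_one == len(data))         # True
```

Asking for one row or a hundred rows reads exactly the same amount of data: the whole file.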

5. HIVE uses HDFS to store data:
That means when you access a large table, you have to account for network time as well as disk I/O time (there are ways to reduce this).
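
A rough back-of-the-envelope estimate in Python shows how the network adds to the read time. The throughput figures, the `remote_fraction` parameter, and the assumption that disk and network time simply add up (no overlap) are all simplifications chosen for easy arithmetic, not properties of any real cluster.

```python
# Back-of-the-envelope read time for a table stored on remote nodes.
# Throughput figures are assumptions chosen for easy arithmetic.

def read_seconds(size_mb, disk_mb_per_s, network_mb_per_s, remote_fraction):
    """Disk time for all the data plus network time for the remote part."""
    disk_time = size_mb / disk_mb_per_s
    network_time = (size_mb * remote_fraction) / network_mb_per_s
    return disk_time + network_time


# 10,000 MB table, everything read from local disks.
local = read_seconds(10_000, 100.0, 100.0, remote_fraction=0.0)

# Same table, but half of the data has to travel over the network.
remote = read_seconds(10_000, 100.0, 100.0, remote_fraction=0.5)

print(local)   # 100.0
print(remote)  # 150.0
```

In this simplified model, pulling half of the data over the network makes the read 50% slower.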


There are a lot of features lacking in HIVE, but that doesn't mean that HIVE is useless. It just means its uses are different.
If you have some questions or you have some points where HIVE is different (or lacking), do comment here.

To find out more about HIVE, stay tuned... :)
