hadoop - Hive index rebuild too slow compared with PostgreSQL
I am trying to compare the same functionality on my PostgreSQL data warehouse and on a new Hive data warehouse on the same box, with the same table structure. I understand Hive's benefits, but although the data load runs 3 times faster than in PostgreSQL, the index build/rebuild is many times slower than in PostgreSQL. I realize that in Hive the index does not need to be rebuilt every time. My question is: what am I missing in the Hive configuration?
My setup is:
CREATE TABLE mytable (aa INT, bb STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ... LOCATION '/data/spaces/hadoop/hadoopfs';
LOAD DATA LOCAL INPATH '/data/Informix94/spaces/postgres/myfile_big' OVERWRITE INTO TABLE mytable;
CREATE INDEX mytable_indx ON TABLE mytable (aa) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD LOCATION '/data/spaces/hadoop/hadoopfs';
SET hive.optimize.autoindex=true; SET hive.optimize.index.filter=true;
ALTER INDEX mytable_indx ON mytable REBUILD;
My box is a VM with 3 GB of RAM, on which PostgreSQL is running with 1 GB of RAM. PostgreSQL also serves as the metadata store. I am using the most recent stable versions of CentOS, Hadoop, and Hive, and I have not changed Hive's default settings except for the metadata store location and disabling statistics.
Result: the index rebuild takes 8098 seconds on 260,000,000 rows, or 80 seconds on 5,000,000 rows.
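To put the two measurements on a common scale, they can be converted to rows per second. The sketch below uses only the two figures reported above; the names (`runs`, `rows_per_second`) are illustrative and not part of any Hive tooling.

```python
# Reported index rebuild runs: label -> (rows, seconds).
# Both pairs come straight from the result quoted above.
runs = {
    "small": (5_000_000, 80),
    "large": (260_000_000, 8098),
}


def rows_per_second(rows, seconds):
    """Average rebuild throughput for one run."""
    return rows / seconds


throughput = {name: rows_per_second(r, s) for name, (r, s) in runs.items()}

# How much more data versus how much more time the large run needed.
row_ratio = runs["large"][0] / runs["small"][0]
time_ratio = runs["large"][1] / runs["small"][1]

for name, rate in throughput.items():
    print(f"{name}: {rate:,.0f} rows/s")
print(f"data grew {row_ratio:.1f}x, rebuild time grew {time_ratio:.1f}x")
```

On these numbers the larger run has 52 times the rows but takes roughly 101 times as long, so the per-row throughput is about half that of the smaller run.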
Hive works well only when your data does not fit on one machine. The result you are seeing is therefore the expected outcome; once you have gathered terabytes or petabytes of data, you will be very happy with Hive. For the use case you describe, PostgreSQL would be a better match.