[hadoop][hive]What are partitions in Hive－FLASHC

What are partitions in Hive

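Partitioning a table changes how Hive structures its data storage.
When designing the physical layout of your data, partitioning can make processing more efficient.
Rows that share the same partition values are stored together: on the underlying HDFS they live in the same directory, as one or more files under it.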

For example, suppose we partition our employee data first by country and then by state.
CREATE TABLE employees (
  name STRING,
  salary FLOAT
)
PARTITIONED BY (country STRING, state STRING);

On HDFS, the table's data might be stored under
hdfs://master_server/user/hive/warehouse/mydb.db/employees
and the directories inside it might look like this:
.../employees/country=CA/state=AB
.../employees/country=CA/state=BC
.../employees/country=US/state=AL
.../employees/country=US/state=AK
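The mapping from partition values to directories can be sketched in a few lines of Python. This is only an illustration of the naming pattern shown above, not Hive's own code; `partition_path` is a hypothetical helper, and the table location is the example path from this post.

```python
def partition_path(table_dir, partition_values):
    """Build a partition directory: one key=value level per partition column.

    partition_values is a list of (column, value) pairs, in the same order
    as the columns in PARTITIONED BY. Hypothetical helper for illustration.
    """
    levels = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_dir.rstrip("/")] + levels)


# Example table location taken from the post.
table_dir = "hdfs://master_server/user/hive/warehouse/mydb.db/employees"

path = partition_path(table_dir, [("country", "CA"), ("state", "AB")])
print(path)
```

Each partition column adds one directory level, so the order of the columns in PARTITIONED BY determines the nesting.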
What is the benefit of doing this?

Queries that filter on a particular country and state become faster.

For example, the following query selects all employees in the state of Illinois in the United States.
Hive can tell right away which directory holds the matching records,
so it does not have to scan every file in the table.

SELECT * FROM employees WHERE country = 'US' AND state = 'IL';
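The idea behind this speedup can be simulated in Python. The sketch below is a simplified model, not Hive's actual planner: `parse_partition` and `prune` are hypothetical helpers, and the `country=US/state=IL` directory is assumed to exist alongside the ones listed earlier.

```python
def parse_partition(path):
    """Extract {column: value} from the key=value components of a path."""
    spec = {}
    for component in path.split("/"):
        if "=" in component:
            col, _, val = component.partition("=")
            spec[col] = val
    return spec


def prune(partition_dirs, predicate):
    """Keep only the directories whose partition values match every filter."""
    return [
        d for d in partition_dirs
        if all(parse_partition(d).get(col) == val for col, val in predicate.items())
    ]


partition_dirs = [
    "employees/country=CA/state=AB",
    "employees/country=CA/state=BC",
    "employees/country=US/state=AL",
    "employees/country=US/state=AK",
    "employees/country=US/state=IL",  # assumed to exist for this example
]

# Equivalent of: WHERE country = 'US' AND state = 'IL'
to_scan = prune(partition_dirs, {"country": "US", "state": "IL"})
print(to_scan)
```

Only one of the five directories is left to read. A filter on a non-partition column such as salary would not narrow the list this way, because those values are stored inside the files rather than in the directory names.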

peicheng
