[hive][apache] Hive basic introduction
Introduction to Hive
Origins
- Introduction to Hadoop core
- HDFS
- MapReduce: characteristics and architecture
Development
- Development at Facebook
- Qubole
The Qubole Data Service (QDS) is a Software-as-a-Service analytics platform running on leading cloud offerings like AWS.
- Companies using Hive
* Baidu
* Facebook
* Qubole
Usage share
Why the shift to Hive
Tips
- Running multiple queries against the same table
(Making Multiple Passes over the Same Data)
The following rewrite achieves the same result, but with a single pass through the source history table.
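The single-pass idea can be illustrated outside Hive with a small Python sketch. This is not HiveQL and not the rewrite the note refers to; the `HISTORY` rows, the `action` values, and both function names are hypothetical, chosen only to show that routing each row while scanning once reads half as many rows as scanning once per output.

```python
# Hypothetical rows standing in for a "history" table: (action, amount).
HISTORY = [
    ("purchased", 30),
    ("returned", 10),
    ("purchased", 5),
    ("viewed", 0),
]


def two_passes(rows):
    """Scan the source once per output, so the source is read twice."""
    rows_read = 0
    sales, credits = [], []
    for row in rows:  # first full scan: collect purchases
        rows_read += 1
        if row[0] == "purchased":
            sales.append(row)
    for row in rows:  # second full scan: collect returns
        rows_read += 1
        if row[0] == "returned":
            credits.append(row)
    return sales, credits, rows_read


def single_pass(rows):
    """Scan the source once and route each row to its output."""
    rows_read = 0
    sales, credits = [], []
    for row in rows:
        rows_read += 1
        if row[0] == "purchased":
            sales.append(row)
        elif row[0] == "returned":
            credits.append(row)
    return sales, credits, rows_read
```

Both functions produce the same two outputs; only the number of rows read differs, which is the cost that matters when the source table is large.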
HDFS was designed for many millions of large files, not billions of small files.
Each partition corresponds to a directory that usually contains multiple files.
MapReduce processing converts a job into multiple tasks.
Another solution is to use two levels of partitions along different dimensions. For example, the first partition might be by day and the second-level partition might be by geographic region, such as the state.
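As a rough sketch of what two-level partitioning means on disk, the Python below lists the directories that a day-then-state scheme would create, using Hive's `key=value` directory naming. The table root `/warehouse/weblogs`, the partition column names `day` and `state`, and the function name are assumptions made for illustration.

```python
from itertools import product
from pathlib import PurePosixPath


def partition_dirs(table_root, days, states):
    """Return one directory per (day, state) pair, day first, then state.

    The number of directories is len(days) * len(states), so each extra
    partition level multiplies the directory (and file) count.
    """
    root = PurePosixPath(table_root)
    return [
        root / f"day={day}" / f"state={state}"
        for day, state in product(days, states)
    ]


# Hypothetical table root and partition values.
dirs = partition_dirs(
    "/warehouse/weblogs",
    ["2012-01-01", "2012-01-02"],
    ["CA", "NY"],
)
for d in dirs:
    print(d)
```

Because every directory usually holds several files, counting directories this way gives a quick lower bound on how many files a partitioning scheme will put on HDFS.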
The primary reason to avoid normalization is to minimize disk seeks, such as those typically required to navigate foreign key relations.
When you have tens of terabytes to many petabytes of data, optimizing speed makes these limitations worth accepting.