Assessment: Hortonworks Certified Apache Hadoop Administrator 1.x
Date Completed: 2013/12/24
Result: Pass
Christmas present: got it.
Took the exam today and knew the result right away.
Hortonworks Certified Apache Hadoop Administrator~~
Current category: hadoop (43)
- Dec 24 Tue 2013 23:47
[hadoop] Christmas present: Hortonworks Certified Apache Hadoop Administrator 1.x Got
- Dec 24 Tue 2013 11:02
[hadoop]Hortonworks Certified Apache Hadoop 1.x Administrator
Hadoop 1.0 Administrator Certification
References for Certification Candidates
Intended Audience
The Certified Apache Hadoop Administrator certification is intended for IT administrators and operators who deploy, manage, and monitor Hadoop-based solutions, consultants who create Hadoop project proposals, and Hadoop administration instructors. Those certified are recognized as having a high level of skill in Apache Hadoop administration.
Exam Format
The Certified Apache Hadoop 1.x Administrator exam consists of 41 open-response and multiple-choice questions. The exam is delivered in English.
Practice Exams
Certification candidates may take two practice exams at no charge. Register at the certification site.
- Dec 12 Thu 2013 14:57
[hadoop] HBase startup sequence
HBase startup sequence
An HBase cluster can be started in the following order:
- HDFS
- DataNodes
- HBase HMaster (active)
- HBase HMaster (backup)
- HBase Region Servers
This order keeps the data reliable and the system healthy (a start-up sketch follows below).
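A rough sketch of that order using the stock daemon scripts; on an Ambari-managed HDP cluster you would normally start these from Ambari instead, and which host each command runs on depends on your layout:
# on the NameNode host
hadoop-daemon.sh start namenode
# on every slave host
hadoop-daemon.sh start datanode
# on the active HMaster host
hbase-daemon.sh start master
# on the standby HMaster host
hbase-daemon.sh start master --backup
# on every region server host
hbase-daemon.sh start regionserver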
- Oct 16 Wed 2013 17:32
[hadoop] MapReduce old vs. new API: org.apache.hadoop.mapred vs org.apache.hadoop.mapreduce
A quick note:
Before 0.20, the old org.apache.hadoop.mapred interface was used.
Version 0.20 introduced the new org.apache.hadoop.mapreduce API,
so from 0.20 onward the new API should be used (a quick way to compare the two packages is sketched after the link below).
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html
@InterfaceAudience.Public
@InterfaceStability.Stable
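One quick, hedged way to look at both packages side by side from the shell (the hadoop-core jar path and version here are assumptions about the install layout):
javap -classpath /usr/lib/hadoop/hadoop-core-1.2.1.jar org.apache.hadoop.mapred.Mapper      # old API: an interface
javap -classpath /usr/lib/hadoop/hadoop-core-1.2.1.jar org.apache.hadoop.mapreduce.Mapper   # new API: a class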
- Oct 11 Fri 2013 17:07
[hadoop]MapFile
A MapFile is a sorted Hadoop SequenceFile with an index.
On HDFS, a MapFile is a directory made up of two files: index, the index of keys, and data, the sorted raw records.
At lookup time only the index has to be loaded into memory; a binary search over it quickly locates the wanted key.
The index file contains:
# hadoop fs -text numbers.map/index
1 128
129 5820
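To see that two-file layout for the same numbers.map directory, and to dump the sorted records (output omitted here):
# hadoop fs -ls numbers.map
# hadoop fs -text numbers.map/data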
- Oct 08 Tue 2013 15:02
[hadoop] ambari HDP default jobtracker namenode port
IPC ports:
JobTracker: 8021
NameNode: 8020
JobTracker Web UI: 50030
NameNode Web UI: 50070
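A quick sanity check that the daemons are actually listening on those ports, run on the respective master hosts (just a sketch, not HDP-specific):
netstat -tlnp | grep -E '8020|8021'
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/    # NameNode web UI
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/    # JobTracker web UI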
- Oct 02 Wed 2013 14:59
[hive] left semi join
Tutorial - Apache Hive - Apache Software Foundation
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Joins
The Hive documentation lists only a handful of join variants,
and one of them is LEFT SEMI JOIN:
"In order to check the existence of a key in another table, the user can use LEFT SEMI JOIN as illustrated by the following example."
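A minimal sketch of the pattern from the command line (tables a, b and their columns are just illustrative); note that columns from the right-hand table may only appear in the ON clause:
hive -e "SELECT a.key, a.value FROM a LEFT SEMI JOIN b ON (a.key = b.key)"
# behaves like the IN/EXISTS subquery form: SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b)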
- Sep 27 Fri 2013 13:44
[hadoop] commission and decommission
[hadoop] commission and decommission steps
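A rough outline of the decommission half on Hadoop 1.x, assuming dfs.hosts.exclude and mapred.hosts.exclude already point at the exclude files (the hostname and /etc/hadoop/conf paths are placeholders):
# add the node to the HDFS and MapReduce exclude files on the master hosts
echo dn05.example.com >> /etc/hadoop/conf/dfs.exclude
echo dn05.example.com >> /etc/hadoop/conf/mapred.exclude
# ask the NameNode and JobTracker to re-read the host lists
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
# wait until the node shows up as Decommissioned in the NameNode web UI (port 50070)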
- Sep 27 Fri 2013 13:03
[hadoop] job scheduler
[hadoop] job scheduler
- Aug 30 Fri 2013 13:15
[hadoop]intermediate Sort
[hadoop] intermediate Sort
Keywords: spill index, spill files, MapReduce
Goal: Sort by key
- Aug 14 Wed 2013 16:19
[hbase]hbase hdfs system fstab config
in /etc/fstab
The optimized setting is:
/dev/sd1 /data ext3 defaults,noatime 0 0
noatime means the file access time is not updated on reads.
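To apply noatime to an already-mounted volume without rebooting (assuming /data is the mount point from the fstab line above):
mount -o remount,noatime /data
mount | grep /data    # confirm that noatime now appears in the mount options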
Using noatime to optimize Linux filesystem read performance | 飛飛's Blog
http://webcache.googleusercontent.com/search?q=cache:3kd3x2TACRcJ:m114.org/to-use-noatime-optimized-linux-file-read-performance/+&cd=5&hl=zh-TW&ct=clnk&gl=tw&lr=lang_zh-CN%7Clang_zh-TW&client=firefox-beta
fcamel 技術隨手記: atime, noatime 和 relatime
http://fcamel-life.blogspot.tw/2010/12/atime-noatime-relatime.html
- Jul 22 Mon 2013 11:57
[hadoop] pig bulk load data into hbase error log tmp
bash-4.1$ pig -useHCatalog simple.bulkload.pig
2013-07-19 21:23:40,293 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
2013-07-19 21:23:40,294 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hdfs/pc_test/pig_1374240220291.log
2013-07-19 21:23:40,588 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://dmp-hadoop-m1.dmp:8020
2013-07-19 21:23:40,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: dmp-hadoop-m2.dmp:50300
2013-07-19 21:23:41,301 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2013-07-19 21:23:41,496 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
2013-07-19 21:23:41,573 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
Details at logfile: /home/hdfs/pc_test/pig_1374240220291.log
bash-4.1$ vim /home/hdfs/pc_test/pig_1374240220291.log
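The missing com.google.common.base.Objects.firstNonNull method is usually a sign of a Guava version clash on the classpath; one way to see which guava jars the job can pick up (the /usr/lib paths are assumptions about this HDP layout):
find /usr/lib/hadoop /usr/lib/pig /usr/lib/hcatalog -name 'guava*.jar' 2>/dev/null
# if an older guava jar comes first on the classpath, the newer method cannot be resolved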
- Jul 16 Tue 2013 14:57
[hadoop] Hadoop Summit, San Jose - June 26-27, 2013: slides, videos, and Hive sessions
Hadoop Summit, San Jose - June 26-27, 2013
http://hadoopsummit.org/san-jose/schedule/
Slides and videos have been uploaded and are available now.
A few Hive-related sessions:
- Simplifying Use of Hive with the Hive Query Tool
http://www.slideshare.net/Hadoop_Summit/scaffidi-june26-405pmroom212
Simplifying Use of Hive with the Hive Query Tool - YouTube
- Jul 03 Wed 2013 14:12
[hadoop] Exceeded limits on number of counters: more than 120 counters in Hadoop
"You can override that property in mapred-site.xml on your JT, TT, client nodes but make sure that this will be a system-wide modification:
<configuration>
...
<property>
<name>mapreduce.job.counters.limit</name>
<value>500</value>
</property>
...
- Jun 19 Wed 2013 18:09
[hadoop] HAR HarFileSystem
HarFileSystem.java in hadoop-common | source code search engine
http://searchcode.com/codesearch/view/10576495
HarFileSystem (Hadoop 1.1.2 API)
http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/fs/HarFileSystem.html
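For reference, creating and browsing a Hadoop archive from the shell (the paths are only illustrative):
# pack /user/hdfs/logs into a single HAR
hadoop archive -archiveName logs.har -p /user/hdfs logs /user/hdfs/archived
# list its contents through HarFileSystem via the har:// scheme
hadoop fs -ls har:///user/hdfs/archived/logs.har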
- Jun 19 Wed 2013 17:32
[hadoop] hadoop hdfs file permissions
- File permission management in HDFS is similar to an ordinary Unix system.
- It also uses rwx, but HDFS almost never executes programs directly, so x is rarely needed.
- By default, the username on the current local system is taken as the user reading the HDFS file.
- So if that username matches the file's owner settings on HDFS, the user has permission to operate on it.
A few points worth noting:
- Group mapping for HDFS is resolved from the groups on the NameNode.
"HDFS stores the user and group of a file or directory as strings; there is no conversion from user and group identity numbers as is conventional in Unix."
If dfs.permissions is set to false, permission checks are skipped entirely. (A few common permission commands are sketched below.)
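The usual commands, for reference (user, group, and path names are made up):
hadoop fs -ls /user/alice                                # shows the rwx bits plus owner and group
hadoop fs -chown alice:analysts /user/alice/report.csv   # change owner and group
hadoop fs -chmod 640 /user/alice/report.csv              # adjust the permission bits
hadoop fs -chgrp analysts /user/alice/shared             # change the group only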
- Jun 19 Wed 2013 16:56
[hadoop]HDFS Federation
- The NameNode keeps file metadata in memory, so the NameNode's memory becomes the limit of the cluster.
- Federation allows adding new NameNodes, each managing its own part of the namespace.
HDFS Federation(HDFS 联盟)介绍 - 张贵宾的技术专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/strongerbit/article/details/7013221
- Jun 18 Tue 2013 22:44
[hadoop] why not use RAID in a Hadoop system
1. HDFS already replicates data by design.
2. RAID 0 is also less efficient than JBOD (just a bunch of disks): a RAID 0 stripe runs at the pace of its slowest disk, while JBOD's average throughput is better than that of the slowest disk.
3. With RAID, one failed disk can make the whole array unusable.
- Jun 18 Tue 2013 17:54
[hbase] hbase shell example
hbase shell example
put, get, filter, scan
(to be written; a rough sketch follows below)
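Until then, a minimal sketch of those four operations piped into the HBase shell (table, column family, and values are invented):
hbase shell <<'EOF'
create 't1', 'cf'
put 't1', 'row1', 'cf:name', 'alice'
get 't1', 'row1'
scan 't1'
scan 't1', {FILTER => "ValueFilter(=, 'binary:alice')"}
disable 't1'
drop 't1'
EOF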
- Jun 12 Wed 2013 02:17
[hbase] hbase hive zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
When using Hive over HBase, or simply using HBase on its own,
"WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0.24.jar!/hive-log4j.properties
Hive history file=/tmp/webtest/hive_job_log_webtest_201306120215_407815782.txt
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator "
the session may just sit there and never launch the MapReduce job.
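This "server null" retry loop usually means the client never learned where the ZooKeeper quorum is; one workaround is to pass it explicitly when starting Hive (the zk hostnames below are placeholders), or to make sure an hbase-site.xml with hbase.zookeeper.quorum is on the client classpath:
hive -hiveconf hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com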