Assessment: Hortonworks Certified Apache Hadoop Administrator 1.x
Date Completed: 2013/12/24
Result: Pass
Christmas present: got it.
Took the exam today and knew the result right away.
Hortonworks Certified Apache Hadoop Administrator~~
Current category: hadoop (43)
- Dec 24 Tue 2013 23:47
[hadoop] Christmas present: Hortonworks Certified Apache Hadoop Administrator 1.x Got
- Dec 24 Tue 2013 11:02
[hadoop]Hortonworks Certified Apache Hadoop 1.x Administrator
Hadoop 1.0 Administrator Certification
References for Certification Candidates
Intended Audience
The Certified Apache Hadoop Administrator certification is intended for IT administrators and operators who deploy, manage, and monitor Hadoop-based solutions, consultants who create Hadoop project proposals, and Hadoop administration instructors. Those certified are recognized as having a high level of skill in Apache Hadoop administration.
Exam Format
The Certified Apache Hadoop 1.x Administrator exam consists of 41 open-response and multiple-choice questions. The exam is delivered in English.
Practice Exams
Certification candidates may take two practice exams at no charge. Register at the certification site.
- Dec 12 Thu 2013 14:57
[hadoop] HBase startup sequence
HBase startup sequence
An HBase cluster can be started in the following order:
- HDFS
- DataNodes
- HBase HMaster (active)
- HBase HMaster (backup)
- HBase Region Servers
This order keeps the data reliable and the system healthy (a start-up sketch follows below).
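A rough sketch of that order using the stock daemon scripts; on an Ambari-managed HDP cluster you would normally start these from Ambari instead, and which host each command runs on depends on your layout:
# on the NameNode host
hadoop-daemon.sh start namenode
# on every slave host
hadoop-daemon.sh start datanode
# on the active HMaster host
hbase-daemon.sh start master
# on the standby HMaster host
hbase-daemon.sh start master --backup
# on every region server host
hbase-daemon.sh start regionserver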
- Oct 16 Wed 2013 17:32
[hadoop] MapReduce old vs. new API: org.apache.hadoop.mapred vs org.apache.hadoop.mapreduce
A quick note:
Before 0.20, the old org.apache.hadoop.mapred interface was used.
Version 0.20 introduced the new org.apache.hadoop.mapreduce API,
so from 0.20 onward the new API should be used (a quick way to compare the two packages is sketched after the link below).
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html
@InterfaceAudience.Public
@InterfaceStability.Stable
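One quick, hedged way to look at both packages side by side from the shell (the hadoop-core jar path and version here are assumptions about the install layout):
javap -classpath /usr/lib/hadoop/hadoop-core-1.2.1.jar org.apache.hadoop.mapred.Mapper      # old API: an interface
javap -classpath /usr/lib/hadoop/hadoop-core-1.2.1.jar org.apache.hadoop.mapreduce.Mapper   # new API: a class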
- Oct 11 Fri 2013 17:07
[hadoop]MapFile
A MapFile is a sorted Hadoop SequenceFile with an index.
On HDFS, a MapFile is a directory made up of two files: index, the index of keys, and data, the sorted raw records.
At lookup time only the index has to be loaded into memory; a binary search over it quickly locates the wanted key.
The index file contains:
# hadoop fs -text numbers.map/index
1 128
129 5820
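To see that two-file layout for the same numbers.map directory, and to dump the sorted records (output omitted here):
# hadoop fs -ls numbers.map
# hadoop fs -text numbers.map/data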
- Oct 08 Tue 2013 15:02
[hadoop] ambari HDP default jobtracker namenode port
IPC ports:
JobTracker: 8021
NameNode: 8020
JobTracker Web UI: 50030
NameNode Web UI: 50070
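A quick sanity check that the daemons are actually listening on those ports, run on the respective master hosts (just a sketch, not HDP-specific):
netstat -tlnp | grep -E '8020|8021'
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/    # NameNode web UI
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/    # JobTracker web UI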
- Oct 02 Wed 2013 14:59
[hive] left semi join
Tutorial - Apache Hive - Apache Software Foundation
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Joins
The Hive documentation lists only a handful of join variants,
and one of them is LEFT SEMI JOIN:
"In order to check the existence of a key in another table, the user can use LEFT SEMI JOIN as illustrated by the following example."
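A minimal sketch of the pattern from the command line (tables a, b and their columns are just illustrative); note that columns from the right-hand table may only appear in the ON clause:
hive -e "SELECT a.key, a.value FROM a LEFT SEMI JOIN b ON (a.key = b.key)"
# behaves like the IN/EXISTS subquery form: SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b)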
- Sep 27 Fri 2013 13:44
[hadoop] commission and decommission
[hadoop] commission and decommission steps
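A rough outline of the decommission half on Hadoop 1.x, assuming dfs.hosts.exclude and mapred.hosts.exclude already point at the exclude files (the hostname and /etc/hadoop/conf paths are placeholders):
# add the node to the HDFS and MapReduce exclude files on the master hosts
echo dn05.example.com >> /etc/hadoop/conf/dfs.exclude
echo dn05.example.com >> /etc/hadoop/conf/mapred.exclude
# ask the NameNode and JobTracker to re-read the host lists
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
# wait until the node shows up as Decommissioned in the NameNode web UI (port 50070)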
- Sep 27 Fri 2013 13:03
[hadoop] job scheduler
[hadoop] job scheduler
- Aug 30 Fri 2013 13:15
[hadoop]intermediate Sort
[hadoop] intermediate Sort
Keywords: spill index, spill files, MapReduce
Goal: Sort by key
- Aug 14 Wed 2013 16:19
[hbase]hbase hdfs system fstab config
in /etc/fstab
The optimized setting is:
/dev/sd1 /data ext3 defaults,noatime 0 0
noatime means the file access time is not updated on reads.
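To apply noatime to an already-mounted volume without rebooting (assuming /data is the mount point from the fstab line above):
mount -o remount,noatime /data
mount | grep /data    # confirm that noatime now appears in the mount options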
Using noatime to optimize Linux filesystem read performance | 飛飛's Blog
http://webcache.googleusercontent.com/search?q=cache:3kd3x2TACRcJ:m114.org/to-use-noatime-optimized-linux-file-read-performance/+&cd=5&hl=zh-TW&ct=clnk&gl=tw&lr=lang_zh-CN%7Clang_zh-TW&client=firefox-beta
fcamel 技術隨手記: atime, noatime 和 relatime
http://fcamel-life.blogspot.tw/2010/12/atime-noatime-relatime.html
- Jul 22 Mon 2013 11:57
[hadoop] pig bulk load data into hbase error log tmp
bash-4.1$ pig -useHCatalog simple.bulkload.pig
2013-07-19 21:23:40,293 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
2013-07-19 21:23:40,294 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hdfs/pc_test/pig_1374240220291.log
2013-07-19 21:23:40,588 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://dmp-hadoop-m1.dmp:8020
2013-07-19 21:23:40,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: dmp-hadoop-m2.dmp:50300
2013-07-19 21:23:41,301 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2013-07-19 21:23:41,496 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
2013-07-19 21:23:41,573 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
Details at logfile: /home/hdfs/pc_test/pig_1374240220291.log
bash-4.1$ vim /home/hdfs/pc_test/pig_1374240220291.log
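The missing com.google.common.base.Objects.firstNonNull method is usually a sign of a Guava version clash on the classpath; one way to see which guava jars the job can pick up (the /usr/lib paths are assumptions about this HDP layout):
find /usr/lib/hadoop /usr/lib/pig /usr/lib/hcatalog -name 'guava*.jar' 2>/dev/null
# if an older guava jar comes first on the classpath, the newer method cannot be resolved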
- Jul 16 Tue 2013 14:57
[hadoop] Hadoop Summit, San Jose - June 26-27, 2013: slides, videos, and Hive sessions
Hadoop Summit, San Jose - June 26-27, 2013
http://hadoopsummit.org/san-jose/schedule/
Slides and videos have been uploaded and are available now.
A few Hive-related sessions:
- Simplifying Use of Hive with the Hive Query Tool
http://www.slideshare.net/Hadoop_Summit/scaffidi-june26-405pmroom212
Simplifying Use of Hive with the Hive Query Tool - YouTube
- Jul 03 Wed 2013 14:12
[hadoop] Exceeded limits on number of counters: more than 120 counters in Hadoop
"You can override that property in mapred-site.xml on your JT, TT, client nodes but make sure that this will be a system-wide modification:
<configuration>
...
<property>
<name>mapreduce.job.counters.limit</name>
<value>500</value>
</property>
...
- Jun 19 Wed 2013 18:09
[hadoop] HAR HarFileSystem
HarFileSystem.java in hadoop-common | source code search engine
http://searchcode.com/codesearch/view/10576495
HarFileSystem (Hadoop 1.1.2 API)
http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/fs/HarFileSystem.html
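For reference, creating and browsing a Hadoop archive from the shell (the paths are only illustrative):
# pack /user/hdfs/logs into a single HAR
hadoop archive -archiveName logs.har -p /user/hdfs logs /user/hdfs/archived
# list its contents through HarFileSystem via the har:// scheme
hadoop fs -ls har:///user/hdfs/archived/logs.har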
- Jun 19 Wed 2013 17:32
[hadoop] hadoop hdfs file permissions
- File permission management in HDFS is similar to an ordinary Unix system.
- It also uses rwx, but HDFS almost never executes programs directly, so x is rarely needed.
- By default, the username on the current local system is taken as the user reading the HDFS file.
- So if that username matches the file's owner settings on HDFS, the user has permission to operate on it.
A few points worth noting:
- Group mapping for HDFS is resolved from the groups on the NameNode.
"HDFS stores the user and group of a file or directory as strings; there is no conversion from user and group identity numbers as is conventional in Unix."
If dfs.permissions is set to false, permission checks are skipped entirely. (A few common permission commands are sketched below.)
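The usual commands, for reference (user, group, and path names are made up):
hadoop fs -ls /user/alice                                # shows the rwx bits plus owner and group
hadoop fs -chown alice:analysts /user/alice/report.csv   # change owner and group
hadoop fs -chmod 640 /user/alice/report.csv              # adjust the permission bits
hadoop fs -chgrp analysts /user/alice/shared             # change the group only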
- Jun 19 Wed 2013 16:56
[hadoop]HDFS Federation
- The NameNode keeps file metadata in memory, so the NameNode's memory becomes the limit of the cluster.
- Federation allows adding new NameNodes, each managing its own part of the namespace.
HDFS Federation(HDFS 联盟)介绍 - 张贵宾的技术专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/strongerbit/article/details/7013221
- Jun 18 Tue 2013 22:44
[hadoop] why not use RAID in a Hadoop system
1. HDFS already replicates data by design.
2. RAID 0 is also less efficient than JBOD (just a bunch of disks): a RAID 0 stripe runs at the pace of its slowest disk, while JBOD's average throughput is better than that of the slowest disk.
3. With RAID, one failed disk can make the whole array unusable.
- Jun 18 Tue 2013 17:54
[hbase] hbase shell example
hbase shell example
put, get, filter, scan
(to be written; a rough sketch follows below)
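Until then, a minimal sketch of those four operations piped into the HBase shell (table, column family, and values are invented):
hbase shell <<'EOF'
create 't1', 'cf'
put 't1', 'row1', 'cf:name', 'alice'
get 't1', 'row1'
scan 't1'
scan 't1', {FILTER => "ValueFilter(=, 'binary:alice')"}
disable 't1'
drop 't1'
EOF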
- Jun 12 Wed 2013 02:17
[hbase] hbase hive zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
When using Hive over HBase, or simply using HBase on its own,
"WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0.24.jar!/hive-log4j.properties
Hive history file=/tmp/webtest/hive_job_log_webtest_201306120215_407815782.txt
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator "
the session may just sit there and never launch the MapReduce job.
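This "server null" retry loop usually means the client never learned where the ZooKeeper quorum is; one workaround is to pass it explicitly when starting Hive (the zk hostnames below are placeholders), or to make sure an hbase-site.xml with hbase.zookeeper.quorum is on the client classpath:
hive -hiveconf hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com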