Information about HDFS


glenhzheng

Apr 8, 2013, 10:17:12 AM

to cloudxy cloudxy

These are the notes I gathered today. I think they may help someone optimize hlfs.

1: HDFS works best with a write-once, read-many-times access pattern. The time to read the whole dataset matters more than the latency of reading the first record.

2: HDFS is not suited to low-latency data access. It is optimized for high throughput, at the expense of latency.

3: The number of files HDFS can store is limited by the memory of the namenode, which keeps the filesystem metadata in memory. Millions of files are feasible, but billions are not.

4: A file can have only one writer at a time.

5: The default HDFS block size is 64 MB. A file is split into block-sized chunks, and a file smaller than one block does not occupy a full 64 MB of storage. In practice, 128 MB blocks are commonly used.

6: Each block is stored as several copies (replicas) on different datanodes, typically three.

7: The dfs.replication property controls the number of replicas per block. The default is 3.

8: When reading from a Hadoop URL (for example with IOUtils.copyBytes), you set the buffer size, usually 4 KB, and also whether the streams should be closed when the copy finishes.

9: To keep HDFS balanced when running distcp (which copies files from one Hadoop cluster to another), use 20 map tasks per node.
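
To make notes 3, 5 and 7 concrete, here is a rough sizing sketch in Python. It is only back-of-the-envelope arithmetic, not anything taken from HDFS itself. The function names are made up for illustration, and the figure of about 150 bytes of namenode memory per file or block object is a commonly quoted rule of thumb that I am assuming here, not a measured value.

```python
MB = 1024 * 1024


def block_count(file_size, block_size=64 * MB):
    """Number of blocks a file of file_size bytes is split into."""
    # Integer ceiling division; an empty file has no blocks.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size, replication=3):
    """Bytes of disk used across the cluster for one file.

    A last block that is only partly filled uses just its real size,
    so the total is the file size times the replication factor.
    """
    return file_size * replication


def namenode_memory(num_files, blocks_per_file=1, bytes_per_object=150):
    """Rough namenode memory estimate in bytes.

    Assumes one in-memory object per file plus one per block, each
    costing about bytes_per_object bytes (rule of thumb, an assumption).
    """
    return num_files * (1 + blocks_per_file) * bytes_per_object


if __name__ == "__main__":
    print(block_count(200 * MB))               # 4 blocks of up to 64 MB
    print(block_count(200 * MB, 128 * MB))     # 2 blocks of up to 128 MB
    print(raw_storage(1 * MB) // MB)           # 3 MB on disk, not 192 MB
    print(namenode_memory(1_000_000) // 10**6) # about 300 (MB, decimal)
```

Under these assumptions, one million single-block files need roughly 300 MB of namenode memory, which is why millions of files are fine but billions are not. It also shows why a larger block size helps: fewer blocks per file means fewer objects for the namenode to track.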

Open question 1: Should we use FUSE (Filesystem in Userspace)?

Yours sincerely

Glen.Zheng

harryxiyou

Apr 9, 2013, 8:57:47 PM

to clo...@googlegroups.com

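We are not using FUSE for now. You can start by trying to set up a fully distributed environment with CDH4. During the setup you will run into a series of parameter configuration issues.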
CDH4 docs are here.
http://www.cloudera.com/content/support/en/documentation/cdh4-documentation/cdh4-documentation-v4-latest.html

--
Thanks
Harry Wei
