This is the information I gathered today. I think these notes may help
someone optimize hlfs.
1: HDFS is most efficient with a write-once, read-many-times access
pattern. The time to read the whole dataset matters more than the delay in
getting the first record.
2: HDFS is not a good fit for low-latency data access. It is tuned for high throughput, at the expense of latency.
3: The number of files stored in HDFS is limited by the memory of the
namenode, which keeps the filesystem metadata in memory. Millions of files are fine, but billions are not feasible.
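4: Only one writer per file is allowed at a time. Writes always go to the end of the file; arbitrary modifications are not supported.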
5: The default HDFS block size is 64MB. A file is split into block-sized
chunks, each stored as a separate block. A file smaller than one block does not occupy a full 64MB of storage! In practice
many installations use 128MB blocks.
6: Each block is stored as several copies (replicas) on different datanodes, typically 3.
7: The dfs.replication property controls the number of replicas of each
block. The default is 3.
8: When reading from a Hadoop URL, we should set the copy buffer size (4KB
is usual), and we can also choose whether the input stream is closed after the copy.
9: To keep HDFS balanced, run distcp (which copies files from one Hadoop
cluster to another) with 20 map tasks per node.
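
To make points 5 to 7 concrete, here is a small sketch of the arithmetic. It
does not talk to a real cluster. The function names block_layout and
raw_storage are my own, made up for illustration, and the defaults (64MB
blocks, 3 replicas) are simply the values from the notes above.

```python
MB = 1024 * 1024

def block_layout(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file is split into.

    Hypothetical helper: every block is full except possibly the last one,
    which only holds the remaining bytes.
    """
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

def raw_storage(file_size, replication=3):
    """Raw bytes used across the cluster when every block has `replication` copies."""
    return file_size * replication

if __name__ == "__main__":
    # A 200MB file with 64MB blocks: three full blocks plus one 8MB block.
    print([s // MB for s in block_layout(200 * MB)])
    # The same file with 128MB blocks: one full block plus one 72MB block.
    print([s // MB for s in block_layout(200 * MB, 128 * MB)])
    # A 1MB file uses a single 1MB block, not a whole 64MB.
    print([s // MB for s in block_layout(1 * MB)])
    # With 3 replicas the 200MB file takes 600MB of raw disk.
    print(raw_storage(200 * MB) // MB)
```

A bigger block size means fewer blocks per file, so the namenode has fewer
objects to track (point 3).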
Open question 1: Should we use FUSE (Filesystem in Userspace)?
Yours sincerely,
Glen.Zheng