How to make sure data blocks are shared between 2 datanodes


sindhu hosamane

May 24, 2014, 2:54:12 AM
to hadoop-user-...@googlegroups.com
Hello Friends, 

I am running multiple datanodes on a single machine.

The output of the jps command shows:
Namenode     Datanode     Datanode     Jobtracker     Tasktracker     Secondary Namenode

This assures me that 2 datanodes are up and running. I execute Cascalog queries on this 2-datanode Hadoop cluster, and I get the results of the queries too.
But I am not sure if it is really using both datanodes (because I would get results with one datanode anyway).

(I read somewhere that HDFS stores data in datanodes as described below.)
1) An HDFS scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold.
2) Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes.
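
To see where the blocks of my test file actually end up, I guess I could run something like the commands below (standard Hadoop 1.x CLI; the path is just a placeholder for my own test data):

    # List the file's blocks and which datanodes hold each replica
    hadoop fsck /path/to/testdata -files -blocks -locations

    # Show capacity and usage per datanode across the cluster
    hadoop dfsadmin -report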

My doubts are:
* Do I have to make any configuration changes in Hadoop to tell it to share data blocks between the 2 datanodes, or does it do this automatically?
* Also, my test data is not too big; it is only 240 KB. According to point 1), I don't know if such small test data can trigger automatic movement of data from one datanode to another.
* Also, what should the dfs.replication value be when I am running 2 datanodes? (I guess it is 2; see the snippet after this list.)
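
If 2 is right, my understanding is that the setting goes into conf/hdfs-site.xml like this (just my guess at the entry; dfs.replication is the standard property name):

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>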


Any advice or help would be very much appreciated .

Best Regards,
Sindhu