Hi
I am new to Hadoop and have just learnt MapReduce. I have the below doubts regarding it.
1) Does MapReduce always process data as (Key, Value) pairs? In general, can any real-world problem be solved with (Key, Value) pairs?
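For context, this is my rough picture of how a mapper emits (Key, Value) pairs: a word-count style sketch I put together from the standard org.apache.hadoop.mapreduce API (the class and variable names are just my own example, not from any real job):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a word-count mapper: every input line is turned into (word, 1) pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit one (Key, Value) pair per word.
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // Key = word, Value = 1
            }
        }
    }
}

Is every problem expected to be expressed this way, or are there cases where the (Key, Value) model does not fit?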
2) HDFS splits an input file into multiple blocks. Suppose the file is split into 10 blocks; the mapper's (Key, Value) output is converted into "intermediate data" (which is sorted and shuffled) and then passed to the reducer for aggregation.
Which component is responsible for this "intermediate data"?
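To make the question concrete, this is the kind of reducer-side aggregation I have in mind once the intermediate data arrives; again just my own word-count sketch with made-up names, using the standard Reducer API:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of a reducer: after the shuffle, all values for one key arrive together
// and are aggregated into a single (Key, Value) output pair.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();            // aggregate the grouped intermediate values
        }
        context.write(word, new IntWritable(sum));
    }
}

I understand what the reducer does with the grouped values, but I am not clear which part of the framework produces and manages the sorted/shuffled data in between.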
3) If 9 blocks have already been converted to (Key, Value) pairs and the 10th block is taking a long time, will sorting and shuffling be done on the 9 finished outputs, or will they wait for the last one? I imagine this can happen quite often, because not all blocks are the same size.
4) I read that HDFS is write-once, read-many. Normally in DWH or ETL work we insert new records into the table every day. How can we achieve the same thing in HDFS, since we cannot modify a file once it is loaded? Do we need to add the new records to our file and then upload it again, deleting the old file from HDFS?
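Right now my rough idea is to write each day's records as a brand new file under a dated directory instead of touching the old file. A sketch of what I mean is below (the paths, class name, and helper method are all made up by me, using org.apache.hadoop.fs.FileSystem); please tell me if this is the right approach:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: instead of editing an existing HDFS file, write each day's new records
// as a fresh file under a date-partitioned directory.
public class DailyLoad {
    public static void writeDailyRecords(List<String> records) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // e.g. /data/sales/dt=2016-01-15/part-0000  (the path layout is my own example)
        Path out = new Path("/data/sales/dt=" + LocalDate.now() + "/part-0000");

        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(fs.create(out, true), StandardCharsets.UTF_8))) {
            for (String record : records) {
                writer.println(record);    // write today's records into today's new file
            }
        }
    }
}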
Sorry if I have asked too much. Thanks in advance.