I want to create a training set from multiple files using Hadoop.
Which tool is best suited for this: Pig Latin, Hive, or HBase?
The structure of each file and the number of records in it are dynamic, and I want to combine the files by calculating the ratio of records to take from each one.
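To make the goal concrete, here is a minimal sketch in plain Python (not Hadoop) of the kind of ratio-based combination I have in mind. The source names, the ratios, and the function name `sample_by_ratio` are all hypothetical, and in practice the ratios would be computed rather than hard-coded.

```python
import random

def sample_by_ratio(sources, ratios, seed=0):
    """Take the given fraction of records from each source, then shuffle.

    sources: dict mapping a source name to its list of records
    ratios:  dict mapping a source name to the fraction to keep (0.0 to 1.0)
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    combined = []
    for name, records in sources.items():
        k = round(len(records) * ratios[name])  # number of records to take
        combined.extend(rng.sample(records, k))  # sample without replacement
    rng.shuffle(combined)  # mix records from the different sources
    return combined

# Hypothetical inputs: records may have a different structure per source.
sources = {
    "file_a": [("a", i) for i in range(100)],
    "file_b": [("b", i, "extra") for i in range(50)],
}
ratios = {"file_a": 0.5, "file_b": 0.2}

train_set = sample_by_ratio(sources, ratios)
```

What I am looking for is the tool that lets me do the equivalent of this across many large files on a cluster.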
Any details would be appreciated.