You can use the standard HDFS APIs to read files; applications are free to read their input data however they like. DataMPI also provides some utility functions for reading data. Here is an example that we would like to share, which may help your applications as well.
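import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.KeyValueLineRecordReader;

// O communicator: rank and size of this task on the O side
int rank = MPI_D.Comm_rank(MPI_D.COMM_BIPARTITE_O);
int size = MPI_D.Comm_size(MPI_D.COMM_BIPARTITE_O);
if (rank == 0) {
    DataMPIUtil.printArgs(args);
}
System.out.println("The O task " + rank + " of " + size
        + " is working...");

// Ask DataMPI which files under inDir this O task should read
Path[] inputs = DataMPIUtil.HDFSDataLocalLocator.getTaskInputs(
        MPI_D.COMM_BIPARTITE_O, jobConf, inDir, rank, size);
for (int i = 0; i < inputs.length; i++) {
    Path inPath = inputs[i];
    FileSystem fs = inPath.getFileSystem(jobConf);
    if (fs.exists(inPath) && fs.isFile(inPath)) {
        // Treat the whole file as a single split
        FileStatus status = fs.getFileStatus(inPath);
        FileSplit fsplit = new FileSplit(inPath, 0,
                status.getLen(), jobConf);
        // By default, each line is split at the first tab into key and value
        KeyValueLineRecordReader kvrr =
                new KeyValueLineRecordReader(jobConf, fsplit);
        Text khead = kvrr.createKey();
        Text vhead = kvrr.createValue();
        while (kvrr.next(khead, vhead)) {
            // send key-value
            MPI_D.Send(khead, vhead);
            ...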
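To see the same pattern without a cluster, here is a hypothetical plain-Java sketch with no Hadoop or DataMPI dependency. Each O task picks its share of the input files by rank, then turns every line into a key-value pair the way Hadoop's KeyValueLineRecordReader does with its default settings: split at the first tab, and if there is no tab the whole line becomes the key and the value is empty. The names `KeyValueSketch`, `assignInputs`, `splitKeyValue`, and `readPairs` are made up for illustration. The round-robin assignment is an assumption: it is not what `DataMPIUtil.HDFSDataLocalLocator.getTaskInputs` does, since that locator presumably takes HDFS data locality into account, as its name suggests. `readPairs` collects the pairs where the real code would call `MPI_D.Send`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the O-side read loop; not DataMPI's implementation.
class KeyValueSketch {

    // Assumed round-robin assignment: file i goes to task (i % size).
    // DataMPI's locator chooses inputs differently (locality-aware).
    static List<String> assignInputs(List<String> files, int rank, int size) {
        List<String> mine = new ArrayList<>();
        for (int i = rank; i < files.size(); i += size) {
            mine.add(files.get(i));
        }
        return mine;
    }

    // Mimics KeyValueLineRecordReader's default behavior: split at the
    // first tab; with no tab, the whole line is the key, value is empty.
    static String[] splitKeyValue(String line) {
        int sep = line.indexOf('\t');
        if (sep < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, sep), line.substring(sep + 1)};
    }

    // Reads every line and collects its key-value pair; in the DataMPI
    // example each pair would instead be passed to MPI_D.Send.
    static List<String[]> readPairs(BufferedReader in) {
        List<String[]> pairs = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                pairs.add(splitKeyValue(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part-0", "part-1", "part-2", "part-3", "part-4");
        int size = 2;
        for (int rank = 0; rank < size; rank++) {
            System.out.println("O task " + rank + " of " + size
                    + " reads " + assignInputs(files, rank, size));
        }
        BufferedReader in = new BufferedReader(
                new StringReader("apple\t3\nbanana\t5\tripe\nno-tab-line"));
        for (String[] kv : readPairs(in)) {
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

With five files and two tasks, task 0 gets part-0, part-2 and part-4, and task 1 gets part-1 and part-3. Only the first tab separates key from value, so "banana\t5\tripe" yields the key "banana" and the value "5\tripe".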
Thanks,
DataMPI Team