Hello,
How did you push data into DDFS?
There are two different approaches available:
1. ddfs chunk: assumes each line is a record. The lines are read one by
one, packed into blobs of data, and then pushed into DDFS. When the
data is read back, the map task receives it line by line (record by
record).
2. ddfs push: pushes the files into DDFS as they are, without splitting
them into records.
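To make the difference concrete, here is a minimal plain-Python sketch of what the two commands do to a file, conceptually. It does not use Disco at all: `chunk_records`, `push_whole`, `map_side_records` and the `max_chunk_bytes` limit are hypothetical names for illustration, not Disco's implementation or API.

```python
from typing import Iterable, Iterator, List

# Hypothetical illustration only: this is NOT Disco's code or API.


def chunk_records(lines: Iterable[bytes], max_chunk_bytes: int) -> List[List[bytes]]:
    """Pack line records into blobs of at most max_chunk_bytes (ddfs chunk, conceptually).

    A single record larger than the limit still gets a blob of its own;
    records are never split.
    """
    blobs: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for line in lines:
        # Start a new blob when adding this record would exceed the limit.
        if current and size + len(line) > max_chunk_bytes:
            blobs.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        blobs.append(current)
    return blobs


def push_whole(data: bytes) -> List[bytes]:
    """Store the file untouched as one blob (ddfs push, conceptually)."""
    return [data]


def map_side_records(blobs: Iterable[List[bytes]]) -> Iterator[bytes]:
    """What a map task sees for chunked data: one record (line) at a time."""
    for blob in blobs:
        yield from blob


if __name__ == "__main__":
    data = b"a,1\nb,2\nc,3\n"
    blobs = chunk_records(data.splitlines(keepends=True), max_chunk_bytes=8)
    print(blobs)  # [[b'a,1\n', b'b,2\n'], [b'c,3\n']]
    print(push_whole(data))  # [b'a,1\nb,2\nc,3\n']
```

With chunking, record boundaries are known up front; with a plain push, the file stays one opaque blob and whoever reads it has to decide where records start and end.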
The latter is the one you probably want to look into.
That said, ddfs chunk also accepts a reader, which lets it split records
by arbitrary rules (e.g. for reading XML, see
https://github.com/discoproject/disco/blob/develop/examples/util/xml_reader.py).
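A custom reader only has to turn a file object into a stream of records. The sketch below, assuming a hypothetical `<item>` element that delimits each record, uses the standard library's `xml.etree.ElementTree.iterparse` to yield one record per element, so a single record may span several lines. The four-argument `(fd, size, url, params)` shape mirrors the classic Disco `map_reader` convention as far as we know; treat it as an assumption and compare it with the xml_reader.py example before relying on it.

```python
import io
import xml.etree.ElementTree as ET

RECORD_TAG = "item"  # hypothetical name of the element that delimits one record


def xml_record_reader(fd, size, url, params):
    """Yield one record per <item> element rather than one per line.

    The (fd, size, url, params) signature is an assumption modelled on
    Disco's classic map_reader convention; only fd is used here.
    """
    for _event, elem in ET.iterparse(fd, events=("end",)):
        if elem.tag == RECORD_TAG:
            # Turn the record's child elements into a simple dict.
            yield {child.tag: child.text for child in elem}
            # Drop the record's children once it has been emitted.
            elem.clear()


if __name__ == "__main__":
    doc = b"""<feed>
  <item>
    <id>1</id>
    <name>a</name>
  </item>
  <item><id>2</id><name>b</name></item>
</feed>"""
    for record in xml_record_reader(io.BytesIO(doc), len(doc), None, None):
        print(record)
```

Because records are delimited by elements instead of newlines, the first `<item>` above spans four lines but still arrives as a single record.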
> --
> You received this message because you are subscribed to the Google Groups
> "Disco-development" group.