Read XMI Cas from SequenceFile


Ivan Habernal

Mar 6, 2014, 11:06:20 AM
to dkpro-big...@googlegroups.com
Hi all, is there an example or snippet showing how to convert (export) the data produced by dkpro-bigdata to some standard format that can later be used outside of Hadoop? I experimented a little with parsing the SequenceFile, but I wasn't sure this was the right way... Thanks! Ivan

Hans-Peter Zorn

Mar 7, 2014, 7:57:18 AM
to dkpro-big...@googlegroups.com
Hi,
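something along these lines should work (untested):

    // SequenceFile and Text are from org.apache.hadoop.io;
    // CASWritable and BinCasWithTypesystemWritable are from dkpro-bigdata.
    // conf is a Hadoop Configuration, seqFilePath the Path of the SequenceFile.
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
            SequenceFile.Reader.file(seqFilePath))) {
        Text key = new Text();
        CASWritable val = new BinCasWithTypesystemWritable();

        // print each key together with the document text of its CAS
        while (reader.next(key, val)) {
            System.err.println(key + "\t" + val.getCAS().getDocumentText());
        }
    }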
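
The loop above only prints the document text. To get files that can be used outside of Hadoop, each record would have to be written to its own file instead. Below is a minimal sketch of that export step using only the JDK, so it can be tried without a cluster. Everything in it is an assumption made for illustration: the class name `SequenceFileExportSketch`, the `ValueSerializer` callback, and the `Map` standing in for the key/value pairs returned by `reader.next(key, val)`. With UIMA on the classpath, the callback could presumably call `XmiCasSerializer.serialize(cas, out)` to produce XMI; here it just writes plain text.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class SequenceFileExportSketch {

    /** Hypothetical callback that writes one record value to a stream. */
    @FunctionalInterface
    public interface ValueSerializer<V> {
        void write(V value, OutputStream out) throws IOException;
    }

    /** Replaces characters that are unsafe in file names with underscores. */
    public static String toFileName(String key, String extension) {
        String safe = key.replaceAll("[^A-Za-z0-9._-]", "_");
        if (safe.isEmpty()) {
            safe = "_";
        }
        return safe + extension;
    }

    /**
     * Writes every (key, value) pair to its own file in outDir and returns
     * the number of records processed. Keys that map to the same file name
     * overwrite each other; a real exporter would need to handle that.
     */
    public static <V> int exportAll(Map<String, V> records, Path outDir,
            String extension, ValueSerializer<V> serializer) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        for (Map.Entry<String, V> record : records.entrySet()) {
            Path target = outDir.resolve(toFileName(record.getKey(), extension));
            try (OutputStream out = Files.newOutputStream(target)) {
                serializer.write(record.getValue(), out);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: in the real job these pairs would come from the
        // SequenceFile reader loop (key = document id, value = CAS).
        Map<String, String> records = new LinkedHashMap<>();
        records.put("doc/1", "first document");
        records.put("doc 2", "second document");

        Path outDir = Files.createTempDirectory("cas-export");
        int written = exportAll(records, outDir, ".txt",
                (text, out) -> out.write(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Wrote " + written + " files to " + outDir);
    }
}
```

The serializer is passed in as a callback so that the file handling stays independent of the output format; switching from plain text to XMI would only change the lambda and the file extension.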

-hp
--
You received this message because you are subscribed to the Google Groups "dkpro-bigdata-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dkpro-bigdata-u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hans-Peter Zorn

Mar 7, 2014, 3:42:49 PM
to dkpro-big...@googlegroups.com
Hi Ivan,
Ah, I just saw that you said "XMI Cas" in the subject. If your SequenceFiles are stored with CASWritable (which writes
XMI), you will need to use that instead of BinCasWithTypesystemWritable. However, the latter is the default, so if you
didn't explicitly use CASWritable in your MapReduce job, that is most likely what is in your SequenceFiles.

-hp

