XML Parser Example


kyl

Mar 4, 2008, 3:18:52 PM
to cascading-user
Hi

I've been trying to get Cascading to work with XML files, but I can't
get it to read a file and write output, or get the XPathParser to work.

Do you have any examples that walk through an XML sample set, ideally
using the XPathParser? Anything at all that parses XML files?

I.e., any example, even one as trivial as reading an XML file and
writing each XPath and its term to an output file on the
Hadoop file system. Kinda like the inverted index samples you see
with Hadoop.

I just need an example of how it works with XML files to get a feel
for it; then I can continue on.

Thanks

kyl

Mar 4, 2008, 3:25:00 PM
to cascading-user
Background:

I am able to write a map/reduce job directly that parses XML files
and creates an inverted index using plain Java Hadoop
programming. I had to write my own XML record readers, etc., the
old Java Hadoop way. It was a pain in the butt.

I would now like to play around with Cascading to do the exact same
thing, in the hope that it makes the map/reduce transparent and life
easier than having to think in map/reduce. I saw that there are some
supporting XPathParsers available, but I am unable to get them to work.
I'd like to see an example of how to use one to parse an XML file.

A trivial example for an inverted word index would be easy to follow,
or anything else that reads XML files and then outputs something,
whether that be XPath and term or anything else.

I'd just like to see a working example to build on so I can get my
stuff to work.

Thanks.

Chris K Wensel

Mar 4, 2008, 3:39:14 PM
to cascadi...@googlegroups.com
Hey Kyl

Here is a simple example:
http://code.google.com/p/cascading/wiki/CrawlDataWordCountCascade

Note that you will need a custom Scheme that knows how to read a whole
file into a Tuple. The example above stuffs an HTML doc into each line
of the input file, converts the HTML to XHTML, then does XML stuff.

Unfortunately, I haven't been doing any XML processing, so I haven't
written a new Scheme object. Your Scheme class would need to set a
FileInputFormat that reads the whole file or something. Sounds like
you may have built this.
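
[Editor's sketch, not part of the original reply.] To make the "XML stuff" step concrete, here is a minimal sketch of the XPath-and-term inverted index asked about above, assuming one whole XML document has already been read into a single String (the job the custom Scheme would do). It uses only the JDK's javax.xml.xpath and DOM classes, not Cascading's XPathParser; the class name XmlTermIndex and the methods pathOf and index are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of Cascading or Hadoop.
public class XmlTermIndex {

    // Builds a simple "/root/child" path by walking up the element's ancestors.
    static String pathOf(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element;
             n != null && n.getNodeType() == Node.ELEMENT_NODE;
             n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    // Maps each lower-cased term to the sorted set of element paths it occurs under.
    static Map<String, TreeSet<String>> index(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select every text node that is not whitespace-only.
        NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space()]", doc, XPathConstants.NODESET);

        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (int i = 0; i < texts.getLength(); i++) {
            Node text = texts.item(i);
            String path = pathOf(text.getParentNode());
            for (String term : text.getNodeValue().trim().toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
            }
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Hadoop notes</title><body>Cascading on Hadoop</body></doc>";
        // Print one "term<TAB>[paths]" line per term.
        index(xml).forEach((term, paths) -> System.out.println(term + "\t" + paths));
    }
}
```

In a real flow, the body of index would presumably run once per Tuple holding a whole document, with the term/path pairs emitted as output Tuples rather than collected in a map.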

Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/

kyl

Mar 4, 2008, 9:40:03 PM
to cascading-user
Awesome. Thanks. I'll start working this in and see how it goes.
This is some cool stuff you did!

Chris K Wensel

Mar 4, 2008, 9:50:54 PM
to cascadi...@googlegroups.com
Thanks!

Btw, sorry for the late reply. I sent the email out this morning, but
it got hung up somewhere. You probably won't get this one till
tomorrow <grin>

Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/

balajee venkatesh

Dec 12, 2015, 1:03:19 PM
to cascading-user
Hello!
I guess you have done some XML processing with the Cascading framework by now. I have tried my best to find a precise reference on the Internet that would help me write code to parse an XML file into a tabular format, but I didn't find any relevant information or examples anywhere. Could you provide a single example?