I'm trying to read the MediaWiki XML export format, which starts like this:
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
Then under the mediawiki tag there are a bunch of page tags, some of which have a redirect tag such as:
<page>
  <title>Albigensian</title>
  <redirect title="Catharism" />
  <revision>
    ...
  </revision>
</page>
I'm using ScalesXML to do the parsing:
import java.io.FileReader
// Standard Scales XML imports
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

object WikiMediaImport extends App with Logging {
  val xml = pullXml(new FileReader(args(0)))
  val ns = Namespace("http://www.mediawiki.org/xml/export-0.8/")
  val p = ns // .prefixed("mediawiki") <-- that doesn't help either
  val mediawikiTag = p("mediawiki")
  val pageTag = p("page")
  val titleTag = p("title")
  val revisionTag = p("revision")
  val textTag = p("text")
  val timestampTag = p("timestamp")
  val redirectTag = p("redirect")
  //val redirectWhereAttr: Attribute = Attribute(redirectTag, "title")
  val pagePath = List(mediawikiTag, pageTag)
  // Pull-parse the file, yielding one subtree per mediawiki/page element
  val iterator = iterate(pagePath, xml)
  for {
    page <- iterator
  } {
    val title = text(page \* titleTag)
    val timestamp = text(page \* revisionTag \* timestampTag)
    val content = text(page \* revisionTag \* textTag)
    println(s"$title $timestamp ${content.length}")
  }
}
However, I also want to get the mediawiki -> page -> redirect[title] attribute value and I'm not quite sure how to do this despite reading the help page.
If I get a prefixed Namespace, then nothing is found because in the file the namespace isn't actually prefixed. If I use NoNamespaceQName then nothing is found (presumably because in reality the XML file has a namespace specified).
And if I use a default Namespace then Scales doesn't allow me to define an attribute because those are only to be used with prefixed namespaces.
At least that's how I understand it.
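For reference, the namespace behaviour described above can be checked outside Scales. Under the Namespaces in XML rules, a default xmlns declaration applies to elements but not to unprefixed attributes, so redirect is in the MediaWiki namespace while its title attribute has no namespace at all. The sketch below uses the JDK's StAX reader rather than Scales, so it is not the Scales answer, only a minimal illustration of that rule. The object name RedirectTitles and the method redirectTitles are made up for this example.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object RedirectTitles {
  // Namespace URI copied from the xmlns declaration on the root element.
  val MediaWikiNs = "http://www.mediawiki.org/xml/export-0.8/"

  /** Collects the title attribute of every redirect element in the MediaWiki namespace. */
  def redirectTitles(xml: String): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    val found = List.newBuilder[String]
    try {
      while (reader.hasNext()) {
        // The element itself is matched by namespace URI plus local name.
        val isRedirect =
          reader.next() == XMLStreamConstants.START_ELEMENT &&
            reader.getNamespaceURI() == MediaWikiNs &&
            reader.getLocalName() == "redirect"
        if (isRedirect) {
          for (i <- 0 until reader.getAttributeCount()) {
            val name = reader.getAttributeName(i) // javax.xml.namespace.QName
            // The attribute is matched with an empty namespace: it is unqualified.
            if (name.getNamespaceURI.isEmpty && name.getLocalPart == "title")
              found += reader.getAttributeValue(i)
          }
        }
      }
    } finally reader.close()
    found.result()
  }
}
```

Calling RedirectTitles.redirectTitles on a document shaped like the page snippet above, wrapped in the mediawiki root element, collects the redirect target. This suggests that whatever attribute lookup is used in Scales should pair a namespaced element name with a no-namespace attribute name.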
Regards,

Hiya,
I've answered on SO (Stack Overflow), but I just wanted to thank you for reminding me to deprecate the less-than-correct attribute predicates.
Cheers
Chris