How to query xpath on context (subtree only)

Martin Grotzke

Oct 13, 2013, 6:54:04 PM
to scale...@googlegroups.com
Hi,

I'm trying to evaluate an xpath expression on a given context (the result
of a previously evaluated xpath), but I don't know if/how this is possible.

Given the document
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head/>
<body><div id="content"><div class="main">foo</div></div></body>
</html>

when I evaluate the xpath
"//body//div[@id='content']"
as context and afterwards evaluate the xpath
"//*[@class='main']/text()" using this context,
this fails with
RuntimeException: The local name 'xml:lang' is not valid for Xml 1.0
which indicates that the 2nd xpath expression is run against the whole
document.

I know that I could prefix the 2nd xpath expression with the 1st one
(string concatenation), but I'd prefer not to do this.
This also exposes a second issue: the xml:lang attribute is what triggers
the error. A solution for that one would be great as well.

Here's the complete sample I'm running (with scales-jaxen 0.6.0-M1):

import scala.xml.Source

import nu.validator.htmlparser.common.XmlViolationPolicy
import nu.validator.htmlparser.sax.HtmlParser
import scales.utils.resources.SimpleUnboundedPool
import scales.utils.top
import scales.xml._
import scales.xml.ScalesXml._
import scales.xml.jaxen.ScalesXPath
import scales.xml.parser.sax.DefaultSaxSupport

object ScalesTest extends App {
  val source = Source.fromString("""
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en"><head/><body><div id="content"><div
class="main">foo</div></div></body></html>
""")

  val doc = loadXmlReader(source, strategy = defaultPathOptimisation,
    parsers = NuValidatorFactoryPool)
  val root = top(doc)

  // This works (prints "foo")
  println(
    ScalesXPath("//body//div[@id='content']//*[@class='main']/text()")
      .withNameConversion(ScalesXPath.localOnly)
      .evaluate(root).head.right.get.item.value
  )

  val ctxt = ScalesXPath("//body//div[@id='content']")
    .withNameConversion(ScalesXPath.localOnly).xmlPaths(root).head

  // Fails with
  // Exception in thread "main" java.lang.RuntimeException: The local
  // name 'xml:lang' is not valid for Xml 1.0
  println(
    ScalesXPath("//*[@class='main']/text()")
      .withNameConversion(ScalesXPath.localOnly)
      .evaluate(ctxt).head.right.get.item.value
  )

}

object NuValidatorFactoryPool extends
    SimpleUnboundedPool[org.xml.sax.XMLReader] with DefaultSaxSupport {
  def create = {

    import nu.validator.htmlparser.{ sax, common }
    import sax.HtmlParser
    import common.XmlViolationPolicy

    // Lenient HTML parser: tolerate XML and xmlns violations in the input
    val reader = new HtmlParser
    reader.setXmlPolicy(XmlViolationPolicy.ALLOW)
    reader.setXmlnsPolicy(XmlViolationPolicy.ALLOW)
    reader
  }
}

TIA && cheers,
Martin

Chris Twiner

Oct 14, 2013, 11:59:32 AM
to scale...@googlegroups.com

Hi Martin,

When using 0.4.5 you can simply ask for a relative path. Instead of "/" use "./" as the starting point.

This keeps the actual parent path intact. If that's not what you want, you can call top on the subtree to start a new root.
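
To illustrate the difference between an absolute and a relative expression on a context node, here is a minimal sketch. Note that it does not use Scales at all: it assumes the JDK's built-in javax.xml.xpath API, and the sample document (with a second "main" element outside the context, and without the namespace and xml:lang attributes) is made up for the demonstration.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.Node
import org.xml.sax.InputSource

object RelativeXPathSketch {
  // Hypothetical sample: one "main" element outside the context, one inside it.
  val xml =
    """<html><body>
      |<div id="other"><div class="main">outside</div></div>
      |<div id="content"><div class="main">foo</div></div>
      |</body></html>""".stripMargin

  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new InputSource(new StringReader(xml)))
  val xpath = XPathFactory.newInstance().newXPath()

  // The context node: the <div id="content"> element.
  val ctxt = xpath
    .evaluate("//body//div[@id='content']", doc, XPathConstants.NODE)
    .asInstanceOf[Node]

  // A leading "//" restarts at the document root, even when a context node is given,
  // so the first match in document order is the element outside the context.
  val absolute = xpath.evaluate("//*[@class='main']/text()", ctxt)

  // A leading ".//" only descends from the context node.
  val relative = xpath.evaluate(".//*[@class='main']/text()", ctxt)

  def main(args: Array[String]): Unit = {
    println(s"absolute: $absolute") // absolute: outside
    println(s"relative: $relative") // relative: foo
  }
}
```

Applied to the Scales sample from the question, this would presumably mean changing only the second expression to ".//*[@class='main']/text()" and leaving the rest of the call chain as it is.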

Hth,
Cheers,
Chris

--
You received this message because you are subscribed to the Google Groups "scales-xml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scales-xml+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Martin Grotzke

Oct 14, 2013, 4:41:03 PM
to scale...@googlegroups.com
Hi Chris,

Great, this helps! Is there an easy solution for the xml:lang issue?

Thanx && cheers,
Martin



--
inoio gmbh - http://inoio.de
Schulterblatt 36, 20357 Hamburg
Amtsgericht Hamburg, HRB 123031
Geschäftsführer: Dennis Brakhane, Martin Grotzke, Ole Langbehn
