Problem with incompatible xml parsers?

Bob Silverberg

May 16, 2010, 8:40:52 AM
to javaloa...@googlegroups.com
Hey all,

I'm brand new to Java and JavaLoader so some of what I'm describing
may seem crazy, or just a very bad way of doing it.

I'm trying to run some Java code from within CF that drives WebDriver.
I have a Java project in Eclipse that "works". It contains a /lib
folder with a whole bunch of jar files that I downloaded from
http://code.google.com/p/selenium/ (specifically
selenium-java-2.0a4.zip). In Eclipse I added the jars to my buildpath
and that allowed me to run my code. I have a single java file in the
/src folder that does some simple stuff with WebDriver. Here's the
source of that:

Example.java:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class Example {

    FirefoxDriver driver;
    WebElement e;

    public Example() {

    }

    public void tryIt() {

        driver = new FirefoxDriver();
        System.setProperty("webdriver.firefox.useExisting","true");
        driver.get("http://www.google.com");
        e = driver.findElement(By.name("q"));
        e.sendKeys("Cheese!");
        e.submit();
        System.out.println("Page title is " + driver.getTitle());
        driver.quit();

    }

}

Being totally new to Java and JavaLoader, I just tried loading my
source file initially, using the sourceDirectories argument, but of
course Java couldn't find the org.openqa.selenium package. I then
tried loading my /lib folder into JavaLoader using the loadPaths
argument, but that didn't seem to work.

Question 1: Can I just point loadPaths at a folder on disk that
contains a bunch of jar files, or do I have to point it to each
individual jar file? Doing the former didn't seem to work.

I wasn't sure how to proceed at this point, so what I did was use the
Export feature of Eclipse to create a jar file that includes all of
the resources in my /lib folder. I then pointed to that jar file
using loadPaths. I have no idea if that's a reasonable approach and
would be happy to hear of other ways to make this work.

OK, so now the Java source is running and it is able to find the
org.openqa.selenium package, but when I run the code I get the
following exception:

org.openqa.selenium.WebDriverException: java.lang.ClassCastException:
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to
javax.xml.parsers.DocumentBuilderFactory
System info: os.name: 'Mac OS X', os.arch: 'x86_64', os.version:
'10.6.3', java.version: '1.6.0_17'
Driver info: driver.version: firefox
at org.openqa.selenium.firefox.FirefoxProfile.readIdFromInstallRdf(FirefoxProfile.java:224)
at org.openqa.selenium.firefox.FirefoxProfile.addExtension(FirefoxProfile.java:160)
at org.openqa.selenium.firefox.FirefoxProfile.addExtension(FirefoxProfile.java:142)
at org.openqa.selenium.firefox.FirefoxProfile.addWebDriverExtensionIfNeeded(FirefoxProfile.java:102)
at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:110)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:64)
at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:100)
at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:92)
at Example.tryIt(Example.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at coldfusion.runtime.StructBean.invoke(StructBean.java:502)
at coldfusion.runtime.CfJspPage._invoke(CfJspPage.java:2393)
at cfindex2ecfm43082552.runPage(/Users/robertsilverberg/Documents/workspace/javaloader/example/compileHelloWorld/index.cfm:34)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:363)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:87)
at coldfusion.filter.LicenseFilter.invoke(LicenseFilter.java:27)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:74)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
at coldfusion.CfmServlet.service(CfmServlet.java:200)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
Caused by: java.lang.ClassCastException:
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to
javax.xml.parsers.DocumentBuilderFactory
at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source)
at org.openqa.selenium.firefox.FirefoxProfile.readIdFromInstallRdf(FirefoxProfile.java:177)
... 45 more

I'm not sure what this is, but my guess is that it's some sort of
conflict between the code in my jar file and CF's java classes. Any
advice that anyone has on getting past this would be appreciated.

Oh, and here's the CF code that runs all of this:

<cfscript>
paths[1] = expandPath("./lib/cfTest.jar");
sourcePaths = [expandPath("./src")];
loader = createObject("component",
"javaloader.JavaLoader").init(loadPaths=paths,sourceDirectories=sourcePaths);
example = loader.create("Example").init();
example.tryIt();
</cfscript>

Thanks for any help with this!

Cheers,
Bob


--
Bob Silverberg
www.silverwareconsulting.com

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com

Dennis Clark

May 16, 2010, 1:31:27 PM
to javaloa...@googlegroups.com
Hey Bob,

I believe JavaLoader simply converts the loadPaths argument into a Java classpath. From the official documentation (http://java.sun.com/javase/6/docs/technotes/tools/windows/classpath.html) you can append '/*' to a Java classpath directory to include all jar files in that directory. I'm not certain that JavaLoader supports this, but it's worth a shot.

The class javax.xml.parsers.DocumentBuilderFactory mentioned in your error is part of the standard Java library (rt.jar) but also appears in the xml-apis-1.3.04.jar included in Selenium. It's possible that your Eclipse build decided to not include that class in your exported jar. Normally this is not a problem, but I know Java classloaders can have problems with classes whose definition depends on classes in a parent loader. It's also possible that the version of DocumentBuilderFactory in xml-apis-1.3.04.jar is slightly different to the standard Java library version, making them incompatible for the classloader.

In either case, you should be able to resolve the problem by adding xml-apis-1.3.04.jar to your JavaLoader loadPaths.

Let us know which of my 2 ideas (if any) work.

Good luck,

-- Dennis

Bob Silverberg

May 16, 2010, 8:45:47 PM
to javaloa...@googlegroups.com
Hi Dennis,

On Sun, May 16, 2010 at 1:31 PM, Dennis Clark <boom...@gmail.com> wrote:
> Hey Bob,
> I believe JavaLoader simply converts the loadPaths argument into a Java
> classpath. From the official documentation
> (http://java.sun.com/javase/6/docs/technotes/tools/windows/classpath.html) you
> can append '/*' to a Java classpath directory to include all jar files in
> that directory. I'm not certain that JavaLoader supports this, but it's
> worth a shot.

I tried that and it didn't work. In fact JavaLoader throws an error:

/Users/robertsilverberg/Documents/workspace/javaloader/example/compileHelloWorld/lib/selenium-jars/*
does not exist

which makes sense. Looking at the source code it looks like I do in
fact have to point to each and every jar file individually in the
loadPaths array. Perhaps it's my ignorance of Java, but why does it
have to be that way? The documentation seems to suggest that I can
point to a folder that contains jar files, and it would be easy enough
to have JavaLoader traverse those files, so I'm surprised that it
doesn't work like that.

> The class javax.xml.parsers.DocumentBuilderFactory mentioned in your error
> is part of the standard Java library (rt.jar) but also appears in
> the xml-apis-1.3.04.jar included in Selenium. It's possible that your
> Eclipse build decided to not include that class in your exported jar.

Nope. I can see the class files from that jar file in my exported jar.

> Normally this is not a problem, but I know Java classloaders can have
> problems with classes whose definition depends on classes in a parent
> loader. It's also possible that the version of DocumentBuilderFactory
> in xml-apis-1.3.04.jar is slightly different to the standard Java library
> version, making them incompatible for the classloader.

It's sounding like this is probably the issue, and therefore I'm still
at a loss as to how to address it.

> In either case, you should be able to resolve the problem by
> adding xml-apis-1.3.04.jar to your JavaLoader loadPaths.
> Let us know which of my 2 ideas (if any) work.

But (as mentioned above) the classes are already in my exported jar.
I also, just for kicks, added a path to the actual xml-apis-1.3.04.jar
file to my loadPaths array, so I could be sure that it was loading,
and that is still resulting in the same error message.

I appreciate the help, but thus far I seem to still be stuck where I was before.

Mark Mandel

May 16, 2010, 9:18:17 PM
to javaloa...@googlegroups.com

You have to point it at each .jar file. See:
http://www.compoundtheory.com/javaloader/docs/#Reference_9143739175724688_407

If you are using CF9, see:
http://www.cfquickdocs.com/cf9/#directorylist

For a quick way to get a list of all the .jar files in a directory.
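
For anyone building the list outside CF9, here is a rough Java sketch of the same idea: scan one folder and collect the absolute path of every .jar in it, ready to hand to loadPaths. The class name JarPaths and the method listJars are made up for this illustration; they are not part of JavaLoader.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JavaLoader.
final class JarPaths {
    private JarPaths() {}

    /** Returns the absolute paths of all *.jar files directly inside dir, sorted by name. */
    static List<String> listJars(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        // The glob only matches entries in this one directory; it does not recurse.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }
}
```

The sketch does not descend into subfolders, so nested lib directories would need their own call.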
 

> I wasn't sure how to proceed at this point, so what I did was use the
> Export feature of Eclipse to create a jar file that includes all of
> the resources in my /lib folder. I then pointed to that jar file
> using loadPaths. I have no idea if that's a reasonable approach and
> would be happy to hear of other ways to make this work.
>
> OK, so now the Java source is running and it is able to find the
> org.openqa.selenium package, but when I run the code I get the
> following exception:
>
> org.openqa.selenium.WebDriverException: java.lang.ClassCastException:
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to
> javax.xml.parsers.DocumentBuilderFactory
> System info: os.name: 'Mac OS X', os.arch: 'x86_64', os.version:
> '10.6.3', java.version: '1.6.0_17'


So this is where ClassLoaders in Java can bite you in the rear, and they can be a pain to debug if you don't understand how they work.

Strangely enough, I actually covered this in my cf.Objective() Advanced Java Integration talk, with exactly this library, and it could probably do with a good blog post too, because not a lot of CF'ers really get what happens here.

We've also talked about this a bit on this list as well, but it's a hard thing to explain without pictures.

Basically, you have to think of ClassLoaders (of which JL has one, and CF also has, well, a lot) like shelves, but shelves that can turn around and say 'Hey, what you are looking for is not on my shelf, maybe you should check the shelf above me'.

So whenever you go get an instance of something, the code goes to the given ClassLoader, finds the reference to the Class it looks for and then creates an instance out of that Class definition.  If the ClassLoader can't find it, it goes and asks its parent. What makes this whole experience fun, is that some ClassLoaders are "parent first" (Does my parent have it? If not, check me!), or "child first" (Do I have it, if not, check my parent), and you don't necessarily know which.

JavaLoader is Child First, just for reference.
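
As a rough illustration of the child-first idea (this is not JavaLoader's actual implementation, and the class name ChildFirstClassLoader is made up for the sketch), a loader that checks its own shelf before asking the shelf above could look like this in Java:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first loader: searches its own URLs before delegating to the parent.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Child first: look in this loader's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException notOnMyShelf) {
                    // 3. Fall back to the standard parent-first lookup.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A stock URLClassLoader does steps 2 and 3 in the opposite order, which is the "parent first" behaviour.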

For the issue you are getting, 2 rather painful things are happening:

1) JavaLoader's ClassLoader will always point to ColdFusion's System ClassLoader, which has a copy of dom4j, just like your JL library does.  Normally this is no issue, as JL is child first, so it doesn't hit the parent classloader, except...

2) The ThreadContextClassLoader.
Think of the ThreadContextClassLoader as a ClassLoader that is stuck to the current Thread.  It's usually responsible for loading whatever Classes you need to create your Java objects when you request them, unless you specifically use another ClassLoader (like JL's).

What makes this painful, is dom4j's implementation does some wacky stuff to get/set singletons via the ThreadContextClassLoader (which IMHO is a bad implementation, when considering multiple classloaders). So what happens is you go:

JLClassLoader - create dom4j
dom4j - hey, TCCL (ThreadContextClassLoader) - create me a DocumentBuilderFactoryImpl
dom4j - Let's use my new DocumentBuilderFactoryImpl
Java - wait a minute, you created a DocumentBuilderFactoryImpl via the TCCL, but you want to do stuff in JLClassLoader, which has a different DocumentBuilderFactoryImpl... GAH I don't know which is which!

Which is why you get this weird error:

Caused by: java.lang.ClassCastException:
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to
javax.xml.parsers.DocumentBuilderFactory

Basically because the two classes come from different ClassLoaders (which is a very bad thing), and it realises they are different, and can't make them the same.
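
One way to check for this kind of split is to ask two loaders for the same class name and compare what comes back. The following is only a diagnostic sketch in plain Java; LoaderCheck, definedBy and sameClass are names invented here, not part of JavaLoader or ColdFusion.

```java
// Hypothetical diagnostic helper for comparing what two class loaders see.
final class LoaderCheck {
    private LoaderCheck() {}

    /** Names the loader that defined the given class; null means the JVM's bootstrap loader. */
    static String definedBy(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.toString();
    }

    /** True only if both loaders resolve the name to the very same Class object. */
    static boolean sameClass(String name, ClassLoader a, ClassLoader b) throws ClassNotFoundException {
        Class<?> fromA = Class.forName(name, false, a);
        Class<?> fromB = Class.forName(name, false, b);
        // Same name but different defining loaders means two distinct, non-castable types.
        return fromA == fromB;
    }
}
```

Passing JavaLoader's class loader and the thread context class loader for "javax.xml.parsers.DocumentBuilderFactory" would presumably return false in the setup described above, which is the condition behind the ClassCastException.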

So what do you do? Well, the cool thing is, you can switch out the TCCL at run time, to a different ClassLoader, and then switch it back after you are done.  It's not the cleanest thing in the world, and dom4j should really give you a way to tell it which ClassLoader to use (Spring does this), but you do what you have to.

So, assuming you are on CF9, you would end up writing something like this:

<cfscript>
       _Thread = createObject("java", "java.lang.Thread"); // using 'Thread' breaks CFB
       currentClassloader = _Thread.currentThread().getContextClassLoader();

       try
       {
              paths[1] = expandPath("./lib/cfTest.jar");
              sourcePaths = [expandPath("./src")];
              loader = createObject("component", "javaloader.JavaLoader").init(loadPaths=paths, sourceDirectories=sourcePaths);

              // set the current thread's context class loader to JavaLoader's classloader, so dom4j doesn't die
              // (note: call getURLClassLoader() on the 'loader' instance created above)
              _Thread.currentThread().setContextClassLoader(loader.getURLClassLoader());

              example = loader.create("Example").init();
              example.tryIt();
       }
       catch(Any exc)
       {
              rethrow;
       }
       finally
       {
              /*
                  We have to reset the classloader, due to
                  thread pooling.
              */
              _Thread.currentThread().setContextClassLoader(currentClassloader);
       }
</cfscript>

So as you can see, we:
1) get access to the current thread, and grab its current ClassLoader
2) switch the TCCL to the ClassLoader JavaLoader uses
3) Once we are done loading up what we need, we use finally{} to make sure the TCCL goes back to the way it was before we started.
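
The same three steps can also be wrapped up on the Java side so the restore can't be forgotten. This is only a sketch; ContextClassLoaders and callWith are made-up names, not something JavaLoader provides.

```java
import java.util.concurrent.Callable;

// Hypothetical helper that scopes a temporary thread context class loader.
final class ContextClassLoaders {
    private ContextClassLoaders() {}

    /** Runs work with tccl as the thread context class loader, then restores the previous one. */
    static <T> T callWith(ClassLoader tccl, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader(); // step 1: remember the old TCCL
        current.setContextClassLoader(tccl);                    // step 2: switch to the new one
        try {
            return work.call();
        } finally {
            current.setContextClassLoader(previous);            // step 3: always switch back
        }
    }
}
```

Because the restore sits in finally, it runs whether the work returns normally or throws.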

I hope that all made sense, let me know if it didn't.

ClassLoaders in Java are hugely powerful, but they can also suck.  This is why I often say JL isn't a very good tool for learning Java in, because of crazy issues like these.

Mark




--
E: mark....@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

Bob Silverberg

May 16, 2010, 9:43:47 PM
to javaloa...@googlegroups.com
Thanks Mark. That worked perfectly, including the suggestion to use
directoryList()!

I have one more question: why is it so important to set the TCCL back?
Is that because CF is going to do something later in the request
internally (i.e., not based on code that I wrote) and it needs the
TCCL set back? Or is it because this change to the TCCL is not only
going to affect this request? Is the thread reused with other
requests?

Cheers,
Bob
--
Bob Silverberg
www.silverwareconsulting.com

Dennis Clark

May 16, 2010, 9:49:35 PM
to javaloa...@googlegroups.com
On Sun, May 16, 2010 at 9:18 PM, Mark Mandel <mark....@gmail.com> wrote:

> For the issue you are getting, 2 rather painful things are happening:
>
> 1) JavaLoader's ClassLoader will always point to ColdFusion's System ClassLoader, which has a copy of dom4j, just like your JL library does.  Normally this is no issue, as JL is child first, so it doesn't hit the parent classloader, except...
>
> 2) The ThreadContextClassLoader.
> Think of the ThreadContextClassLoader as a ClassLoader that is stuck to the current Thread.  It's usually responsible for loading whatever Classes you need to create your Java objects when you request them, unless you specifically use another ClassLoader (like JL's).
>
> What makes this painful, is dom4j's implementation does some wacky stuff to get/set singletons via the ThreadContextClassLoader (which IMHO is a bad implementation, when considering multiple classloaders).


Thanks for the explanation Mark. I had a sense of this but didn't have a grasp of the finer details or the workaround.

I had an issue last year with trying to move the client jars for an RMI API from ColdFusion's server classpath to JavaLoader. RMI has its own classloader so it can load API classes from the network if needed (I wasn't using that feature though). I was able to load the API's top-level remote object with JavaLoader, but any calls that required it to instantiate other API objects would fail with a ClassNotFoundException.

From your explanation, I suspect that while the core RMI objects used the JavaLoader's classloader to load the remote object's class, the remote object itself was using the TCCL instead of its own classloader to load these secondary classes. Unfortunately the application makes heavy use of the API, so manipulating the TCCL that frequently in so many places seems too risky.

The only real benefit I was hoping to get from JavaLoader was the ability to upgrade the API jar without having to restart the CF server. In fact I had to do that last week for a back-end upgrade. It turns out though that a 20-second outage is not a big deal when the application has to be taken offline for 3 hours for the back-end to be upgraded, so I'm just going to leave things as-is.

Cheers,

-- Dennis

Mark Mandel

May 16, 2010, 9:51:40 PM
to javaloa...@googlegroups.com
So CF runs on a J2EE engine (normally JRun), and J2EE engines will thread pool.

Basically, they reuse threads.

There is no guarantee they will reset the TCCL on every request (in fact, I'd be surprised if they did), so if you leave the TCCL changed, weird stuff will start to happen on other requests, as the changed TCCL comes around to other requests that don't know it's been changed.

An interesting side note - createObject() uses the TCCL to create instances of Java classes when you do createObject("java", "...")
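
The thread reuse problem can be reproduced in plain Java with a one-thread pool standing in for the servlet engine's worker threads. This is just a sketch under that assumption; TcclLeakDemo and leaksAcrossTasks are invented names.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a context class loader left behind by one task is seen by the next.
final class TcclLeakDemo {
    private TcclLeakDemo() {}

    /** Returns true if a TCCL set by one task is still in place for the next task on the same pooled thread. */
    static boolean leaksAcrossTasks() throws Exception {
        // A single worker thread, so both "requests" are guaranteed to reuse it.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ClassLoader custom = new URLClassLoader(new URL[0]);

            // "Request" 1 changes the TCCL and never restores it.
            Runnable careless = () -> Thread.currentThread().setContextClassLoader(custom);
            pool.submit(careless).get();

            // "Request" 2 runs later on the same thread and reads the TCCL it inherited.
            Callable<ClassLoader> innocent = () -> Thread.currentThread().getContextClassLoader();
            ClassLoader seen = pool.submit(innocent).get();

            return seen == custom;
        } finally {
            pool.shutdown();
        }
    }
}
```

The second task never touched the TCCL, yet it still ends up with the first task's loader, which is why the finally block in the CF code matters.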

Mark

Mark Mandel

May 16, 2010, 9:54:05 PM
to javaloa...@googlegroups.com
Yeah, unfortunately there aren't that many Java people out there dealing with multiple classloader issues, as, well, they don't really need to worry about it.

Usually you have to dig into the code at the required point and go 'okay, what classpath is this accessing' and work things out from there.

Any Java library that is accessing a ClassLoader directly should give you a way to override which ClassLoader it is using, but not all of them do.

(Or they should access the classloader from themselves - i.e. which classLoader loaded me)
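
To make those two options concrete, here is a small Java sketch of what a library lookup could do: take an optional ClassLoader from the caller, and otherwise use the loader that loaded the library class itself rather than the TCCL. PluginLookup and load are hypothetical names for illustration only.

```java
// Hypothetical library-side lookup that avoids relying on the thread context class loader.
final class PluginLookup {
    private PluginLookup() {}

    /**
     * Loads className with the caller-supplied loader if one is given,
     * otherwise with the loader that loaded this class.
     */
    static Class<?> load(String className, ClassLoader override) throws ClassNotFoundException {
        ClassLoader loader = (override != null) ? override : PluginLookup.class.getClassLoader();
        // Class.forName treats a null loader as the bootstrap loader, so that case is also handled.
        return Class.forName(className, true, loader);
    }
}
```

With this shape, classes resolve the same way no matter which thread calls in, unless the caller explicitly passes a different loader.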

Mark

 