On 24.12.2014 at 13:45, Geoff Reedy wrote:
> I guess maybe I could say more about what caused the class not found
> problem, it might make things more clear. I am writing an application that
> is a front-end tool for various functions. I cannot put all of the
> libraries used in this application in the same class loader because of some
> conflicts between libraries, so I use class loaders to maintain separation
> between incompatible components. I have the application's base class loader,
> let's call it B, and class loaders for the application modules, let's call
> them M1 and M2. M1 and M2 have B as their parent class loader. Snakeyaml is
> in class loader B because it is used by some libraries that are in B. A
> library in M1 also uses snakeyaml which, because of the hierarchy, is
> loaded from class loader B. When the library in M1 uses snakeyaml, it wants
> to load instances of its own classes, but when snakeyaml uses
> Class.forName, only class loader B is searched so the library's classes
> cannot be found. Class.forName uses the class loader of the caller (in this
> case B) to search for classes.
That's a classic library conflict.
Using two different class loaders is one way out. As you just
discovered, this is prone to class-not-found problems. It will get worse
if classes with the same name but from different class loaders start to
interact (such as when handing an object created using one class loader
to an object that belongs to another class loader). It's going to be a
world of pain, and Class.toString() not giving you any hint of which
class loader the class is from isn't exactly helpful either.
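Since Class.toString() won't tell you the loader, a tiny helper that prints it next to the class name can make such mismatches visible. This is a minimal sketch. `ClassLoaderInfo` and its methods are hypothetical names, not part of SnakeYaml or the JDK.

```java
final class ClassLoaderInfo {
    private ClassLoaderInfo() {}

    // Returns the class name plus the loader that defined it;
    // a null loader means the class came from the bootstrap loader.
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        String loaderText = (loader == null) ? "bootstrap" : loader.toString();
        return c.getName() + " (loaded by " + loaderText + ")";
    }

    // Same binary name but different Class objects means two different
    // defining loaders, i.e. two distinct runtime types.
    static boolean sameNameDifferentLoader(Class<?> a, Class<?> b) {
        return a.getName().equals(b.getName()) && a != b;
    }
}
```

Logging describe(obj.getClass()) on both sides of the M1/B boundary is often the quickest way to spot a same-name, different-loader clash.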
The other way out would be to split the program and run it in two
separate JVMs (let them communicate through a socket or named pipe or
whatever works best). SnakeYaml is already in place and can do the
serializing and deserializing for you.
Alternatively, you could make sure that no object from one class loader
is ever passed into another loader's realm, and use SnakeYaml to
generate a class-loader-neutral representation (just class names, no
ties to class loaders) that the receiving side decodes.
That said, serializing and deserializing data is not always a transparent
operation. For example, handles to open disk files can't be usefully
deserialized in a separate process, and even within the same JVM this is
somewhat fragile (the deserialized copy won't get closed when the
original stream is closed). Some frameworks like to keep references
to ClassLoader or Class objects, and these would likely deserialize
incorrectly or malfunction later.
A third option would be to transfer the data using DTOs. This ensures
that you have full control over what gets serialized.
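A minimal sketch of such a DTO, using hypothetical `Job`/`JobDto` names. It assumes the DTO class itself lives in a loader that both sides can see (B in Geoff's setup). It holds only JDK types, which the bootstrap loader defines, so every module sees the same classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical DTO: assumed to live in the shared parent loader (B),
// and it only holds JDK types (String, int, List<String>).
final class JobDto {
    final String name;
    final int priority;
    final List<String> tags;

    JobDto(String name, int priority, List<String> tags) {
        this.name = name;
        this.priority = priority;
        // Unmodifiable copy so the receiver never shares a mutable list with the sender.
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }
}

// Hypothetical module-private domain class (would live in M1 or M2).
final class Job {
    private final String name;
    private final int priority;
    private final List<String> tags;

    Job(String name, int priority, List<String> tags) {
        this.name = name;
        this.priority = priority;
        this.tags = tags;
    }

    // Only the fields chosen here cross the boundary.
    JobDto toDto() {
        return new JobDto(name, priority, tags);
    }
}
```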
A fourth option, and the best one if you can use it, is Maven Shade (I
think there's a Gradle equivalent; it wasn't quite as flexible as the
original last time I looked, but that was over a year ago, so it has
probably improved since). Shade resolves conflicts by moving classes to
different packages (you configure what gets renamed) and fixing up all
class files from dependencies that reference the renamed classes.
It can run into massive trouble, of course: reflection that looks up
classes by a hardcoded package name will fail (or, with a 50% chance,
find the wrong class; renaming *both* conflicting packages rules that
out and gives you a better smoke test).
If you don't use Maven or Gradle, this isn't going to be workable I fear.
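Bytecode relocation can only rewrite class references it can actually see. Here is a sketch of the difference, with a hypothetical `Widget` class standing in for a relocated dependency. A name taken from a class literal is a class reference in the bytecode and follows the relocation. A name assembled at runtime (or read from a config file) cannot be rewritten. In this self-contained version the runtime name simply points at a package where nothing exists, which mimics the situation after relocation.

```java
// Hypothetical stand-in for a dependency class that a shading tool relocates.
final class Widget {}

final class WidgetLookup {
    private WidgetLookup() {}

    // Fragile: the name is assembled at runtime, so no bytecode rewriting tool
    // can see it; after relocation it still points at the old package.
    static String runtimeName() {
        return String.join(".", "org", "example", "widgets", "Widget");
    }

    // Sturdier: Widget.class is a class reference in the bytecode, which
    // relocation rewrites together with every other reference to Widget.
    static String literalName() {
        return Widget.class.getName();
    }

    static Class<?> lookup(String name) throws ClassNotFoundException {
        return Class.forName(name, false, WidgetLookup.class.getClassLoader());
    }
}
```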
>>> It
>>> seems like you can make all the common uses work easily with no extra
>>> configuration by using
>>> Thread.currentThread().getContextClassLoader().loadClass(className)
>>> instead
>>> of Class.forName(className) throughout the snakeyaml code.
>>
>> I'd have expected the two calls to be equivalent.
>> ... I see, the thread creator can provide a different class loader for a
>> thread. In fact if the thread creator does not do this,
>> Thread.currentThread().getContextClassLoader() will return null.
>
> Well, it's not part of the thread constructor, but can be set at any time
> via Thread#setContextClassLoader. I use this call to set the context class
> loader to M1 (or M2 as the case may be) before executing the functionality
> provided by that module. I'm not sure where you get the idea that
> getContextClassLoader will return null.
From the OpenJDK sources: both the Javadoc and the Java code.
The class member is never set except in setContextClassLoader, and the
Javadoc explicitly mentions the possibility of "not set".
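The pattern Geoff describes is to install the module's loader (M1 or M2) before running the module's code. It is usually paired with restoring the previous loader in a finally block, so the module's loader doesn't leak into unrelated work on the same thread. A minimal sketch follows. `ContextLoaders.callWith` is a hypothetical helper, and the saved previous loader may legitimately be null.

```java
import java.util.concurrent.Callable;

final class ContextLoaders {
    private ContextLoaders() {}

    // Runs the task with the given context class loader installed and restores
    // the previous one afterwards, even if the task throws.
    static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader(); // may be null
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```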
>> So... not so easily actually, SnakeYaml would have to go up the parent
>> thread chain to find a context class loader.
>
> Each thread starts out with the context class loader of its parent set as
> its own context class loader,
Ah, I wasn't aware of that.
Anyway, the OpenJDK code explicitly safeguards against the possibility
of a null value there. I don't know how or when that might happen,
though.
> so I can't see why you would have to do that.
> Besides it would absolutely be the wrong thing to do in some situations.
Yes, you shouldn't walk up the chain once you have a non-null context
class loader.
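Here is a small check of the inheritance behaviour Geoff mentions. The Thread#getContextClassLoader Javadoc says that, if not set, the context class loader defaults to that of the parent thread. `InheritDemo` is a hypothetical name used only for this sketch.

```java
import java.util.concurrent.atomic.AtomicReference;

final class InheritDemo {
    private InheritDemo() {}

    // Sets parentContext on a "parent" thread, starts a child thread from it,
    // and returns the context class loader the child reports.
    static ClassLoader childSees(ClassLoader parentContext) throws InterruptedException {
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread parent = new Thread(() -> {
            Thread.currentThread().setContextClassLoader(parentContext);
            // The child never calls setContextClassLoader itself.
            Thread child = new Thread(() -> seen.set(Thread.currentThread().getContextClassLoader()));
            child.start();
            try {
                child.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parent.start();
        parent.join();
        return seen.get();
    }
}
```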
>> I haven't checked whether you're guaranteed to find a non-null context
>> class loader that way, or to what class loader to fall back to if there
>> is no such guarantee.
>
> I think if the current thread's context class loader is null, you should
> just fall back to Class.forName.
I'd rather recommend throwing an exception.
We don't know how that situation could ever arise, so we can't test
our assumptions about whether falling back is valid or not.
> It might also be reasonable to use
> Class.forName when the desired class cannot be found in the context class
> loader.
It should use SnakeYaml's class loader, and fail normally if that
doesn't work.
In library code, it's much better to fail than to do something without
knowing what you're doing - the crash will alert the team to a situation
that wasn't properly covered, and also provide a situation where the
assumptions can be tested.
I have seen too much code that was written on assumptions and duct tape,
which would fail unexpectedly and miserably once you did something even
slightly off the road. Let's not even start going down that road.
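Here is a minimal sketch of the strict variant argued for here. It uses the thread's context class loader, throws if that is null, and lets a ClassNotFoundException propagate instead of retrying with another loader. It uses Class.forName(name, false, loader) rather than ClassLoader.loadClass, because only the former accepts array names like "[Ljava.lang.String;". `StrictClassResolver` is a hypothetical name. Note that the Javadoc of getContextClassLoader describes a null return as indicating the system class loader (or, failing that, the bootstrap loader), so a more lenient design could fall back there. This sketch deliberately doesn't.

```java
final class StrictClassResolver {
    private StrictClassResolver() {}

    // Resolves a class name through the current thread's context class loader.
    // A null context loader is treated as an error rather than silently
    // falling back to some other loader.
    static Class<?> resolve(String className) throws ClassNotFoundException {
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        if (ccl == null) {
            throw new IllegalStateException(
                "No context class loader set on thread "
                + Thread.currentThread().getName() + "; cannot resolve " + className);
        }
        // Class.forName also handles array names such as "[Ljava.lang.String;",
        // which ClassLoader.loadClass rejects. ClassNotFoundException propagates
        // unchanged: there is no fallback.
        return Class.forName(className, false, ccl);
    }
}
```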
> Another thing, it might be a good idea to explicitly use
> Class.forName if a snakeyaml class is being loaded (i.e. the class
> name starts with snakeyaml's package).
In those cases where the docs and/or the JDK code clearly indicate that
Class.forName is The Right Thing To Do(tm), then okay.
For the situation you name here, it would actually be a horrible idea.
For example, the class might have been renamed by Maven Shade, or we might
not even know which class loaders are involved (imagine an uber JAR loaded
inside a Java EE container) and which classes they provide.
Besides, if somebody tries to deserialize SnakeYaml itself, they'll get
what they deserve. The SnakeYaml machinery is not serializable, and
that's by design.
>> SnakeYaml should automatically use whatever class loading
>> machinery is set up within that thread.
>
> Rather it would, if snakeyaml were using the context class loader. As it
> is, it only looks in the class loader that loaded snakeyaml.
Are you 100% sure that Class.forName isn't going through the context
class loader anyway?
I did some preliminary reading of the OpenJDK code and was under the
impression that both code paths would end up in the same class loading
machinery. (I can't verify that easily; I'm currently busy with non-Java
stuff.)
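For what it's worth, the Class.forName(String) Javadoc says the call is equivalent to Class.forName(className, true, currentLoader), where currentLoader is the defining class loader of the calling class. That suggests the thread's context class loader is not consulted at all. The probe below is a sketch that checks this empirically. It installs a hypothetical recording loader as the context class loader. The one-argument call never asks that loader for anything, while the three-argument call with the context loader does. `RecordingLoader` and `ForNameProbe` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical loader that records every name it is asked for,
// then delegates to its parent as usual.
final class RecordingLoader extends ClassLoader {
    final List<String> requested = Collections.synchronizedList(new ArrayList<>());

    RecordingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        requested.add(name);
        return super.loadClass(name, resolve);
    }
}

final class ForNameProbe {
    private ForNameProbe() {}

    // Installs a RecordingLoader as the context class loader, looks the name up
    // both ways, and returns the recorder's log after each call.
    static List<List<String>> probe(String className) throws ClassNotFoundException {
        Thread t = Thread.currentThread();
        ClassLoader previous = t.getContextClassLoader();
        RecordingLoader recorder = new RecordingLoader(ForNameProbe.class.getClassLoader());
        t.setContextClassLoader(recorder);
        try {
            Class.forName(className); // one-argument form: caller's defining loader
            List<String> afterPlain = new ArrayList<>(recorder.requested);
            Class.forName(className, false, t.getContextClassLoader()); // explicit context loader
            List<String> afterExplicit = new ArrayList<>(recorder.requested);
            return Arrays.asList(afterPlain, afterExplicit);
        } finally {
            t.setContextClassLoader(previous);
        }
    }
}
```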