SnakeYAML seems to forget Directives per document?


Robin Miller

Jun 5, 2012, 6:51:33 PM
to SnakeYAML
I'm new to both YAML and SnakeYAML, so please let me know if I'm
making any hideous blunders. I've got a bit of a weird one, at least I
think it is.

###############
Context
###############
I'm trying to load Unity3D game engine files, which can be saved in
YAML, for a research project I'm working on. I don't have access to
the Unity3D source, so I don't know what they're using to write these
files. I can't change their output, but of course I could 'fix' the
files as a workaround.

I'll get to the file format specifics in a second, but first I'll
describe my overall goal. Our project only cares about some of the
file contents, so I was intending on loading everything with a simple
Constructor. In it, I was thinking I'd make a few Construct classes
that deal with the specific stuff we care about and a GenericConstruct
that just handles Nodes as one of the simple scalar/map/list types.

###################
Expected File Format
###################
The file format is outlined on the Unity3d website (if you care,
http://unity3d.com/support/documentation/Manual/FormatDescription.html),
but the short version is thus:

1. Each game object component in a Unity Scene is saved as a separate
YAML Document. One YAML file per Scene, so multiple documents per
stream. There is minimal nesting (I can't find anything more than a
single nesting level).

2. Those files start with the directives:
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:

3. Game Object Documents start with:
--- !u!<object_type_num> <anchor>

For example:

--- !u!29 &1

or:

--- !u!157 &4

Where 29 is the constant for a "Scene" component type, 157 for a
"LightmapSettings" component. My understanding is that these should
always expand to "tag:unity3d.com,2011:29" and
"tag:unity3d.com,2011:157" for the examples above.
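
A rough sketch of that expansion rule in plain Java, for anyone following along. This is not SnakeYAML code, just a model of how a named handle from a %TAG directive is swapped for its prefix; the class and method names (`TagShorthand`, `declare`, `expand`) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: expands a tag shorthand using %TAG handle declarations. */
final class TagShorthand {
    private final Map<String, String> handles = new HashMap<>();

    /** Records one "%TAG handle prefix" directive, e.g. ("!u!", "tag:unity3d.com,2011:"). */
    void declare(String handle, String prefix) {
        handles.put(handle, prefix);
    }

    /** Expands e.g. "!u!29" by replacing the handle "!u!" with its declared prefix. */
    String expand(String shorthand) {
        // A named handle looks like "!name!": find the '!' that closes it.
        int end = shorthand.indexOf('!', 1);
        if (!shorthand.startsWith("!") || end < 0) {
            throw new IllegalArgumentException("not a handle shorthand: " + shorthand);
        }
        String handle = shorthand.substring(0, end + 1);
        String prefix = handles.get(handle);
        if (prefix == null) {
            // Mirrors the kind of error reported later in this post.
            throw new IllegalStateException("found undefined tag handle " + handle);
        }
        return prefix + shorthand.substring(end + 1);
    }
}
```

With "!u!" declared as "tag:unity3d.com,2011:", `expand("!u!29")` yields the full tag for the Scene component; an undeclared handle throws.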

#################
The Problem
#################

So now that's out of the way, here's my problem.

When I call yaml.loadAll(...), it's fine for the first Document in the
stream. On the next one, I get a ParserException that complains that
it "found undefined tag handle !u!". Upon searching variables, it is
true that !u! disappears from the tagHandles instance variable in
org.yaml.snakeyaml.parser.ParserImpl. This looks to be because
processDirectives() is called every document and it trashes the old
tagHandles list.

From my reading of the YAML spec (http://yaml.org/spec/1.1/#id898785),
it's supposed to preserve directives between documents:

"To ease the task of concatenating character streams, following
documents may begin with a byte order mark and comments, though the
same character encoding must be used through the stream. Each
following document must be explicit (begin with a document start
marker). If the document specifies no directives, it is parsed using
the same settings as the previous document. If the document does
specify any directives, all directives of previous documents, if any,
are ignored."

Since the file I'm parsing contains only explicit Documents and the
only directives appear at the top of the file before the first
Document, should it not carry forward to the next Document?
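
To make the rule in the quoted paragraph concrete, here is a minimal model of it in plain Java. It is a sketch of the YAML 1.1 wording only, not of SnakeYAML's parser; `DirectiveScope` and `startDocument` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how %TAG handles carry across documents in one stream. */
final class DirectiveScope {
    private Map<String, String> current = new HashMap<>();

    /**
     * Call once per document start marker, passing the %TAG directives written
     * directly before that document (an empty map if there were none).
     * Returns the handles in effect for the document.
     */
    Map<String, String> startDocument(Map<String, String> declared) {
        if (!declared.isEmpty()) {
            // New directives: everything from previous documents is ignored.
            current = new HashMap<>(declared);
        }
        // No directives: keep parsing with the previous document's settings.
        return Collections.unmodifiableMap(current);
    }
}
```

Under this reading, a stream that declares !u! once at the top keeps !u! available for every later document that has no directives of its own.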

Since I'm still new to the format and the library, I hope that I'm
just Doing It Wrong, but I figured I'd ask people who know what's up.
So... any thoughts? Is this an esoteric bug or is there some setting
that I'm missing? Is my implementation strategy of having a generic
construct a bad idea? Should I start an issue ticket? Is there a
recommended workaround?

PS. Let me know if you want any more information about anything. I was
trying to keep this short, but still provide as much info as needed.

Robin Miller

Jun 5, 2012, 6:52:28 PM
to SnakeYAML
Oh, and this is SnakeYAML 1.10.


Jordan Angold

Jun 6, 2012, 1:51:52 AM
to SnakeYAML
Hi Robin,

After looking through your links, it looks like you've found a bug in
SnakeYAML (congratulations!). You don't appear to be doing anything
wrong.

Please file an issue on the issue tracker: http://code.google.com/p/snakeyaml/issues/list

Please include:
- a condensed description
- an example YAML stream which triggers the bug
- the code you're using to load it, even if it's just "new
Yaml().loadAll ( stream )"

To answer the questions at the bottom:

1. I believe you're reading the specification correctly. Directives
should be retained until a "new" set of directives is given.

2. You aren't doing anything obviously wrong. It's pretty clear that
Yaml.loadAll() creates a single ParserImpl, which discards its tags on
DocumentStartEvent (incorrectly).

3. I don't know what you mean by GenericConstruct. SnakeYAML can
handle constructing all the default types (String, ints, etc) and even
complex Java objects, as long as they have standardized setter /
getter methods: property foo is retrieved with getFoo() and set with
setFoo(). You can tell SnakeYAML to access the property itself, if you
don't want to write setter / getter code...even if the variables are
private or final or both. Look at BeanAccess.FIELD:
http://snakeyamlrepo.appspot.com/releases/1.10/site/apidocs/index.html?org/yaml/snakeyaml/introspector/BeanAccess.html

You should only have to write Construct instances if you want very
very specific things.

I believe there's a way to resolve from tags like !u!29 to class names
by adding Tag instances that describe the (tag name) <--> (class name)
mapping. However, I can't find the documentation right now ( :( ).

4. Yes, issue.

5. No, there doesn't appear to be any usable workaround for this.

/Jordan

Andrey

Jun 6, 2012, 5:49:15 AM
to snakeya...@googlegroups.com
Dear Robin,
I hope that issue is clear now. I have nothing to add. Let us fix it.

Just a small question: which library was used to create the original YAML document? Do you perhaps know the programming language?
If it is fixed properly in SnakeYAML, it will work in Java and JRuby (and other JVM languages), but it may fail unpredictably in other languages (Python, Ruby, Perl etc). In the meantime I will try to check how it is done in Python.

-
Andrey



Robin Miller

Jun 7, 2012, 12:20:04 PM
to SnakeYAML
Thanks for the help, guys! This relieves a bunch of my confusion. I've
made an issue ticket (http://code.google.com/p/snakeyaml/issues/detail?id=149).
I'll try a workaround. If it works I'll post it in the ticket
just in case someone else has the same issue.

Jordan:

In my code sketch attached to the ticket, I outlined what I'm doing to
get SnakeYAML to be happy with the tags. It might be unnecessary, but
my reading of issues 31 and 39
(http://code.google.com/p/snakeyaml/issues/detail?id=31,
http://code.google.com/p/snakeyaml/issues/detail?id=39)
tells me that SnakeYAML won't accept unknown tags without a custom
constructor. If it's possible to tell SnakeYAML to register a tag to
be ignored as special and to load it as the appropriate map/list/
scalar simple type, I'd love to know.

Andrey:

I wish I knew for sure. We're a university research project external
to the Unity company, so we don't know what they do internally.
However, a quick search
(http://answers.unity3d.com/questions/9675/is-unity-engine-written-in-monoc-or-c.html)
suggests that it's either a C++ or C# library.

Whatever it is they use, I realize that it's not SnakeYAML's
responsibility to support their output. My main concern with this
problem was that SnakeYAML was a smidge off-spec, which is unfortunate
since it seems like a lot of work has gone into it.

Andrey Somov

Jun 8, 2012, 7:40:33 AM
to snakeya...@googlegroups.com
On Thu, Jun 7, 2012 at 6:20 PM, Robin Miller <robine...@gmail.com> wrote:
> If it's possible to tell SnakeYAML to register a tag to
> be ignored as special and to load it as the appropriate map/list/
> scalar simple type, I'd love to know.

The definition and the implementation of such a feature would be a mega task...

> Whatever it is they use, I realize that it's not SnakeYAML's
> responsibility to support their output. My main concern with this
> problem was that SnakeYAML was a smidge off-spec, which is unfortunate
> since it seems like a lot of work has gone into it.

With help of the community we can gradually reduce the amount of deviations to a bearable minimum :)

Andrey

Jordan Angold

Jun 8, 2012, 11:57:06 PM
to SnakeYAML
You're correct that SnakeYAML will not accept input containing
unrecognized tags.

Andrey is also correct that it is not possible to tell SnakeYAML to
ignore some tags when determining how to construct objects. The code
change would be complicated.

It may be possible to work around this deficiency. If you know the
type of object you would like each tag to be loaded as, you can
replace the tags yourself prior to construction. For example, if you
know all the !!foo tags should be constructed as Maps and all !!bar
should be Sets, then the following works. If you have !!foo for
fundamentally different data types, you cannot use this.

First, a custom extension of Constructor, overriding
getConstructor(Node) to replace Tags on the fly:
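    import java.util.HashMap;
    import java.util.Map;

    import org.yaml.snakeyaml.constructor.Construct;
    import org.yaml.snakeyaml.constructor.Constructor;
    import org.yaml.snakeyaml.nodes.Node;
    import org.yaml.snakeyaml.nodes.Tag;

    class TagReplacingConstructor extends Constructor {
        // Maps a tag found in the document to the tag it should be built as.
        private final Map<Tag, Tag> replacements = new HashMap<>();

        public TagReplacingConstructor() {
            // You will want to add your own replacements here.
            // Tag.PREFIX is "tag:yaml.org,2002:", which is what "!!" expands
            // to, so the document tag !!foo is Tag.PREFIX + "foo".
            replacements.put(new Tag(Tag.PREFIX + "foo"), Tag.MAP);
            replacements.put(new Tag(Tag.PREFIX + "bar"), Tag.SET);
        }

        @Override
        protected Construct getConstructor(Node node) {
            // Replace some, but not all, tags.
            Tag replaceTag = replacements.get(node.getTag());
            if (replaceTag != null) {
                node.setTag(replaceTag);
            }

            // Delegate to the default implementation now that we've shuffled
            // Tags around.
            return super.getConstructor(node);
        }
    }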

You can then extend this yourself and add new (old tag) -> (new tag)
mappings yourself, to handle all the nodes.

You could also use this to replace tags like "tag:unity3d.com,2011:29"
with a tag that tells SnakeYAML to construct those objects as your
own. For example:

    replacements.put(
        new Tag("tag:unity3d.com,2011:29"),
        new Tag(Tag.PREFIX + "com.foo.bar.MyImplementationClass"));

Hope that helps,
/Jordan


Robin Miller

Jun 13, 2012, 5:24:25 PM
to SnakeYAML
This is effectively what I was doing, but instead of 'fixing' the tag,
I just registered all of them and then did the separation myself,
based on the return value of getNodeID().

Turns out that I need to use the low-level API, since I need to
preserve the anchors (Unity uses them to store their unique object
ids) as well as handle this tag business.

Andrey

Jun 14, 2012, 4:58:20 AM
to snakeya...@googlegroups.com
Hi Robin,
does the fix solve your problem ? May I close the issue 149?
(http://code.google.com/p/snakeyaml/source/detail?r=b772d9bb95f181f561a7a77ff562eae8ea7cbe9b)

P.S. Can you please share your experience with SnakeYAML ?
What is difficult to understand ? Where can we improve the documentation ?
What is unclear in the API ? Have you tried other YAML parsers ?

Andrey

Robin Miller

Jun 18, 2012, 3:15:37 PM
to SnakeYAML
Sorry for the late reply. I've had to switch to a higher priority
topic and I'll probably be on that for a while. I'll see if I can get
it tested for me, but from what I can see in the test it looks like if
the test passes, my code should.

I'll also get you some proper feedback soon, now that my crazy weekend
is over.


Robin Miller

Jun 21, 2012, 3:58:18 PM
to snakeya...@googlegroups.com
I still haven't had a chance to test this bug, but I'll answer your other questions.

This is the first and only YAML parsing library I've used. I chose it based off of the reputation for having the best thoroughness (the last thing I want to deal with is an unreliable library) and the recommendations off StackOverflow and other sites. That said, it's not like it's a huge market, JYAML is unmaintained, and YAMLBeans is bean-oriented, which is not what I'm really looking for.

I didn't find SnakeYAML to be as easy to use as xStream for XML. The nice thing that you do there is register XML tags for a particular type, and then build a converter class to do both I/O directions. It was simple to use, but I suspect that I'd have a similar problem as here since my use case for SnakeYAML is far more esoteric than xStream. I know that people have requested that unknown tags be ignorable and that it's been deemed very difficult to accomplish, but in my case it would be helpful. If it could be forced to ignore tags and just fall back to the standard map/list/scalar paradigm, and even if it had to just choose to force everything to read as strings, that would be better in my opinion. It's almost what I'm doing anyways with my custom constructor.

The structure of the code doesn't make it obvious what's there for internal use and what I'm supposed to use. For example, I can't figure out what's the "right" way to register tags, and I can't seem to give it a Constructor except in the Yaml(...) constructors. I end up feeling like I'm supposed to understand how it works on the inside so that I can customize it, rather than being able to make a Yaml object and modify it after the fact; like fiddling with a toy. I didn't like that I had to make a Constructor and then a Construct extension. I first read Construct as a verb, which flies in the face of the nouns-only class name convention. I suggest renaming it to something more appropriate to what it actually does, like ConstructorHelper (ok, that's sort of a terrible name, but to me it at least gives me more intuition than "Construct"). I still don't really know what the difference is, or why these two classes exist.

The documentation was probably the worst part of the experience of the library. I often ended up having to look at the code to figure out what I was supposed to do, and unfortunately, the Javadoc in those classes is pretty sparse. The ideal library teaches me how to use it through the API, then the documentation. I don't want to spend time figuring out how a library works because that sort of defeats the purpose of using one. My suggestions for fixing it would be to have some full examples, not just short test cases with little commenting. That's fine for a test case, but it doesn't teach me much. I can see that some classes and methods exist, and that this test is supposed to pass, but it doesn't tell me what, at a high level, the test is supposed to represent. Plus, it's a lot of work to parse the code and figure out what's important when I could just be told what to watch. When and why do I need to make a Constructor? Resolver? Dumper? What's the difference between these concepts? What about a Construct? What's that? Take a look at how xStream wrote their tutorials for what I consider to be a good overview. They outline a particular use case, give an example document, and then write an implementation for it.

Reading that over again, it seems really vague, so if you want more clarification please let me know. I'm all for helping, and I know that documentation and teaching is just as hard and time-consuming as writing the code itself. I really appreciate that the library exists, and it has proven to be fairly flexible, so I find it a shame that its big downfall is the ancillary materials.

Andrey

Jun 22, 2012, 5:09:05 AM
to snakeya...@googlegroups.com
Thank you for your time. This is really useful.

I think the main issue here is that there is no clear separation of concerns. Maybe we should introduce different APIs for different business cases:
- JavaBean <-> YAML (when a JavaBean maps one-to-one to a YAML document)
- dynamic Java structure (when a YAML document comes from another programming language)
- configuration files
etc

We also need much more powerful tag management. In fact, every scalar needs 3 tags - implicit, explicit, runtime (set in the code or detected from the Class). All 3 must be available when an instance is created. This is required to "ignore" an explicit tag. Because by "ignoring" users always want to choose implicit tag instead of explicit one.
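
One way to picture the three tags is a small value type like the one below. This is only a sketch of the idea, not an existing or planned SnakeYAML class; the name `ScalarTags`, the `effective` method, and the choice to let the runtime tag win over the other two are all assumptions made for illustration.

```java
/** Hypothetical value type holding the three tags discussed above for one scalar. */
final class ScalarTags {
    private final String implicit; // inferred from the scalar's text by the resolver
    private final String explicit; // written in the document, or null if absent
    private final String runtime;  // supplied by the calling code, or null if absent

    ScalarTags(String implicit, String explicit, String runtime) {
        if (implicit == null) {
            throw new IllegalArgumentException("an implicit tag can always be resolved");
        }
        this.implicit = implicit;
        this.explicit = explicit;
        this.runtime = runtime;
    }

    /** Picks the tag to construct with; "ignoring" means falling back to the implicit tag. */
    String effective(boolean ignoreExplicit) {
        if (runtime != null) {
            return runtime; // assumption: a tag set in code takes priority
        }
        if (explicit != null && !ignoreExplicit) {
            return explicit;
        }
        return implicit;
    }
}
```

For a scalar written as `!u!29 5`, `effective(true)` would return the implicit integer tag rather than the Unity tag, which is the "ignore" behaviour asked for earlier in the thread.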

I think when we are going to implement the next YAML specification (1.2, JSON compatibility), we can break the backwards compatibility and try to deliver a better public API.

One of the developers already wanted to introduce a breaking API change, but we need this movement to be supported by many users/developers.

Cheers,
-
Andrey

Jordan Angold

Jun 22, 2012, 2:05:26 PM
to snakeya...@googlegroups.com
Thanks for taking the time to write out your concerns -- it can be hard for us to see those as library developers.

We should probably start gathering requirements and specifications for SnakeYaml 2.0 somewhere (presumably not a forum). An initial list, based on this thread:

Documentation
- better public-API JavaDoc, preferably with an HTML-based introduction and explanation (in addition to the FAQ-style pages already present).
- description of the Read -> Scan -> Parse -> Compose -> Construct and Represent -> Serialize -> Emit pathways
- better internal JavaDoc
- example usage for a few different common use-cases (loading configuration files, sending messages over a stream, communicating with other prominent YAML implementations)

API
- support for more customization: in construction, in SY behaviour (rather than assuming things, provide options)
- provide as much information about the underlying YAML document as possible
- clearly define which classes are user-extensible / modifiable / etc, and which classes should not be modified
- customization by dependency injection rather than extension where possible (this is the suggestion I made: instead of extending Constructor, tell Constructor how to construct things by providing Construct instances)
- making the naming convention clear; much of it is based on the YAML specification, but most users haven't read that (and shouldn't have to)
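
As a rough illustration of the dependency-injection item in the list above, construct functions could be registered per tag instead of being added by subclassing. The sketch below is plain Java and does not use any SnakeYAML API; `ConstructRegistry`, `register`, and `construct` are hypothetical names, and the fallback to the plain value for unknown tags is an assumption based on the request earlier in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry: callers inject per-tag construct functions instead of subclassing. */
final class ConstructRegistry {
    private final Map<String, Function<Object, Object>> byTag = new HashMap<>();

    /** Registers the function used for nodes carrying the given tag; returns this for chaining. */
    ConstructRegistry register(String tag, Function<Object, Object> construct) {
        byTag.put(tag, construct);
        return this;
    }

    /** Builds a value for a node; unknown tags fall back to the plain map/list/scalar value. */
    Object construct(String tag, Object plainValue) {
        Function<Object, Object> construct = byTag.get(tag);
        return construct == null ? plainValue : construct.apply(plainValue);
    }
}
```

A caller would register a function for "tag:unity3d.com,2011:29" and leave every other Unity tag to the fallback, without extending any library class.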

Andrey, where should we start accumulating requirements for SnakeYaml 2.0?

/Jordan

Andrey

Jun 23, 2012, 5:42:51 AM
to snakeya...@googlegroups.com
The Wiki ?

Could you maybe lead the process?

-
Andrey



Jordan Angold

Jun 23, 2012, 2:56:51 PM
to snakeya...@googlegroups.com
I've written up an initial set here: http://code.google.com/p/snakeyaml/wiki/SnakeYaml_2_Planning?ts=1340477745&updated=SnakeYaml_2_Planning

I don't have much time this week, but I hope to review some of the outstanding Issues soon to translate any reasonable requests into API goals.

/Jordan