Using the merge (<<) type with custom tags...


DemonWasp

Jan 6, 2011, 5:23:36 PM
to SnakeYAML
Let's say I have a simple class:

public class Data {
    protected int a, b;
    // forgive me for omitting the setters/getters here
}

I can make a document that looks like this:

---
- !!Data
  &id001
  a: 1
  b: 1
- *id001

Which yields the expected:
Data ( 1, 1 )
Data ( 1, 1 )



HOWEVER, there doesn't seem to be any way to use the merge construct.


ATTEMPT 1:
---
- !!Data
  &id001
  a: 1
  b: 1
- <<: *id001

Produces: ClassCastException: java.util.LinkedHashMap cannot be cast
to Data


ATTEMPT 2:
---
- !!Data
  &id001
  a: 1
  b: 1
- !!Data
  <<: *id001

Produces: Cannot create property=<< for JavaBean=Data (0,0); Unable to
find property '<<' on class: Data


After nearly an hour of exhaustive Googling, I can't find anything in
snakeyaml documentation or in yaml.org documentation that mentions
"merge" and "tag" in the same paragraph, let alone sentence or code
sample.

Is this possible in SnakeYaml, and if not, are there any plans to
implement it?




Andrey

Jan 7, 2011, 6:03:37 AM
to SnakeYAML
Hello,
it is not clear what you actually wish to achieve with the 'merge'
tag.
Do you expect to create 2 independent instances with the same data?
(A JUnit test is often the best way to describe the requirements.)
If the goal is to create 2 independent instances using aliases and the
merge tag, then may I kindly ask you to create an issue? At the moment
(version 1.7) SnakeYAML does not support it. SnakeYAML tries to create
a map instead of a JavaBean for the second instance.
(It looks like a bug.)

-
Andrey

DemonWasp

Jan 7, 2011, 9:35:15 AM
to SnakeYAML
The ideal use is that the original properties are applied to a second
instance of the Data class, which may then be further modified. An
example of my intent is below; I will submit an issue soon.

Input:
---
- !!Data
  &id001
  a: 1
  b: 1
- !!Data
  <<: *id001
  b: 3

Output:
---
Data(1,1)
Data(1,3)

That is, two separate Data objects. The second has the merged
properties of the first, overridden by subsequent statements in the
mapping.
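
A minimal sketch of the override behaviour described above, using plain java.util maps rather than SnakeYAML itself. The class and method names are made up for illustration and are not part of any library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; this is not SnakeYAML code.
class MergeOverrideSketch {

    /** Returns a new map: merged entries first, then explicit entries, which win on key clashes. */
    static Map<String, Object> merge(Map<String, Object> mergeSource, Map<String, Object> explicit) {
        Map<String, Object> result = new LinkedHashMap<>(mergeSource); // copy, so the source stays untouched
        result.putAll(explicit);                                       // explicit keys override merged ones
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 1);

        Map<String, Object> explicit = new LinkedHashMap<>();
        explicit.put("b", 3);

        System.out.println(merge(first, explicit)); // {a=1, b=3}
        System.out.println(first);                  // {a=1, b=1}
    }
}
```

Because the merge works on a copy, the first mapping is left as it was, which matches the "two separate Data objects" expectation.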

Andrey Somov

Jan 7, 2011, 10:21:03 AM
to snakeya...@googlegroups.com
I see it now.

---
- !!Data
  &id001
  a: 1
  b: 1

- !!merge
  <<: *id001
  b: 3
...

The tag must be either !!merge or it must not be present. Otherwise it
makes it possible to specify something inconsistent:


---
- !!Data
  &id001
  a: 1
  b: 1

- !!SomethingElse
  <<: *id001
  b: 3
...

DemonWasp

Jan 8, 2011, 3:23:46 PM
to SnakeYAML
Okay, but is the following not a valid use-case:

- !!Data
  &id001
  a: 1
  b: 1
- !!DerivedData
  <<: *id001
  b: 3
  c: 1

Where we assume that the declaration is "public class DerivedData
extends Data".

Even that is a moot point then, because if I have two separate classes
that have the same properties, then I expect to be able to specify the
properties in the same way in the YAML file. Since I can specify them
the same way in YAML, I expect to be able to use the merge operator.
The merge operator conceptually works on the mapping provided, not on
the Data object produced from that mapping.

Example:

public class Named {
    property String name; // I wish this syntax existed
}

// doesn't extend Named because that doesn't make sense
public class Person {
    property String name;
    property int age;
}

---
named:
  &id001
  name: David
person:
  << : *id001
  age: 12
...

This is a contrived example, but I don't see anything that prohibits
this type of use-case.

If there's a good reason for neither of the above to work
( "subclasses allowed", "similar mappings allowed" ) then please
educate me.

Andrey Somov

Jan 10, 2011, 5:47:07 AM
to snakeya...@googlegroups.com
YAML is meant to support not only Java but all other languages.
If you look at your proposal, you will see that in a language which
allows operator definition (Scala, for instance) it looks like the
definition of a method '<<'.
To avoid this, the specification defines that the tag for '<<'
must be !!merge. It may be implicit (like !!str), so you do not have
to write it, but it is there.
In the proposal you introduce another tag and thus change the meaning of '<<'.

It does not mean it cannot and should not be implemented in SnakeYAML.
But we must understand the consequence: other parsers may fail to
parse the document.

By the way, in the second example I do not understand how you define
in the YAML document that 'person' has the Person class. Imagine there
is another child of Named: Animal. How do you know it is not the
elephant David who is 12 years old?

-
Andrey

P.S. I do not have time now to work on the issue. Feel free to try. We
will be glad to support your journey :)

DemonWasp

Jan 10, 2011, 11:26:29 AM
to SnakeYAML
Hmm...that's a reasonable reason, I guess, if the primary use of YAML
is for communication between two programs. Mine isn't, it's for
configuration for a single program, written entirely in Java. It just
seems very unfortunate that you can't use merge on anything other than
a simple map.

I forgot to put tags in the YAML document I described. A corrected
version:

---
named: !!Named
  &id001
  name: David
person: !!Person
  << : *id001
  age: 12
animal: !!Animal
  << : *id001
  age: 7
...

Then we have a Named thing which has name David, a Person thing which
is also named David, with age 12, and an Animal thing, with name David
and age 7. Sensibly, you would have "Person extends Named" and "Animal
extends Named", but I don't see that as necessary for such a feature.

I'd implement this, but it's easier to just hack around it for the
moment. If I find myself with time to spend in the future, maybe I'll
spend some time on fixing this silly issue.

DemonWasp

Jan 10, 2011, 5:55:01 PM
to SnakeYAML
Apparently I had more time on my hands than I thought. Warning: long,
detailed post ahead. I'll summarize in 2 points:

1. I have a small change to SnakeYaml that lets MappingNodes
representing Java classes be merged, correctly.
2. This change can be implemented as an optional extension to
SnakeYaml; SnakeYaml remains standards-compliant, with an option to
break spec for sake of usability.


Still reading? Alright then:


I now have a slightly hackish solution that allows for the feature I
requested, as a mostly-extension to SnakeYaml's existing
functionality. In fact, if I were to change existing SnakeYaml code
(rather than extending it), the changes would be very minor indeed.

With my changes, the following file (which does NOT conform to YAML
specifications):
---
- !!Data
  &id001
  a: 1
  b: 1
- !!DataChild
  <<: *id001
  b: 3
  c: 5
- !!DataUnrelated
  <<: *id001
  d: 7
...

Represents three objects:
1) A Data object, with values a=1, b=1.
2) A DataChild object, which extends Data to add a third field, c.
Values a=1 (merged), b=3 (overridden), c=5 (new).
3) A DataUnrelated object, which does NOT extend Data. Values are a=1
(merged), b=1 (merged), d=7 (new).

The output from parsing this and then printing the items in the
resulting list one-by-one is:
[output-start]
WARN: The specified merge source class Data is not an ancestor of the
merge-result class DataUnrelated
Data (a=1,b=1)
DataChild (a=1,b=3,c=5)
DataUnrelated (a=1,b=1,d=7)
[output-end]


The warning is printed to standard output for lack of a proper warning
mechanism. Depending on settings, it could either be ignored or be
treated as an error.


The biggest requirement is that immediately before an object is
constructed, the mapping must be flattened, by
flattenMapping(MappingNode) in SafeConstructor.java . With just that
simple change, this issue is about 50% done.

My current implementation "hijacks" the Constructor class by
overriding the getClassForNode(Node) method:

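@Override
protected Class<?> getClassForNode ( Node node ) {
    // Only mappings can carry a merge key; leave scalars and sequences alone.
    if ( node instanceof MappingNode ) {
        flattenMapping ( (MappingNode)node );
    }
    return super.getClassForNode ( node );
}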

This ends up being quite ugly in code because the
flattenMapping(MappingNode) method is private. The result is a
butchered copy-paste version of the method, obviously not an ideal
solution.

One important modification made to the flattenMapping(MappingNode)
method is to insert the code that issues the warning described above:

case mapping:
    MappingNode mn = (MappingNode) valueNode;

    // INSERTED WARNING-GEN CODE...
    Class<?> nodeClass = getClassForNode(node);
    Class<?> mergeSourceClass = getClassForNode(mn);
    if ( !mergeSourceClass.isAssignableFrom ( nodeClass ) ) {
        System.out.println ( "WARN: The specified merge source "
                + mergeSourceClass.getCanonicalName()
                + " is not an ancestor of the merge-result class "
                + nodeClass.getCanonicalName() );
    }
    // END INSERTED

    flattenMapping(mn);
    merge.addAll(mn.getValue());
    break;
...


This is, at its heart, a literal one-line change. This change makes
use of all of SnakeYaml's existing functionality, implementing almost
nothing new. The only change is that instead of banning merging of
class objects, it assumes merging of class objects. I don't see any
security issues coming out of this, though maybe I'm just uncreative.
This also works just great with multiple merges, as per
(http://yaml.org/type/merge.html).
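
For readers unsure about the direction of the isAssignableFrom check used in the warning code, here is a small stand-alone sketch. The nested classes mirror the Data, DataChild and DataUnrelated examples from this post; the helper name isCompatibleMerge is made up for illustration:

```java
// Stand-alone illustration of the type check; not part of SnakeYAML.
class MergeTypeCheckSketch {

    static class Data { int a, b; }
    static class DataChild extends Data { int c; }
    static class DataUnrelated { int a, b, d; }

    /** True when mergeSourceClass is the same class as targetClass or one of its ancestors. */
    static boolean isCompatibleMerge(Class<?> mergeSourceClass, Class<?> targetClass) {
        return mergeSourceClass.isAssignableFrom(targetClass);
    }

    public static void main(String[] args) {
        System.out.println(isCompatibleMerge(Data.class, DataChild.class));     // true
        System.out.println(isCompatibleMerge(Data.class, DataUnrelated.class)); // false
        System.out.println(isCompatibleMerge(DataChild.class, Data.class));     // false
    }
}
```

The check is asymmetric: merging a Data mapping into a DataChild passes, while merging a DataChild mapping into a plain Data would trigger the warning.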



Now, there are a lot of ways we could proceed with this knowledge.

First, this could be completely ignored. After all, the SnakeYaml team
undoubtedly places a lot of importance on being standards-compliant.
This (optional) extension breaks the YAML specification by allowing
merged nodes which do not have the !!merge tag. If this is your
decision, I will look to fork SnakeYaml for my own use.

Second, this could be facilitated, if not endorsed, by the SnakeYaml
team. A small code update would make it simpler to add this
functionality by exposing flattenMapping() as described above, either
with or without the warning code mentioned (or, with some kind of
boolean switch involved). This option seems the least helpful to
everyone involved.

Third, this could be added as an optional extension to SnakeYaml.
Explicitly marked as "BREAKS SPECIFICATION, DANGER DANGER", this seems
like a functionality that could prove helpful to the Java-only
community. This functionality would probably be implemented by an
extension of Constructor, with the caveat that flattenMapping() would
still need to be protected, not private. Alternately, this could be
given as an option to Constructor itself, avoiding any public-API
changes.


Thanks,
Jordan Angold

Andrey Somov

Jan 11, 2011, 5:11:12 AM
to snakeya...@googlegroups.com
Thank you very much for your time and efforts.

1) Your proposal does not violate the specification. But it introduces
ambiguity for some languages

2) SnakeYAML tries to be as close as possible to the spec but it has
exceptions (http://code.google.com/p/snakeyaml/wiki/Documentation#Deviations_from_the_specification)

3) we are open to discuss and include any change

4) please do not forget to add this detailed description to the
issue, to have all information in one place (it is easier to follow the
topic)

5) your change can be implemented as the standard behavior or as a
feature which is enabled by explicit setting.

6) your current proposal is not a hack. The fact that the method is
private does not mean it cannot be made protected to support
development. It was made private to avoid confusion when the class is
extended

7) Feel free to create the clone
(http://code.google.com/p/snakeyaml/source/clones) and we can see and
discuss the details.

This change can become a part of the upcoming 1.8 release.

-
Andrey

Andrey

Jan 11, 2011, 7:28:30 AM
to SnakeYAML
Please also consider such a document:

---
- !!Data
  &id001
  a: 1
  b: 1
- !!DataChild
  <<: *id001
  b: 3
  c: 5
- !!DataUnrelated &id002
  <<: *id001
  d: 7
- # what is the expected class of this object? Data, DataUnrelated, HashMap?
  <<: [ *id001, *id002 ]
  b: 9
...


DemonWasp

Jan 11, 2011, 11:53:15 AM
to SnakeYAML
1) I thought my proposal violated the specification, as the tag of
these merged nodes is no longer !!merge, but !!Data or similar.
Correct me if I'm wrong.
2) SnakeYaml seems to vary from spec only where it is expedient. What
I'm asking for here is a little larger, possibly too large to be
completely reasonable to a specifications-oriented library.
3-7) I'm glad to see you're so open to modification and extension.

Re: document:

Given that we're already deviating wildly from the YAML specification
here, I would say that the expected output of the last entry is an
error stating that there is no class defined for that MappingNode, or
alternately an error or warning stating that the type determined for
that node (which would default to some subclass of Map<String,Object>)
cannot be assigned from either Data or DataUnrelated.

If that issues a warning, then I would expect the fourth object to be
a subclass of Map<String,Object> with entries {"a"=1,"b"=9,"d"=7}. In
the absence of a declared type, a YAML mapping should default to be a
Java Map.
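
To make the expected {"a"=1,"b"=9,"d"=7} result concrete, here is a rough sketch of merging from a sequence of sources, again with plain maps and made-up names. It follows the precedence given in the YAML merge type description: sources earlier in the sequence win over later ones, and keys written explicitly in the mapping win over everything merged.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; this is not SnakeYAML code.
class MultiMergeSketch {

    /** Flattens "<<: [s1, s2, ...]" plus the explicit keys of a mapping into one map. */
    static Map<String, Object> flatten(List<Map<String, Object>> sources, Map<String, Object> explicit) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> source : sources) {
            for (Map.Entry<String, Object> entry : source.entrySet()) {
                // an earlier source already set this key? then keep the earlier value
                result.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        result.putAll(explicit); // explicit keys override anything merged
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> id001 = new LinkedHashMap<>();
        id001.put("a", 1);
        id001.put("b", 1);

        // id002 as it looks after its own merge from id001 has been flattened
        Map<String, Object> id002 = new LinkedHashMap<>(id001);
        id002.put("d", 7);

        Map<String, Object> explicit = new LinkedHashMap<>();
        explicit.put("b", 9);

        System.out.println(flatten(Arrays.asList(id001, id002), explicit)); // {a=1, b=9, d=7}
    }
}
```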


I will work on a clone of SnakeYaml soon and will keep you apprised of
my progress. Please feel free to keep asking questions like the above
document: anything that makes me think about what I'm asking is
helpful. As I dig deeper here, I'm noticing I may want to make other
modifications to my clone, as I also need to support constructor
arguments, without using the compact notation. My first goal, however,
will be an intelligent implementation of the feature discussed in this
thread, with an eye to possible inclusion in 1.8.


/Jordan

Andrey Somov

Jan 11, 2011, 12:03:39 PM
to snakeya...@googlegroups.com

DemonWasp

Jan 11, 2011, 1:48:29 PM
to SnakeYAML
Good update, very helpful. See my updates here:
http://code.google.com/r/jordanangold-beans-merge/source/detail?r=a6c2fa515dd9265d7c0c2bf0b806532e79f6b27b

With this change, SnakeYaml's default behaviour is unchanged from
before my feature request. However, you can now create a Constructor
object in one of three modes:

1. Strict is the default and will adhere to the YAML specification:
merged objects must be mappings.
2. Standard is the most likely use-case and allows merging from any
type that we can be assigned to (any ancestor class). Under this mode,
the definition of the DataUnrelated node (in the most recently-posted
YAML document) should produce an exception, as should the definition
of the non-tagged node.
3. Loose disables type checking, allowing both the DataUnrelated and
non-tagged nodes.

This change modifies flattenMapping to take a checkTypes boolean
parameter which selects whether types are checked or not. If they are
checked, they cause a parsing error; if they are not, they are
completely ignored. This produces some odd-looking code in
SafeConstructor, which only ever calls flattenMapping ( mapping,
false ).
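
One way to picture the three modes is as an enum with a single decision method. This is only a sketch of the idea: MergeMode and allows(...) are made-up names, not the API of the clone or of SnakeYAML, and the STRICT branch assumes that "must be mappings" means both sides are plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the names here are illustrative, not a real library API.
class MergeModeSketch {

    static class Data { }
    static class DataChild extends Data { }
    static class DataUnrelated { }

    enum MergeMode {
        STRICT, STANDARD, LOOSE;

        /** Decides whether a node of targetClass may merge from a node of sourceClass. */
        boolean allows(Class<?> sourceClass, Class<?> targetClass) {
            switch (this) {
                case STRICT:
                    // assumption: "merged objects must be mappings" means both sides are maps
                    return Map.class.isAssignableFrom(sourceClass)
                            && Map.class.isAssignableFrom(targetClass);
                case STANDARD:
                    // the merge source must be the target's class or an ancestor of it
                    return sourceClass.isAssignableFrom(targetClass);
                default:
                    // LOOSE: no type checking at all
                    return true;
            }
        }
    }

    public static void main(String[] args) {
        for (MergeMode mode : MergeMode.values()) {
            System.out.println(mode
                    + ": DataChild from Data = " + mode.allows(Data.class, DataChild.class)
                    + ", DataUnrelated from Data = " + mode.allows(Data.class, DataUnrelated.class)
                    + ", map from map = " + mode.allows(LinkedHashMap.class, LinkedHashMap.class));
        }
    }
}
```

Under these assumptions the untagged node from the earlier document (a plain map merging from Data and DataUnrelated) is rejected in both STRICT and STANDARD and accepted only in LOOSE.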

/Jordan


On Jan 11, 12:03 pm, Andrey Somov <py4...@gmail.com> wrote:
> You can have a look at what has been added lately: http://code.google.com/p/snakeyaml/source/detail?r=41c534a546be82481c...
>
> -
> Andrey

DemonWasp

Jan 11, 2011, 3:15:20 PM
to SnakeYAML
Once again, I have the message "always code to tests" drilled into me
by my failure to do so. I've updated the tests, and corresponding to
the new tests, I have fixed a copy-paste bug in the type-checking
code.

Change listed here:
http://code.google.com/r/jordanangold-beans-merge/source/detail?r=af17b459f4299a4b37e4eba25f828626beba308e

/Jordan


On Jan 11, 1:48 pm, DemonWasp <jordanang...@gmail.com> wrote:
> Good update, very helpful. See my updates here: http://code.google.com/r/jordanangold-beans-merge/source/detail?r=a6c...

maslovalex

Jan 11, 2011, 4:49:35 PM
to SnakeYAML
> 1) I thought my proposal violated the specification, as the tag of
> these merged nodes is no longer !!merge, but !!Data or similar.
> Correct me if I'm wrong.

I think it is a bit different. In this case the parser creates 2 nodes:
the 1st is a Mapping with the tag !!Data and
the 2nd is a Scalar with the tag !!merge;
then flattenMapping(...) does the magic ;)

It happens that we forgot to merge on beans actually and IMHO it is a
bug :)

I also think an explicit tag is not always needed for merge to work
correctly on beans.
The main thing here is being able to determine what kind of
object the MappingNode represents.
For example, the merge may appear not in a list (as in the present
tests) but as a property value of some bean, where we can determine the
type. (I couldn't say it clearly... so maybe some examples are needed) :)

Check the latest source - it may already be doing what you need.
Right now, if no explicit tag is present, SnakeYAML creates a standard
Java Map (see the updated test for issue100)

-alex

Andrey

Jan 12, 2011, 10:05:52 AM
to SnakeYAML
I face a lot of test failures in your clone.
(try 'mvn test')

DemonWasp

Jan 12, 2011, 11:38:21 AM
to SnakeYAML
I found the issue: I stupidly broke the Constructor(Class) constructor
by having it not pass its arguments. Changes:

Constructor fix: http://code.google.com/r/jordanangold-beans-merge/source/detail?r=d0082df809f4e751b1d76514aff592687acf29da
Tests fix: http://code.google.com/r/jordanangold-beans-merge/source/detail?r=17c96e3b68b02ebf0567f5388e33eed74f49023a

After correcting some of the tests in MergeJavaBeanTest, I see 720
passing tests and 0 failing.

DemonWasp

Jan 12, 2011, 11:49:21 AM
to SnakeYAML
Thanks for the correction. If this is classified as a bug, then it
seems like a no-brainer that this should be included in SnakeYaml core
in the next version. My question is whether this syntax is supported
by other YAML parsers (specifically, by the Python version of
SnakeYaml) and whether it implies the same thing as it does under this
extension.

It is possible that some kind of type inference should be done, based
on either property types (dubious when properties deal with interfaces
or abstract classes...), or on the merged type in the case of a single
merge. You could rule that an implicitly-typed MappingNode that merges
from a single other MappingNode will default to the type of the merge-
source MappingNode. You could even extend that to the case where ALL
of the merge-source MappingNodes are the same type.
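
A rough sketch of the inference rule suggested above, assuming the merge-source types are already known as Class objects. The name inferType and the fallback to Map are illustrative choices, not existing SnakeYAML behaviour:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed rule; not existing SnakeYAML behaviour.
class MergeTypeInferenceSketch {

    static class Data { }
    static class DataUnrelated { }

    /**
     * For an untagged mapping: if every merge source has the same type, use that type;
     * otherwise (or with no sources at all) fall back to a plain Map.
     */
    static Class<?> inferType(List<Class<?>> mergeSourceTypes) {
        if (mergeSourceTypes.isEmpty()) {
            return Map.class;
        }
        Class<?> first = mergeSourceTypes.get(0);
        for (Class<?> type : mergeSourceTypes) {
            if (!type.equals(first)) {
                return Map.class; // mixed sources: no single obvious type
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(inferType(Arrays.<Class<?>>asList(Data.class)) == Data.class);                     // true
        System.out.println(inferType(Arrays.<Class<?>>asList(Data.class, Data.class)) == Data.class);         // true
        System.out.println(inferType(Arrays.<Class<?>>asList(Data.class, DataUnrelated.class)) == Map.class); // true
    }
}
```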

Examples and tests would be required.

Andrey

Jan 13, 2011, 7:01:05 AM
to snakeya...@googlegroups.com
Thank you Alex. I was confused. Here is an example which also works:

- !!map # Merge one map
  !!merge foo: *CENTER
  r: 10
  label: center/big

The scalar 'foo' is ignored because explicit !!merge tag is used.

-
Andrey

Andrey

Jan 14, 2011, 3:23:38 AM
to snakeya...@googlegroups.com
I think no other parser supports merging into instances. PyYAML does not support it now.

Now we consider the issue to be fixed. I will close it.

May I kindly ask you to write some documentation/description which we can put to the documentation wiki ?
You can use the examples from the tests.
It would greatly help others to understand how to use the feature properly.

-
Andrey

DemonWasp

Jan 15, 2011, 1:16:26 AM
to SnakeYAML
I haven't found any other parsers that support merging, at least none
written for Java. I will be using my clone of SnakeYaml for further
experimentation on unrelated features.

I support closing the issue as fixed.

I'll happily write some documentation. What format would you like me
to provide documentation in? I can't seem to figure out how to edit
the Wiki page.

/Jordan

Andrey Somov

Jan 15, 2011, 5:44:09 AM
to snakeya...@googlegroups.com
You have been included into the project as a contributor. Feel free to add/edit the documentation.

I would recommend dropping your existing clone and starting a new clone for each experiment. Then each clone contains no features unrelated to the experiment, and it is much easier to include the code without any additional amendments. That is what distributed version control systems are all about.

-
Andrey

DemonWasp

Jan 15, 2011, 12:58:44 PM
to snakeya...@googlegroups.com
Expect to see documentation soon. I'm not sure exactly when I'll have time, but it should be some time this weekend. I will explicitly mark the feature as being available in version 1.8+

I'll also follow your suggestion about cloning. This is the first time I've ever used Hg (not to mention Maven), so I'm a little new to the ideas; I've mostly been SVN / Eclipse so far.

Thanks for being so responsive to this issue, and I'm glad to see it included in SnakeYaml.
/Jordan

Andrey

Jan 18, 2011, 7:16:12 AM
to snakeya...@googlegroups.com
Thank you for the provided documentation. The problem you found with re-setting values is solved by Alex. You can see it here: 

Can you please test that it works properly now ? Then we can close issue 103 and you can update the documentation.


P.S. It is very easy to use Eclipse. Just run 'mvn eclipse:eclipse' in the clone and import the project to Eclipse. I have always a few clones in the same workspace with different names. You can pull the incoming changes from the master to keep them up-to-date (then you do not need to merge your changes)

-
Andrey

DemonWasp

Jan 18, 2011, 12:25:52 PM
to SnakeYAML
With latest updates, the merge appears to be done correctly. My test
uses the following class and YAML file, if you would like to duplicate
it:

Server.java:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.util.List;

import org.yaml.snakeyaml.Yaml;


public class Server {

    protected ServerSocket socket;
    protected int port;
    protected String name, contentRoot;


    @SuppressWarnings("unchecked")
    public static List<Server> load ( File file ) throws IOException {
        InputStream in = new FileInputStream ( file );
        try {
            Yaml yaml = new Yaml();
            return (List<Server>)yaml.load( in );
        } finally {
            // close the stream even if parsing fails
            in.close();
        }
    }

    public static void main ( String... args ) throws IOException {
        List<Server> servers = load ( new File ( "servers.yaml" ) );
        System.out.println ( servers );
    }


    @Override
    public String toString() {
        return "Server ('"+name+"' serves '"+contentRoot+"' on port "+getPort()+")";
    }


    public int getPort() {
        return socket.getLocalPort();
    }
    // note: setting the port opens a socket, so each call has a side effect
    public void setPort(int port) throws IOException {
        this.socket = new ServerSocket ( port );
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getContentRoot() {
        return contentRoot;
    }
    public void setContentRoot(String contentRoot) {
        this.contentRoot = contentRoot;
    }
}


servers.yaml:
---
- !!Server
  &http-server
  name: HTTP Server
  port: 80
  contentRoot: /home/user/www
- !!Server
  <<: *http-server
  name: HTTPS (SSL) Server
  port: 443
...


The correct output from running this class is:
[Server ('HTTP Server' serves '/home/user/www' on port 80), Server
('HTTPS (SSL) Server' serves '/home/user/www' on port 443)]

The relevant "gotcha" section of the documentation can be removed now.


/Jordan

P.S. It seems like the documentation for how to work with SnakeYaml
should be updated to make it easier for people to contribute. I've
mostly been floundering my way around in the dark up until now because
I've never had occasion to use the same set of tools -- all previous
jobs used Ant or other build tools over Maven and SVN/CVS/ClearCase
over Mercurial.

DemonWasp

Jan 18, 2011, 2:15:03 PM
to SnakeYAML
After a lot of Google-ing, I'm still confused: how do I update the
code in my (google code) clone from the (google code) master?

This is distinct from updating (local) clone from (googlecode) clone.

Thanks,
/Jordan

On Jan 18, 7:16 am, Andrey <py4...@gmail.com> wrote:

maslovalex

Jan 18, 2011, 2:23:13 PM
to SnakeYAML
http://hgbook.red-bean.com/read/


hg in https://snakeyaml.googlecode.com/hg/ - will show whether there are
any incoming changesets

hg pull https://snakeyaml.googlecode.com/hg/ - will get changes from
master

It will create an additional head in your local repo. You need to merge
the new changes into your working copy and commit.

check the book (see 1st link)

-alex


Then

Andrey Somov

Jan 18, 2011, 7:21:27 PM
to snakeya...@googlegroups.com
The only things you can do with a remote repository are pull and push.
The chain is:
- 'hg pull -u' from the master
- do whatever you wish locally (change, merge, etc)
- 'hg push' to your remote clone


Could you maybe write down what your questions were? We will try to update the wiki to help others.
Should it be something about:
- create a remote repository (your own clone)
- download the repository ('hg clone')
- run 'mvn test' and 'mvn eclipse:eclipse'
- make changes in Eclipse and run the tests
- push your changes back to your clone
You may also send a patch ('hg diff > my-patch.txt'); then the remote repository is not required.

-
Andrey

Andrey

Jan 19, 2011, 4:16:44 AM
to snakeya...@googlegroups.com
I have just updated http://code.google.com/p/snakeyaml/wiki/Developing

Is it better now ?

-
Andrey

DemonWasp

Jan 20, 2011, 1:15:49 AM
to SnakeYAML
That's a huge improvement. The only other thing I would suggest is a
friendly "want to help?"-type link on the main page, directing people
to that page (plus other, more technical, documentation).

Thanks,
/Jordan

On Jan 19, 4:16 am, Andrey <py4...@gmail.com> wrote:
> I have just updated http://code.google.com/p/snakeyaml/wiki/Developing