TypeDefinition for a nested collection, and how to ignore most of a document

dkh

Oct 2, 2012, 3:58:50 PM
to snakeya...@googlegroups.com
Hi guys!  Thanks for all the work you've put into SnakeYAML.  It definitely makes my life easier.  I have two questions about how to handle a config file I'm trying to parse.  

Question 1:  Is there an example in the source of how to create a TypeDefinition for a Map[String, List[MyBeanType]]?  Something to parse a map like this (ignore the syntax errors; this is just an example) into a Map[String, List[TireBean]]:

"front tires":  [ { brand: Michelin,    mileage: 10000 }, { brand: Bridgestone, mileage: 5000  } ]
"rear tires":   [ { brand: Bridgestone, mileage: 5000  }, { brand: Michelin,    mileage: 10000 } ]

Question 2:  The system I'm working on is configured via a single YAML document.  At the top level, the document is a map, and each component in the system understands one key/value pair in the map.  The rest of the document is opaque.  What's the right way to model and parse just one value in the map, ignoring the structure of the rest of the file?  For the sake of a simple example, suppose this is the config file and I'm coding the component that is configured by the "developers" entry:

"this part might":
- "change"
- "any time"
developers:  # I understand this part!  It isn't tagged, unfortunately.
- { name: "Bjarne Stroustrup", language: { name: "C++", release: "1983-ish" } }
- { name: "Martin Odersky", language: { name: "Scala", release: "2003" } }
unpredictable:
  could: [ be, "anything here, really"]
  who: knows
not:
- my
- business

Given a document containing just the "developers" entry, I can write a constructor that parses it into objects of appropriate types.  If composer and constructDocument were protected in BaseConstructor instead of private, I would override getSingleData in my Constructor and pass just the Node containing the list of developers to constructDocument.  Or I could find the node I want and tag that node instead of tagging the root node.  Since I keep coming up with ideas that depend on accessing private members :-) (principally the constructor's composer) I thought I'd better ask what the right way to do it is.

Thanks,
David

Andrey

Oct 3, 2012, 3:08:16 AM
to snakeya...@googlegroups.com


On Tuesday, October 2, 2012 9:58:50 PM UTC+2, dkh wrote:
> Hi guys!  Thanks for all the work you've put into SnakeYAML.  It definitely makes my life easier.  I have two questions about how to handle a config file I'm trying to parse.
>
> Question 1:  Is there an example in the source of how to create a TypeDefinition for a Map[String, List[MyBeanType]]?  Something to parse a map like this (ignore the syntax errors; this is just an example) into a Map[String, List[TireBean]]:
>
> "front tires":  [ { brand: Michelin,    mileage: 10000 }, { brand: Bridgestone, mileage: 5000  } ]
> "rear tires":   [ { brand: Bridgestone, mileage: 5000  }, { brand: Michelin,    mileage: 10000 } ]

I do not think this use case is covered. If you cannot transform the type to MyHolderBean[String, List[MyBeanType]] (because of the spaces in the names), then you have to find your own solution.
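One possible "own solution" is to let SnakeYAML load the document without any type information, which yields plain maps and lists, and convert that structure by hand. The sketch below covers only the conversion step and uses nothing but the JDK. TireBean and TireMapConverter are hypothetical names, and the code assumes every value is a list of maps with a string "brand" and a numeric "mileage". It is not a TypeDefinition-based approach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bean for one tire entry; field names mirror the example document.
final class TireBean {
    final String brand;
    final int mileage;

    TireBean(String brand, int mileage) {
        this.brand = brand;
        this.mileage = mileage;
    }
}

final class TireMapConverter {
    // Converts an untyped map of lists of maps into Map<String, List<TireBean>>.
    // Assumption: the input has exactly the shape shown in the question.
    static Map<String, List<TireBean>> convert(Map<String, ?> raw) {
        Map<String, List<TireBean>> result = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : raw.entrySet()) {
            List<TireBean> tires = new ArrayList<>();
            for (Object item : (List<?>) entry.getValue()) {
                Map<?, ?> fields = (Map<?, ?>) item;
                String brand = (String) fields.get("brand");
                int mileage = ((Number) fields.get("mileage")).intValue();
                tires.add(new TireBean(brand, mileage));
            }
            // Keys such as "front tires" are kept verbatim, spaces included.
            result.put(entry.getKey(), tires);
        }
        return result;
    }
}
```

Because the keys stay ordinary map keys, the spaces in "front tires" and "rear tires" never have to become bean property names.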

> Question 2:  The system I'm working on is configured via a single YAML document.  At the top level, the document is a map, and each component in the system understands one key/value pair in the map.  The rest of the document is opaque.  What's the right way to model and parse just one value in the map, ignoring the structure of the rest of the file?  For the sake of a simple example, suppose this is the config file and I'm coding the component that is configured by the "developers" entry:
>
> "this part might":
> - "change"
> - "any time"
> developers:  # I understand this part!  It isn't tagged, unfortunately.
> - { name: "Bjarne Stroustrup", language: { name: "C++", release: "1983-ish" } }
> - { name: "Martin Odersky", language: { name: "Scala", release: "2003" } }
> unpredictable:
>   could: [ be, "anything here, really"]
>   who: knows
> not:
> - my
> - business
>
> Given a document containing just the "developers" entry, I can write a constructor that parses it into objects of appropriate types.  If composer and constructDocument were protected in BaseConstructor instead of private, I would override getSingleData in my Constructor and pass just the Node containing the list of developers to constructDocument.  Or I could find the node I want and tag that node instead of tagging the root node.  Since I keep coming up with ideas that depend on accessing private members :-) (principally the constructor's composer) I thought I'd better ask what the right way to do it is.

I must admit that I do not clearly understand what you wish to achieve.  The private modifier is used not to hide anything but to help users to avoid mistakes. If you come with a proposal (use case, test suite, implementation) then we can change the modifier, or we can split methods to allow more flexibility for overriding.


Andrey

maslovalex

Oct 3, 2012, 3:16:00 AM
to snakeya...@googlegroups.com
Hi David.

As Andrey already replied, it may be difficult.
But could you write a test that covers your example, without trying to fix it? Let it fail.
Share it with us.

I might have a "simple" solution for you. It is not in the main repo, but we were planning to merge it at some point.
And it would be really nice to try things with some "real" cases.

thx in advance.

-alex

maslovalex

Oct 3, 2012, 3:42:39 AM
to snakeya...@googlegroups.com
BTW, interesting question. To you, Andrey (since you know how the low-level API works):

Is it possible to do filtering at the parsing stage? Maybe using some low-level API.
The idea would be like this.

Document on input is


"this part might":
- "change"
- "any time"
developers:  # I understand this part!  It isn't tagged, unfortunately.
- { name: "Bjarne Stroustrup", language: { name: "C++", release: "1983-ish" } }
- { name: "Martin Odersky", language: { name: "Scala", release: "2003" } }
unpredictable:
  could: [ be, "anything here, really"]
  who: knows
not:
- my
- business
 

But after parsing NodeTree would contain only nodes for


developers:  # I understand this part!  It isn't tagged, unfortunately.
  - { name: "Bjarne Stroustrup", language: { name: "C++", release: "1983-ish" } }
  - { name: "Martin Odersky", language: { name: "Scala", release: "2003" } }

So everything else is simply dropped.

Is it possible? I mean is it possible without really big intrusion into the SnakeYAML code?

-alex

Andrey Somov

Oct 3, 2012, 4:40:27 AM
to snakeya...@googlegroups.com
On Wed, Oct 3, 2012 at 9:42 AM, maslovalex <alexande...@gmail.com> wrote:
> BTW, interesting question. To you, Andrey (since you know how the low-level API works)
>
> Is it possible to do filtering at the parsing stage? Maybe using some low-level API.
> The idea would be like this.


Everything is possible :)
The problem is that when you have a chain of events, there is no hierarchy. You have to keep track of all the relationships between the nodes on your own. For simple structures it should not be that difficult.

Of course, the whole YAML document must be well-formed.

When you use the low-level API, you create your own graph. If you expect to create the standard graph, which can be processed by further components, then it is a task of its own.
If you can simply change the source code (it is open source!) in SnakeYAML to ignore the events you are not interested in, then that is one task. If we want a generic way to adapt the parser to any user requirement, then it is something totally different.
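To illustrate what "keeping the relationships on your own" can look like, here is a minimal sketch of depth tracking over a flat event stream. It uses a toy event model (ToyEventKind and ToyEvent are hypothetical stand-ins, not SnakeYAML's event classes) and keeps only the events that make up the value of one top-level key. It assumes the root is a mapping with scalar keys and ignores stream and document markers, aliases and tags, so it is an illustration of the bookkeeping rather than a usable filter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for parser events; real YAML event streams carry more kinds.
enum ToyEventKind { MAP_START, MAP_END, SEQ_START, SEQ_END, SCALAR }

final class ToyEvent {
    final ToyEventKind kind;
    final String value; // only meaningful for SCALAR

    ToyEvent(ToyEventKind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    static ToyEvent of(ToyEventKind kind) { return new ToyEvent(kind, null); }
    static ToyEvent scalar(String value) { return new ToyEvent(ToyEventKind.SCALAR, value); }
}

final class TopLevelEventFilter {
    // Returns only the events that belong to the value stored under wantedKey.
    static List<ToyEvent> keepValueOf(String wantedKey, List<ToyEvent> events) {
        List<ToyEvent> kept = new ArrayList<>();
        int depth = 0;               // number of currently open collections
        boolean expectingKey = true; // at depth 1: is the next event a key?
        boolean capturing = false;   // are we inside the wanted value?
        String currentKey = null;
        for (ToyEvent e : events) {
            boolean opens = e.kind == ToyEventKind.MAP_START || e.kind == ToyEventKind.SEQ_START;
            boolean closes = e.kind == ToyEventKind.MAP_END || e.kind == ToyEventKind.SEQ_END;
            if (depth == 0) {
                if (e.kind != ToyEventKind.MAP_START) {
                    throw new IllegalArgumentException("root must be a mapping");
                }
                depth = 1;
                continue;
            }
            if (depth == 1) {
                if (closes) { depth = 0; continue; } // end of the root mapping
                if (expectingKey) {
                    if (e.kind != ToyEventKind.SCALAR) {
                        throw new IllegalArgumentException("only scalar keys are supported");
                    }
                    currentKey = e.value;
                    expectingKey = false;
                    continue;
                }
                // This event starts (or is) the value of currentKey.
                boolean wanted = wantedKey.equals(currentKey);
                if (wanted) kept.add(e);
                if (opens) { depth = 2; capturing = wanted; } else { expectingKey = true; }
                continue;
            }
            // Nested inside a top-level value.
            if (capturing) kept.add(e);
            if (opens) {
                depth++;
            } else if (closes) {
                depth--;
                if (depth == 1) { capturing = false; expectingKey = true; }
            }
        }
        return kept;
    }
}
```

Even in this reduced form the filter needs three pieces of state, which gives a feel for why a generic, alias-aware version is a different task.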

Andrey

David Huebel

Oct 4, 2012, 6:00:23 PM
to snakeya...@googlegroups.com
On Wed, Oct 3, 2012 at 2:08 AM, Andrey <py4...@gmail.com> wrote:
>
>
> On Tuesday, October 2, 2012 9:58:50 PM UTC+2, dkh wrote:
>>
>> Question 2: The system I'm working on is configured via a single YAML
>> document. At the top level, the document is a map, and each component in
>> the system understands one key/value pair in the map. The rest of the
>> document is opaque. What's the right way to model and parse just one value
>> in the map, ignoring the structure of the rest of the file?
>> [...]
>
>
> I must admit that I do not clearly understand what you wish to achieve. The
> private modifier is used not to hide anything but to help users to avoid
> mistakes. If you come with a proposal (use case, test suite, implementation)
> then we can change the modifier, or we can split methods to allow more
> flexibility for overriding.

I simplified the example a bit and wrote a short program showing what
I have in mind. I used v1.11 and changed BaseConstructor.composer and
BaseConstructor.constructDocument to be protected instead of private.
For me, this is a reasonable solution, but I'm wondering if I
overlooked a way to accomplish the same thing without accessing these
private members.

https://gist.github.com/3836479

The key idea is that after parsing the document into Nodes, I want to
apply the constructor to a subnode instead of to the root node.
Alternatively, perhaps I could apply a tag to that node before
constructing the document? That hadn't occurred to me.
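For comparison, a simpler variant of the same idea works one level higher: load the whole document untyped (plain maps and lists), then convert only the one entry the component understands. The sketch below shows that selection step with JDK types only. ConfigLanguage, ConfigDeveloper and DevelopersSection are hypothetical names, and the code assumes the "developers" entry has the shape shown in the example config. Note that this is not the node-level approach from the gist: the whole document is still parsed, and only the conversion is restricted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical beans for the "developers" entry of the example config.
final class ConfigLanguage {
    final String name;
    final String release;

    ConfigLanguage(String name, String release) {
        this.name = name;
        this.release = release;
    }
}

final class ConfigDeveloper {
    final String name;
    final ConfigLanguage language;

    ConfigDeveloper(String name, ConfigLanguage language) {
        this.name = name;
        this.language = language;
    }
}

final class DevelopersSection {
    // Reads only the "developers" entry; other top-level keys are never inspected.
    static List<ConfigDeveloper> fromRoot(Map<String, ?> root) {
        Object section = root.get("developers");
        if (!(section instanceof List)) {
            throw new IllegalArgumentException("expected a list under \"developers\"");
        }
        List<ConfigDeveloper> developers = new ArrayList<>();
        for (Object item : (List<?>) section) {
            Map<?, ?> dev = (Map<?, ?>) item;
            Map<?, ?> lang = (Map<?, ?>) dev.get("language");
            // String.valueOf tolerates a release given as a number or as quoted text.
            ConfigLanguage language = new ConfigLanguage(
                    (String) lang.get("name"), String.valueOf(lang.get("release")));
            developers.add(new ConfigDeveloper((String) dev.get("name"), language));
        }
        return developers;
    }
}
```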

As a pie-in-the-sky future solution, it would be nice to use some sort
of YAML query language to tag nodes or to (per Alex's suggestion)
filter documents. I'm looking at DPath,
http://search.cpan.org/~schwigon/Data-DPath-0.47/lib/Data/DPath.pm#THE_DPATH_LANGUAGE,
as a way to extract single values needed by deployment scripts, etc.

For example,

composer.tagMatchingNodes("/developers/*", new Tag(typeDescription.getType()));
-or-
composer.filter("/developers/*");
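Neither tagMatchingNodes nor filter exists in SnakeYAML; both are proposals. To make the idea concrete, here is a minimal sketch of a path selector over an already loaded structure of plain maps and lists. SimplePath is a hypothetical helper that supports only "/"-separated steps, each being either a literal map key or "*". It does not implement DPath's actual language.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SimplePath {
    // Returns every value reachable from root by following the steps of the path.
    static List<Object> select(Object root, String path) {
        List<Object> current = Collections.singletonList(root);
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // the empty piece before the leading "/"
            }
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (step.equals("*")) {
                    // "*" expands to all list elements or all map values.
                    if (node instanceof List) {
                        next.addAll((List<?>) node);
                    } else if (node instanceof Map) {
                        next.addAll(((Map<?, ?>) node).values());
                    }
                } else if (node instanceof Map && ((Map<?, ?>) node).containsKey(step)) {
                    next.add(((Map<?, ?>) node).get(step));
                }
            }
            current = next;
        }
        return current;
    }
}
```

With the example config loaded into maps and lists, SimplePath.select(root, "/developers/*") would return the individual developer entries, and a path whose key is absent returns an empty list instead of failing.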

-David

Andrey Somov

Oct 5, 2012, 12:37:33 PM
to snakeya...@googlegroups.com
Thank you. Now I can see what you wish to implement and what kind of support you expect from SnakeYAML.

1)
Please be aware that what you are doing is inconsistent. It may work for one valid YAML document but fail for another valid document.
You assume that a part of a YAML document is itself a valid YAML document. Unfortunately, this is not the case because of anchors/aliases and directives.

2)
The method 'constructDocument(Node node)' is private on purpose. Calling this method changes the state of the constructor. If users call this method, then the parsing of the document might fail (when aliases are present).

3)
The 'composer' instance variable is also private on purpose. The composer is stateful and its state must be synchronised with the state of its parent constructor.

Conclusion:
1) Feel free to do anything you want with the source. As long as you know what you are doing and you can control the YAML document, it should work.

2) I do not see your use case as a reason to change SnakeYAML. But I do not mind changing SnakeYAML if the community thinks that it gives more flexibility to the end users.
I hope other developers can review this request and give their opinion.

Cheers,
Andrey



David Huebel

Oct 5, 2012, 3:21:44 PM
to snakeya...@googlegroups.com
On Fri, Oct 5, 2012 at 11:37 AM, Andrey Somov <py4...@gmail.com> wrote:
> 1)
> please be aware that what you do is inconsistent. It may work for one valid
> YAML document, but fail for another valid document.
> You assume that a part of a YAML document is a valid YAML document.
> Unfortunately this is not the case because of anchors/ aliases and
> directives.

Anchors, aliases, and directives are not present in the representation
graph created by the composer, so they should not pose any problem as
long as I don't modify the Composer. I've tested cases where the Node
I pass to the constructor contains aliases to anchors outside that
subtree, and SnakeYAML appears to handle it correctly. In the test
case I posted, if you change the developer line to read "developer: {
name: *name, language: *language }" with &name and &language anchors
defined previously in the document, the test still succeeds.
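The reasoning above can be pictured with a toy model of composition: when an alias is met, the composer hands back the node that was already built for the anchor, so the graph contains shared references and no alias objects. GraphNode, ToyComposer and AliasDemo below are hypothetical illustrations written for this note, not SnakeYAML classes, and the sketch assumes the real composer behaves comparably, which is what the test described above suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy graph node: a scalar when value is non-null, otherwise a collection.
final class GraphNode {
    final String value;
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String value) {
        this.value = value;
    }
}

final class ToyComposer {
    private final Map<String, GraphNode> anchors = new HashMap<>();

    // Registers a node under an anchor name, as "&name" would.
    GraphNode anchor(String name, GraphNode node) {
        anchors.put(name, node);
        return node;
    }

    // Resolves "*name" to the node composed earlier; no alias object is created.
    GraphNode alias(String name) {
        GraphNode node = anchors.get(name);
        if (node == null) {
            throw new IllegalStateException("undefined alias: " + name);
        }
        return node;
    }
}

final class AliasDemo {
    // Builds root -> [language, developer], where developer refers to language by alias,
    // and returns only the developer subtree.
    static GraphNode developerSubtree() {
        ToyComposer composer = new ToyComposer();
        GraphNode root = new GraphNode(null);
        GraphNode language = composer.anchor("language", new GraphNode("Scala")); // &language Scala
        root.children.add(language);
        GraphNode developer = new GraphNode(null);
        developer.children.add(composer.alias("language")); // *language
        root.children.add(developer);
        return developer; // the shared node stays reachable from the subtree
    }
}
```

In this model the subtree is self-sufficient once composition has finished. The caveat about state applies to what happens during composition and construction, not to the finished graph.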

> 2) I do not see your use case as a reason to change SnakeYAML. But I do not
> mind to change SnakeYAML if the community thinks, that it gives more
> flexibility for the end users.
> I hope other developers can review this request and give their opinion.

Our strategy of using a single YAML file to configure all the
subsystems of our product may be unusual. Our previous strategy of
using a different config file for each subsystem was simpler to
program, but users and ops hated wrangling multiple files. Simplicity
for the people who manage and modify configs is paramount; otherwise I
could require them to tag entries with type information or separate
the file into different documents using '---'. So it's possible my
needs are unique. In any case, my needs are well served by current
SnakeYAML with very minor modifications, possibly no modifications at
all since I believe I can subvert access control via reflection.

Thank you for your responses!
- David


Andrey Somov

Oct 5, 2012, 6:31:13 PM
to snakeya...@googlegroups.com
I am sorry, I overlooked that you first create the whole Node:
(I assumed you want _to avoid_ creating the complete Node)

Node node = composer.getSingleNode();

The method 'Object getMapValue(String key) throws Exception' is clearly a copy
of the BaseConstructor's method 'public Object getSingleData(Class<?> type)'.

I think you can achieve the same result if you extend Composer. I have just added an example here:

http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/partialconstruct/FragmentComposer.java

As you can see, it is shorter (one method is completely eliminated) and it does not require any change in SnakeYAML.

This is the test:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/partialconstruct/FragmentComposerTest.java

Take the latest source and give it a try.

Cheers,
Andrey

