How to detect duplicate tags in an input YAML files?

5,512 views
Skip to first unread message

Mark Laff

unread,
Feb 19, 2012, 7:44:53 AM2/19/12
to SnakeYAML
Hi,

Should the following generate an error, or quietly ignore the first
key/value?

foo:
value1
foo:
value2

I would like to be able to warn my user about the duplicate key and
resulting information loss.


Thanks,


Mark

Andrey

unread,
Feb 19, 2012, 7:37:44 PM2/19/12
to snakeya...@googlegroups.com
Hi Mark,
the specification is not strict about duplicate mapping keys. But the authors say that it should throw an exception:

Unfortunately, many parsers simply ignore  duplicate keys. SnakeYAML follows the same approach to be compatible.

What do you mean by "to warn my user about the duplicate key" ? Do you expect an exception ?

We need to think about a flexible solution. Do you have a proposal ?

-
Andrey


Joanne Bogart

unread,
Jan 30, 2015, 5:30:07 PM1/30/15
to snakeya...@googlegroups.com
Nearly 3 years later, I have the same issue.   It is very easy to make a simple editing mistake in a YAML file resulting in a duplicate key, but I see no way to get SnakeYAML to inform the caller when the file is loaded.  Am I missing something?  If not, could there be an option to configure a new Yaml object or a parameter to load which would tell it to throw an exception in case it encounters a duplicate key?

Joanne

Andrey Somov

unread,
Feb 2, 2015, 9:17:42 AM2/2/15
to snakeya...@googlegroups.com
Dear Joanne and Mark,
why don't you take the most quick and reliable solution ?
Why not to try to send us a patch with your implementation ? This is Open Source, isn't it ?
All the contributors wanted to solve their particular problem and this is how the project has evolved.
How can we solve your problem if we do not see it as a problem ?

Contributions are welcome.

Cheers,
Andrey





--
You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snakeyaml-cor...@googlegroups.com.
To post to this group, send email to snakeya...@googlegroups.com.
Visit this group at http://groups.google.com/group/snakeyaml-core.
For more options, visit https://groups.google.com/d/optout.

Joachim Durchholz

unread,
Feb 2, 2015, 3:12:33 PM2/2/15
to snakeya...@googlegroups.com
Am 02.02.2015 um 15:17 schrieb Andrey Somov:
> Why not to try to send us a patch with your implementation ? This is Open
> Source, isn't it ?

It's difficult to send a patch if it's unclear what kind of patch would
be acceptable. Throw an exception? Collect the additional values and
somehow return them? Do some other way of error reporting?

So the question is legitimate I think.

Also, it's not true that fixing the problem and sending a patch is
always the easiest route. Some people may lack the expertise, some may
lack the ability to make patches, some may have to sign off too much
corporate paperwork to do that, etc. pp.

From what I see in the YAML spec, SnakeYAML should report a duplicate
key just as it would report any semantic errors in its input (e.g. a
reference to a nonexistent anchor).
(I'd fix it if I were currently using SnakeYAML, but the project that I
had planned it for is currently on hiatus. Too many other things to do.)

Joanne Bogart

unread,
Feb 2, 2015, 8:40:12 PM2/2/15
to snakeya...@googlegroups.com
The spec is clear that this is an error:

"Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. This is non-trivial since YAML allows various ways to format a given scalar content. For example, the integer eleven can be written as “013” (octal) or “0xB” (hexadecimal). If both forms are used as keys in the same mapping, only a YAML processor which recognizes integer formats would correctly flag the duplicate key as an error. "

It seems to me SnakeYAML should throw an exception, as is done for other errors.  To keep backwards compatibility, this could be a configuration option.

 I haven't yet looked at the code. If it already properly detects all forms of duplicate keys as described above it should be an easy patch.  Otherwise, doing it properly could be quite a bit of work.  For my application, which only uses string scalars as keys, an adequate fix would likely be much simpler, but this very partial fix is probably not appropriate to incorporate.

Andrey Somov

unread,
Feb 3, 2015, 3:09:17 AM2/3/15
to snakeya...@googlegroups.com
Well, let us see what the sentence really says:
" If both forms are used as keys in the same mapping, only a YAML processor which recognizes integer formats would correctly flag the duplicate key as an error. "

1) "If both forms are used as keys in the same mapping," - this is clearly not the use case
2) "only a YAML processor which recognizes integer formats"-  this says that the parsers which do not recognise integer format (yamlbeans, for instance) may never distinguish the error
3) please note "would correctly flag", there is no "must correctly flag"

The decision what to do with duplicate key is left to the parser. SnakeYAML now has the same policy as other major parsers (for Python and for Ruby). Changing the policy would break the inter-language compatibility. (the same document which succeeds to parse in one language fails to parse in another)
It does not mean it can not be implemented. Feel free to send us a patch.
The real problem is that no one wants to make any contribution.

Cheers,
Andrey


--

Joachim Durchholz

unread,
Feb 3, 2015, 4:39:57 AM2/3/15
to snakeya...@googlegroups.com
Let me collect all relevant places in the specs.
Full disclosure: This is the first time I'm going through them looking
for something formal.


> http://www.yaml.org/spec/1.2/spec.html

> 1.3 Relation to JSON JSON's RFC4627 requires that mappings keys
> merely “SHOULD” be unique, while YAML insists they “MUST” be.

The section is descriptive rather than prescriptive I think, but the
intent of the spec makers is pretty clearly stated here.

> 3.1.2 Load Constructing Native Data Structures Construction must be
> based only on the information available in the representation, and
> not on additional serialization or presentation details such as [...]
> mapping key order, [...]

If duplicate keys are ignored, the selection of which of two conflicting
key-value pairs will get included implies a dependency on mapping key
order. (You could code something to make the selectin random, but
throwing an error might be easier.)

This is a very implicit conclusion, I doubt that the authors wanted to
enforce key uniqueness with just this paragraph :-)

> 3.2.1. Representation Graph Mapping nodes are somewhat tricky
> because their keys are unordered and must be unique.

Looks like another reference to a uniqueness requirement somewhere in
the spec.

> 3.2.1.1. Nodes A YAML node represents a single native data
> structure.

I.e. the node is the consolidated YAML information for a native data
structure. Regardless of origin (multiple key-value pairs, merging [1],
whatever).

[1] Merging is actually an exception, see the end of this analysis.

> 3.2.1.1. Nodes Mapping The content of a mapping node is an unordered
> set of key: value node pairs, with the restriction that each of the
> keys is unique.

So all the consolidated YAML information pertaining to a Java Map must
not contain multiple keys.

> 3.2.1.3. Node Comparison
> Since YAML mappings require key uniqueness, representations must
> include a mechanism for testing the equality of nodes. This is
> non-trivial since YAML allows various ways to format scalar content.
> For example, the integer eleven can be written as “0o13” (octal) or
> “0xB” (hexadecimal). If both notations are used as keys in the same
> mapping, only a YAML processor which recognizes integer formats would
> correctly flag the duplicate key as an error.

The last partial sentence can be read as "YAML processors do not need to
recognize duplicates if that's due to representational differences", but
it can also be read as "YAML processors need to canonicalize integer
keys so they can do proper duplicated detection".
Given that duplicate keys are excluded prescriptively, and in multiple
location descriptively, I prefer the latter interpretation.

> http://yaml.org/type/merge.html

Now here the rules are quite different:

> The “<<” merge key is used to indicate that all the keys of one or
> more specified maps should be inserted into the current map. If the
> value associated with the key is a single mapping node, each of its
> key/value pairs is inserted into the current mapping, unless the key
> already exists in it.

So a *merge* does indeed have a dependency on YAML order: earlier
key:value pairs override later ones.

I hope this helps with decision-making ;-)

Regards
Jo

Andrey Somov

unread,
Feb 3, 2015, 4:46:45 AM2/3/15
to snakeya...@googlegroups.com
Well, SnakeYAML does NOT implement spec 1.2



--
You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snakeyaml-core+unsubscribe@googlegroups.com.
To post to this group, send email to snakeyaml-core@googlegroups.com.



--
Andrey Somov

Joachim Durchholz

unread,
Feb 3, 2015, 12:26:40 PM2/3/15
to snakeya...@googlegroups.com
Am 03.02.2015 um 10:46 schrieb Andrey Somov:
> Well, SnakeYAML does NOT implement spec 1.2

... it does 1.1...

Sure.
I still think that 1.2 should be used where 1.1 leaves room for
interpretation. 1.2 is intended as a clarification to 1.1, not as a
potentially incompatible update.

But anyway. Same analysis for 1.1.
tl;dr: The spec REQUIRES any conformant processor to ignore the second
key:value pair with a duplicate key, *but issue a warning*.


> http://yaml.org/spec/1.1/current.html

> 3.2.1. Representation Graph
>
> Mapping nodes are somewhat tricky because their keys are unordered
> and must be unique.

A MUST requirement, but on the document, not on the processor (which has
a specification how to deal with invalid input, see below).

> 3.2.1.1. Nodes
>
> Mapping
>
> The content of a mapping node is an unordered set of key: value node
> pairs, with the restriction that each of the keys is unique.

Same applies as with 3.2.1.

> 3.2.1.3. Nodes Comparison
>
> Since YAML mappings require key uniqueness, representations must
> include a mechanism for testing the equality of nodes. This is
> non-trivial since YAML allows various ways to format a given scalar
> content. For example, the integer eleven can be written as “013”
> (octal) or “0xB” (hexadecimal). If both forms are used as keys in the
> same mapping, only a YAML processor which recognizes integer formats
> would correctly flag the duplicate key as an error.

I agree that a processor that does not recognize integer formats cannot
flag duplicate keys as an error.

What we disagree about is whether the above paragraph gives the
processor the freedom to ignore duplicate keys, because:
1) This does not give the freedom to ignore non-integer duplicate keys.
The entire paragraph is solely about the special case of integer keys.
2) Even for integer duplicates, it would give the freedom only for those
cases where the duplicity is due to representational differences.
3) If the references to uniqueness requirements in the preceding
paragraphs are considered a restriction on the YAML process, 3.2.1.3.
merely implies that a conforming YAML processor needs to implement
integer interpretatin (which is needs to do anyway to construct a native
object).

> 10.2. Mapping Styles
>
> It is an error for two equal keys to appear in the same mapping
> node. In such a case the YAML processor may continue, ignoring the
> second key: value pair and issuing an appropriate warning.

Now that's a clear prescription how to deal with duplicates!
I browsed the code a bit but didn't find a mechanism do deal with
warnings, just code for errors - maybe that's going to be a problem.
OTOH warnings could be emitted as events, for example, so the caller can
deal with these (or ignore them).

Just my 2c.
Well, 5c. Too much text for 2c ;-D

Andrey Somov

unread,
Feb 3, 2015, 12:40:07 PM2/3/15
to snakeya...@googlegroups.com
>I browsed the code a bit but didn't find a mechanism do deal with warnings, just code for errors - maybe that's going to be a problem. OTOH > warnings could be emitted as events, for example, so the caller can deal with these (or ignore them).

That is why a small patch is better then a big discussion.

How the API should look like ?

Cheers,
Andrey


Joachim Durchholz

unread,
Feb 3, 2015, 2:08:52 PM2/3/15
to snakeya...@googlegroups.com
Am 03.02.2015 um 18:40 schrieb Andrey Somov:
>> I browsed the code a bit but didn't find a mechanism do deal with
>> warnings, just code for errors - maybe that's going to be a problem. OTOH >
>> warnings could be emitted as events, for example, so the caller can deal
>> with these (or ignore them).
>
> That is why a small patch is better then a big discussion.

Difficult, because I have no good answer to your question:

> How the API should look like ?

I don't really have an idea. Or, rather, too many of them, and I have no
idea what's the best one (and the one that fits best with the SnakeYAML
codebase).

Approaches that come to mind:

- Put the message into the event stream. (Could break compatibility.)
- Emit a warning through logging. (Not nice if SnakeYaml doesn't use any
logger yet.)
- Throw an exception. (Could break compatibility.)

So... none of the approaches that I can think of is ideal.

If backwards compatibility cannot be maintained, then people need a way
to stick with the existing behaviour.
An option on the YAML object would probably be the right way. Possibly
with a StrictYAML subclass that does setCheckDuplicates(true) in its
constructor. There are several equally valid approaches for that though.

Patches are nice, but apart from the problem that not everybody can
write one, writing a patch would be pointless if it's unclear what it
should actually do.

Regards,
Jo

Joanne Bogart

unread,
Feb 3, 2015, 4:25:18 PM2/3/15
to snakeya...@googlegroups.com
One more relevant section from the 1.1 spec is paragraph 3.3.1

A well-formed character stream must match the productions specified in the next chapter. Successful loading also requires that each alias shall refer to a previous node identified by the anchor. A YAML processor should reject ill-formed streams and unidentified aliases. A YAML processor may recover from syntax errors, possibly by ignoring certain parts of the input, but it must provide a mechanism for reporting such errors.
In the case of duplicate keys,  SnakeYAML violates that "must": it ignores part of the input, but silently.



On Tuesday, February 3, 2015 at 11:08:52 AM UTC-8, toolforger wrote:
Am 03.02.2015 um 18:40 schrieb Andrey Somov:
>> I browsed the code a bit but didn't find a mechanism do deal with
>> warnings, just code for errors - maybe that's going to be a problem. OTOH >
>> warnings could be emitted as events, for example, so the caller can deal
>> with these (or ignore them).
>
> That is why a small patch is better then a big discussion.

Difficult, because I have no good answer to your question:

> How the API should look like ?

I don't really have an idea. Or, rather, too many of them, and I have no
idea what's the best one (and the one that fits best with the SnakeYAML
codebase).

Approaches that come to mind:

- Put the message into the event stream. (Could break compatibility.)
- Emit a warning through logging. (Not nice if SnakeYaml doesn't use any
logger yet.)
- Throw an exception. (Could break compatibility.)

So... none of the approaches that I can think of is ideal.

If backwards compatibility cannot be maintained, then people need a way
to stick with the existing behaviour.
An option on the YAML object would probably be the right way. Possibly
with a StrictYAML subclass that does setCheckDuplicates(true) in its
constructor. There are several equally valid approaches for that though.

I would vote for this, but there are still decisions to be made about what constitutes a duplicate and whether the patch would handle the most generous interpretation  or something more restricted. Given my lack of familiarity with the code and time constraints I could at best attempt only something quite restricted.

best,
Joanne


Andrey Somov

unread,
Feb 4, 2015, 5:35:42 AM2/4/15
to snakeya...@googlegroups.com
Dear Joanne,
I did not quite catch your message: "One more relevant section from the 1.1 spec is paragraph 3.3.1"
What are you looking for ? Do you want to find inconsistencies in the spec ?
Spec 1.1 clearly explicitly says that this is not an error. 3 parsers (Java, Python Ruby) implement it the same way. Period.
If you wish to discuss the spec please contact the general list: yaml-core (https://lists.sourceforge.net/lists/listinfo/yaml-core)
If you wish to help please make a proposal.
(the message "I do not have time - please do it for me" does not really help...)


Cheers,
Andrey


--
You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snakeyaml-cor...@googlegroups.com.
To post to this group, send email to snakeya...@googlegroups.com.

Andrey Somov

unread,
Feb 4, 2015, 5:53:32 AM2/4/15
to snakeya...@googlegroups.com
Well, finally our discussion goes to point.
I was confronted with these kind of problems when I was writing the initial implementation.
No solution has been found.

Restrictions:
- no external dependencies (which does not allow to use any logging)
- no unconditional exception -> the problem is not backwards compatibility, the problem is that other users do not want to get any error in case of duplicate keys (as with other parsers)
- simple configuration for parsing, flexible for other usages

When we ask for a patch we mean that those who ask for a feature should see the complete picture and provide tests to analyse the use cases. If somebody manages to send some code with updated and added tests there is no reason to reject it.
Apparently no one ever sent anything for duplicate keys - which is clear indication that it is not really needed.

Cheers,
Andrey



Joachim Durchholz

unread,
Feb 4, 2015, 7:16:35 AM2/4/15
to snakeya...@googlegroups.com
Am 04.02.2015 um 11:53 schrieb Andrey Somov:
> Apparently no one ever sent anything for duplicate keys - which is clear
> indication that it is not really needed.

Sorry, that's so overgeneralized to be 90% false.

Joachim Durchholz

unread,
Feb 4, 2015, 7:39:38 AM2/4/15
to snakeya...@googlegroups.com
Am 04.02.2015 um 11:35 schrieb Andrey Somov:
> Spec 1.1 clearly explicitly says that this is not an error.

It does. The last paragraph I quoted does, the paragraph she quoted does.

> 3 parsers
> (Java, Python Ruby) implement it the same way.

That does not make them right.
Besides, not all parsers do it the same way, and I found bug reports for
the very same issue in other projects, too.

> Period.

Let me translate that: "I don't care."

That's fine, nobody is forcing you into making SnakeYAML more useful for
more use cases.

> If you wish to help please make a proposal.

I have made one, but given your "period" attitude, I just lost all interest.

> (the message "I do not have time - please do it for me" does not really
> help...)

Well, "Period." does not help either.

"Period" was just the highlight actually.
You've been investing more effort into arguing the issue away than into
finding ways to make SnakeYAML more useful.
With that in mind, SnakeYAML probably should not be recommended to
people who do not have the time or skill to redo the libraries they use.

Andrey Somov

unread,
Feb 4, 2015, 7:57:25 AM2/4/15
to snakeya...@googlegroups.com
Dear Joachim,
you misunderstood me.
When I say "period" I want to be exact - I want to stop the attempts to interpret the specification in many ways. Trying to give different interpretations brings nothing but confusion. If someone wants to discuss the spec - no problem, but please use the appropriate list - yaml-core.

All I want is that people instead of arguing the spec give any contribution.

Contributions are welcome, arguments that SnakeYAML is wrong because it does not follow "yet another spec interpretation" are not.

A lot of contributors (https://code.google.com/p/snakeyaml/wiki/changes) came with their problems and together we were looking for the solutions.
I propose to do the same: think about possible solution, make a proposal. What is wrong here ? You can better start another thread or open a new issue.

Cheers,
Andrey





--
You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snakeyaml-core+unsubscribe@googlegroups.com.
To post to this group, send email to snakeyaml-core@googlegroups.com.

Joachim Durchholz

unread,
Feb 4, 2015, 10:20:59 AM2/4/15
to snakeya...@googlegroups.com
Am 04.02.2015 um 13:57 schrieb Andrey Somov:
> Dear Joachim,
> you misunderstood me.
> When I say "period" I want to be exact - I want to stop the attempts to
> interpret the specification in many ways.

It's about interpreting the spec correctly.
SnakeYAML is interpreting it incorrectly (as most of the other YAML
processors, though it escapes me why projects ignore the spec. I guess
that's one of the reasons why YAML still fails to gain traction.

> Trying to give different
> interpretations brings nothing but confusion. If someone wants to discuss
> the spec - no problem, but please use the appropriate list - yaml-core.

To repeat myself, the spec is clear enough: duplicate keys MUST be
reported, the processor MAY continue by keeping the first key:value pair
it encounters.
That's rephrased, the pertinent section was quoted in my 1.1 analysis,
but it leaves no room to interpretation at all - at least I think so,
you obviously didn't read the analysis so it's unverified.

You feel offended that people make demands and don't do anything.
I can understand that, but I hope you can understand that I feel just
the same to see my analysis work brushed aside and ignored.

Andrey Somov

unread,
Feb 4, 2015, 10:39:35 AM2/4/15
to snakeya...@googlegroups.com
Ok, for clarity let us divide the issue into 2 parts.

Part 1:
Check that  SnakeYAML and other parsers follow the specification 1.1

Part 2:
Make a feature request to improve SnakeYAML

All the discussion about Part 1 should be moved to the general list - yaml-core (where it belongs to)
Once we get a confirmation from the spec owners that this is against the spec 1.1 we will fix it (it is one line to change)

All the discussion about Part 2 should be here - proposals are welcome.

Cheers,
Andrey


Joachim Durchholz

unread,
Feb 4, 2015, 12:43:51 PM2/4/15
to snakeya...@googlegroups.com
Am 04.02.2015 um 16:39 schrieb Andrey Somov:
> Ok, for clarity let us divide the issue into 2 parts.
>
> Part 1:
> Check that SnakeYAML and other parsers follow the specification 1.1
>
> Part 2:
> Make a feature request to improve SnakeYAML
>
> All the discussion about Part 1 should be moved to the general list -
> yaml-core (where it belongs to)

If you want to discuss where just reading the paragraph would be enough,
go ahead.

I'm outta here.

Andrey Somov

unread,
Feb 4, 2015, 12:46:38 PM2/4/15
to snakeya...@googlegroups.com
>If you want to discuss where just reading the paragraph would be enough, go ahead.

I am sorry, I do not understand this message

Joanne Bogart

unread,
Feb 4, 2015, 12:53:45 PM2/4/15
to snakeya...@googlegroups.com


On Wednesday, February 4, 2015 at 7:39:35 AM UTC-8, Andrey wrote:
Ok, for clarity let us divide the issue into 2 parts.

Part 1:
Check that  SnakeYAML and other parsers follow the specification 1.1
I have checked and SnakeYAML doesn't.  (Apparently other parsers don't, either, but I don't have any direct experience with that.).  It's not a matter of interpretation.  I found it violates this sentence from the spec:


"A YAML processor may recover from syntax errors, possibly by ignoring certain parts of the input, but it must provide a mechanism for reporting such errors."

 in case a document contains duplicate keys in a map (and hence is ill-formed).

Since this is an issue with SnakeYAML implementation, I don't see why the discussion belongs in the core-yaml list.  

However there is a related issue which perhaps does belong there, namely why is it that several parsers violate this portion of the spec?  I suspect there are at least two reasons:  1) for many applications (but not mine!) the current behavior is acceptable, even perhaps preferred  2) a correct implementation of the spec is hard.  If it's not practical to implement the spec as written, then one might consider revising the spec.

Part 2:
Make a feature request to improve SnakeYAML

All the discussion about Part 1 should be moved to the general list - yaml-core (where it belongs to)
Once we get a confirmation from the spec owners that this is against the spec 1.1 we will fix it (it is one line to change)

Ok, so it's not hard.  Great!  Which line?  I will cheerfully fix it  (probably making the new behavior an option so as to keep backwards compatibility), test, and, if all goes well, submit a patch.

best,
Joanne  
All the discussion about Part 2 should be here - proposals are welcome.

Cheers,
Andrey

On Wed, Feb 4, 2015 at 4:20 PM, Joachim Durchholz <j...@durchholz.org> wrote:
Am 04.02.2015 um 13:57 schrieb Andrey Somov:
Dear Joachim,
you misunderstood me.
When I say "period" I want to be exact - I want to stop the attempts to
interpret the specification in many ways.

It's about interpreting the spec correctly.
SnakeYAML is interpreting it incorrectly (as most of the other YAML processors, though it escapes me why projects ignore the spec. I guess that's one of the reasons why YAML still fails to gain traction.

> Trying to give different
interpretations brings nothing but confusion. If someone wants to discuss
the spec - no problem, but please use the appropriate list - yaml-core.

To repeat myself, the spec is clear enough: duplicate keys MUST be reported, the processor MAY continue by keeping the first key:value pair it encounters.
That's rephrased, the pertinent section was quoted in my 1.1 analysis, but it leaves no room to interpretation at all - at least I think so, you obviously didn't read the analysis so it's unverified.

You feel offended that people make demands and don't do anything.
I can understand that, but I hope you can understand that I feel just the same to see my analysis work brushed aside and ignored.


--
You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snakeyaml-cor...@googlegroups.com.
To post to this group, send email to snakeya...@googlegroups.com.

Andrey Somov

unread,
Feb 4, 2015, 1:13:09 PM2/4/15
to snakeya...@googlegroups.com
Dear Joanne,
may I kindly ask you to move the spec discussions to the other list ?

>in case a document contains duplicate keys in a map (and hence is ill-formed).

This is YOUR interpretation. The spec owners EXPLICITLY said that this is well-formed document.
Please read the yaml-core list for the clarifications.
That is why it is implemented the same way everywhere.

>Ok, so it's not hard.  Great!  Which line?  I will cheerfully fix it  (probably making the new behavior an option so as to keep backwards >compatibility), test, and, if all goes well, submit a patch.

Joanne Bogart

unread,
Feb 4, 2015, 1:34:25 PM2/4/15
to snakeya...@googlegroups.com


On Wednesday, February 4, 2015 at 10:13:09 AM UTC-8, Andrey wrote:
Dear Joanne,
may I kindly ask you to move the spec discussions to the other list ?

I have sent a subscribe request.

>in case a document contains duplicate keys in a map (and hence is ill-formed).

This is YOUR interpretation. The spec owners EXPLICITLY said that this is well-formed document.
Please read the yaml-core list for the clarifications.
That is why it is implemented the same way everywhere.

>Ok, so it's not hard.  Great!  Which line?  I will cheerfully fix it  (probably making the new behavior an option so as to keep backwards >compatibility), test, and, if all goes well, submit a patch.

I have already checked out the code and have spent some time looking at it.   I have also offered to do some work. You apparently have knowledge which would shorten the time my task would take considerably, at very little cost to you.  Why are you not willing to point me in the right direction?

Joanne
 

maslovalex

unread,
Feb 4, 2015, 3:49:39 PM2/4/15
to snakeya...@googlegroups.com
Hi Joanne,

I actually no so sure about one line change, but I may be wrong :)

I have no comments about low-level API (like using events), because I have never used it.

But talking about object construction...

For mappings (Map, Set) construction see BaseConstuctor.java:358 (protected void constructMapping2ndStep...)
It iterates through all mapping key->value pairs and creates key first and value next. Maybe check could be done there.
For JavaBeans actually you need to fix another place - Constructor.java:223 (protected Object constructJavaBean2ndStep(...))

So I am not so sure that this is the way to do it. Maybe we need to come up with something else for more general. Unfortunately I do not have any general at the moment, maybe doing something while filling MappingNode....

Hope it help at least somehow.

BTW, thanks for your interest in SnakeYAML

-Alex

P.S.
 Years back there was a discussion or at least an idea to make some sort of YAML-validation... Never developed further than an idea, if I am correct ;)

Andrey Somov

unread,
Feb 4, 2015, 4:43:30 PM2/4/15
to snakeya...@googlegroups.com
Well, let us make a preliminary change and see how the community reacts.
I have implemented the exception for duplicate keys.
Take the latest snapshot and try it.
https://oss.sonatype.org/content/groups/public/org/yaml/snakeyaml/1.15-SNAPSHOT/

This is the change: https://code.google.com/p/snakeyaml/source/detail?r=58ea501dd9599d5d527664771d9f080cfe878f9bhttps://code.google.com/p/snakeyaml/source/detail?r=58ea501dd9599d5d527664771d9f080cfe878f9b

I will start a separate thread and I will ask the users to give their opinion on this change.
If the community is in favour to fail for duplicate keys - it will stay as the default behaviour.

Version 1.15 should be released later this month.

Cheers,
Andrey

Joanne Bogart

unread,
Feb 4, 2015, 6:38:55 PM2/4/15
to snakeya...@googlegroups.com
Thank you!  I'll report back soon, after I've had a chance to try it.

Since this changes the default behavior I can well imagine some people will be unhappy with it.  If so, I am willing to implement a way to choose which behavior a particular Yaml instance has, e.g. by adding a new constructor with an argument setting an internal flag.

best,
Joanne

Andrey Somov

unread,
Feb 5, 2015, 5:09:54 AM2/5/15
to snakeya...@googlegroups.com
Joanne,
you can now see how the change was done.
Feel free to make a proposal how to make this feature optional.

Andrey

Joanne Bogart

unread,
Feb 5, 2015, 7:40:21 PM2/5/15
to snakeya...@googlegroups.com
Andrey,
Your version works just fine as is for my purposes.  

In order to make the new behavior optional, I added another constructor
  public Yaml(boolean flagDuplicates)

and made changes to classes Constructor, BaseConstructor and SafeConstructor as well in order to propagate the flag everywhere it needs to go.  Calls to the original Yaml  constructor will behave as before.  

It does the right thing (old behavior for Yaml( ) and Yaml(false); new behavior for Yaml(true) ) for the small number of test cases I've tried.  It ought to be tested more thoroughly, but if it is ok I'd like to hear opinions on
   * whether the API I've chosen is ok or whether it should more naturally generalize in case there are other potentially optional features
   * whether default should be old behavior or new (currently it's old)

Joanne

Joanne Bogart

unread,
Feb 5, 2015, 8:07:22 PM2/5/15
to snakeya...@googlegroups.com
Hi Alex,
Thanks for the pointers.  I didn't take a good look earlier because I saw Andrey's message announcing his change at the same time.   His change is in BaseConstructor as you describe.  If there ought to be a similar change for Java beans I think I see how to do it.

Joanne

--
You received this message because you are subscribed to a topic in the Google Groups "SnakeYAML" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/snakeyaml-core/M-TRKg-0-F4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to snakeyaml-cor...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages