Deserialize YAML with anchors and references

John Passaro

Feb 12, 2019, 4:18:39 PM
to jackson-user
Hello folks! I am trying to deserialize a yaml file with anchors and references. There are some existing StackOverflow questions along these lines but the answers aren't quite getting me to the finish line.

Here is my model:

@JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
class YamlScratch {
    @JsonProperty("misc")
    List<List<String>> misc;

    @JsonProperty("contents")
    Map<String, Config> contents;

    static class Config {
        @JsonProperty("header")
        String header;

        @JsonProperty("labels")
        @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
        List<String> labels;
    }
}



Here is my YAML:

misc:
 - &letters
   - Aie
   - Bee
   - See
 - &numbers
   - One
   - Two
   - Three

contents:
  letters:
    header: "This is a list of phonetic letters"
    labels: *letters
  numbers:
    header: "This is a list of number spellings"
    labels: *numbers
  moreletters:
    header: "this is another copy of the letters"
    labels: *letters


The idea is that I have lists that may be referenced more than once in the "contents" tree, and I'd like to be able to reference them concisely.

I've tried to enable this in my Java code by adding @JsonIdentityInfo to the "labels" field that will reference these lists. As for the YAML, I added "&..." anchors to the data that will be referenced, and where it should appear I added "*..." references. PyYAML seems to confirm this is the correct YAML usage:

$ python3
>>> import yaml
>>> f = open("/Users/johnpassaro/Library/Preferences/IdeaIC2018.3/scratches/scratch.yml")
>>> y = yaml.load(f)
>>> y
{'misc': [['Aie', 'Bee', 'See'], ['One', 'Two', 'Three']], 'contents': {'letters': {'header': 'This is a list of phonetic letters', 'labels': ['Aie', 'Bee', 'See']}, 'numbers': {'header': 'This is a list of number spellings', 'labels': ['One', 'Two', 'Three']}}}


With Jackson (2.9), I get an error:

Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of java.util.ArrayList out of VALUE_STRING token
 at [Source: (File); line: 14, column: 13] (through reference chain: YamlScratch["contents"]->java.util.LinkedHashMap["letters"]->YamlScratch$Config["labels"])

I tried changing
List<String> labels;
to
Object labels;
just to see how Jackson was treating it: instead of resolving the anchor reference ("*letters"), it just returns the reference name as a string ("letters").

One of the StackOverflow questions has an answer that mentions the feature YAMLParser.Feature.USE_NATIVE_OBJECT_ID. That would seem to be exactly what I need, but that enum value is not present in 2.9.

Is this behavior supported at all? If so, what do I need to do to make it work?

Many thanks in advance for your help. I'd be happy to post the results on the relevant SO threads to make sure this information gets shared reasonably widely.

John Passaro

Feb 12, 2019, 4:39:37 PM
to jackson-user
I realize for completeness I should also include the code I'm running to deserialize, though I'm sure this will surprise nobody:

new ObjectMapper(new YAMLFactory()).readValue(new File("/path/to/myfile.yaml"), YamlScratch.class);

Tatu Saloranta

Feb 12, 2019, 11:02:59 PM
to jackson-user
Ok. I think I can point to the problem itself at least.

For Jackson to handle anchors and references, property value types (or
property declarations) need `@JsonIdentityInfo`; otherwise Jackson
does not know to look for, or keep track of, anchors (ids for values
to reference) or references.
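
As a rough illustration of the point above, here is a toy model in plain Java. It is not Jackson's implementation, and `IdTrackingSketch`, `Node`, `anchored`, `alias` and `read` are names invented for this sketch. A reader that tracks ids can bind each anchor and hand back the same instance for later references; a reader that does not track ids only ever sees the reference as a name, which resembles the "letters" string reported earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of id tracking while reading values; every name here is hypothetical. */
final class IdTrackingSketch {

    /** One parsed value: it may carry an anchor name, or be an alias to one. */
    static final class Node {
        final String anchor; // "letters" for "&letters", otherwise null
        final String alias;  // "letters" for "*letters", otherwise null
        final Object value;  // payload when this node is not an alias

        private Node(String anchor, String alias, Object value) {
            this.anchor = anchor;
            this.alias = alias;
            this.value = value;
        }

        static Node anchored(String name, Object value) {
            return new Node(name, null, value);
        }

        static Node alias(String name) {
            return new Node(null, name, null);
        }
    }

    /** Reads nodes in document order; aliases are resolved only when tracking is on. */
    static List<Object> read(List<Node> nodes, boolean trackIds) {
        Map<String, Object> seen = new HashMap<>();
        List<Object> out = new ArrayList<>();
        for (Node n : nodes) {
            if (n.alias != null) {
                if (!trackIds) {
                    out.add(n.alias); // untracked: the reference is just its name
                } else if (seen.containsKey(n.alias)) {
                    out.add(seen.get(n.alias)); // tracked: same instance as the anchor
                } else {
                    throw new IllegalStateException("Unknown anchor: " + n.alias);
                }
            } else {
                if (trackIds && n.anchor != null) {
                    seen.put(n.anchor, n.value);
                }
                out.add(n.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("Aie", "Bee", "See");
        List<Node> doc = Arrays.asList(Node.anchored("letters", letters), Node.alias("letters"));
        System.out.println(read(doc, true));  // [[Aie, Bee, See], [Aie, Bee, See]]
        System.out.println(read(doc, false)); // [[Aie, Bee, See], letters]
    }
}
```

In this model the `trackIds` flag plays the role that the identity annotation plays in the explanation above: without it, nothing records the anchor, so nothing can resolve the reference.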

But one limitation is that only POJO types support Object Id handling,
and here references would be to Lists of Strings (or maybe Lists of
Lists). In theory it would be possible to handle Object Ids for
Collection, Map and array types, but they are not supported at this
point.

It might, however, be possible for you to create a POJO type that gets
serialized as a YAML/JSON array, just like a List (and deserialized from
one as well). This could work by using a combination of `@JsonValue` (to get
the `List` to serialize) and a `@JsonCreator`-annotated constructor that
takes the actual `List` value you want; or by using Converters. Either way,
once you get that working and the type itself is annotated with
`@JsonIdentityInfo`, it should work.

I know this is sub-optimal and leaves out some valid YAML cases. But
it just might work for your usage.

I hope this helps,

-+ Tatu +-

John Passaro

Feb 12, 2019, 11:43:38 PM
to jackso...@googlegroups.com
I will certainly try this out, it does seem like it would address my need. Thank you for the close attention to the example and for your suggestion. If it works I'll post my findings (i.e. working example) here and maybe add to those SO questions.

It seems to me there might be a friendlier experience if there were a YAMLParser.Feature to treat the entire document as having the JsonIdentityInfo annotation, including Lists and Maps and their contents - that is, resolving anchor references everywhere without the need for indicating where the user expects them to be. I don't know enough to assert that this is closer to the intention of the anchor/reference feature or of an average document that uses it, but it is closer to how PyYAML treats them, which I've found very useful. I hope you'll consider supporting such usage.

Regardless, thank you for your help and for your work on this powerful library.

John Passaro

Feb 13, 2019, 12:10:25 PM
to jackson-user
Sadly this did not work. Here is my best shot based on how I understood the suggestion:

@JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
class ScratchModel {
    @JsonProperty("contents")
    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    Map<String, Config> contents;

    @JsonProperty("misc")
    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    Misc misc;

    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    static class Config {
        @JsonProperty("header")
        String header;

        @JsonProperty("labels")
        @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
        List<String> labels;
    }

    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    private static class Misc {
        @JsonCreator
        Misc(List<MiscInner> inners) { }
    }

    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    private static class MiscInner {
        @JsonCreator
        MiscInner(List<String> strings) { }
    }
}

So I broke down the "misc" field, previously List<List<String>>, into a type Misc that creates itself from List<MiscInner>, and MiscInner which creates itself from List<String>. I added the @JsonIdentityInfo annotation to every class and property that could plausibly need it.

Still getting the same error!

The failing YAML is unchanged from my first message.

Tatu Saloranta

Feb 14, 2019, 1:42:35 PM
to jackson-user
Yes, it is true that for YAML documents the idea of using "default
identity" (similar to "default typing" for polymorphic type handling)
would be useful.
Unfortunately I am not sure how practical it would be for general
case, one problem being the limited support for non-POJO types.

In practice one could achieve something like this by custom
`AnnotationIntrospector` that basically "finds" equivalent of
`@JsonIdentityInfo` on every introspected type.

But as to more general handling... I wonder if `JsonNode` could be
forced (with a feature or YAMLMapper setting) to support anchors/refs
and in THAT case you could first resolve all of these and THEN map to
actual POJOs.

I will file an issue for that -- I can't promise I can make that
happen, but I think that is a reasonable idea and could help tackle
this completely without requiring annotations.
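
The "resolve first, then map" idea can be sketched without Jackson at all. The following is only an illustration on plain Java collections: `AliasResolutionSketch`, `Anchored`, `Alias`, `resolve` and `sampleDocument` are hypothetical names, and nothing here reflects how a `JsonNode`-based feature would actually be implemented. It walks a generic tree in document order, remembers each anchored subtree, and replaces each alias with the subtree it names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of "resolve anchors/references first, then bind"; every name is hypothetical. */
final class AliasResolutionSketch {

    /** Marks a subtree that carries an anchor, e.g. "&letters". */
    static final class Anchored {
        final String name;
        final Object value;

        Anchored(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Marks a reference to an earlier anchor, e.g. "*letters". */
    static final class Alias {
        final String name;

        Alias(String name) {
            this.name = name;
        }
    }

    /** Walks the tree in document order and replaces every Alias with its target. */
    static Object resolve(Object node, Map<String, Object> anchors) {
        if (node instanceof Anchored) {
            Anchored a = (Anchored) node;
            Object resolved = resolve(a.value, anchors);
            anchors.put(a.name, resolved);
            return resolved;
        }
        if (node instanceof Alias) {
            String name = ((Alias) node).name;
            if (!anchors.containsKey(name)) {
                throw new IllegalArgumentException("Undefined anchor: " + name);
            }
            return anchors.get(name);
        }
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                out.put(String.valueOf(e.getKey()), resolve(e.getValue(), anchors));
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(resolve(item, anchors));
            }
            return out;
        }
        return node; // scalar: nothing to resolve
    }

    /** Builds a cut-down version of the thread's document as a plain tree. */
    static Map<String, Object> sampleDocument() {
        Map<String, Object> lettersConfig = new LinkedHashMap<>();
        lettersConfig.put("header", "This is a list of phonetic letters");
        lettersConfig.put("labels", new Alias("letters"));

        Map<String, Object> contents = new LinkedHashMap<>();
        contents.put("letters", lettersConfig);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("misc", Arrays.asList(
                new Anchored("letters", Arrays.asList("Aie", "Bee", "See"))));
        root.put("contents", contents);
        return root;
    }

    public static void main(String[] args) {
        // Binding the resolved tree to typed objects is left out of this sketch.
        System.out.println(resolve(sampleDocument(), new LinkedHashMap<>()));
    }
}
```

After `resolve`, the `labels` entry under `contents.letters` is the same list instance as the first element of `misc`, so a later binding step would see only ordinary maps, lists and strings and would need no identity annotations.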

-+ Tatu +-

John Passaro

Feb 14, 2019, 5:30:31 PM
to jackso...@googlegroups.com
Thank you. I've been digging around jackson code while trying to resolve the issue and would be interested in contributing on this if you're open to it.

I'm still stuck with purportedly valid YAML that I cannot parse. Maybe you've seen my follow-up reply where I tried to implement the workaround you mentioned? If you can offer any further suggestions for how I can make this work, with a mountain of annotations if necessary, I would be very grateful.

John Passaro

Tatu Saloranta

Feb 14, 2019, 9:46:51 PM
to jackson-user
So, looks legit; you should not need `@JsonIdentityInfo` on properties
with value types that already have them.

As to the problem, I think what would help is if this could be
distilled into a minimal reproduction, to see what is triggering
the failure.

-+ Tatu +-


John Passaro

Feb 15, 2019, 12:32:08 PM
to jackson-user
Here is my best attempt at a minimal example:

class YamlScratch {

    private static final String YAML_CONTENT = "" +
        "foo: &foo\n" +
        "  value: bar\n" +
        "boo: *foo\n";

    public static void main(String[] args) throws Exception {
        ScratchModel yamlScratch = new ObjectMapper(
            new YAMLFactory()
        ).readValue(YAML_CONTENT, ScratchModel.class);
        System.out.printf("foo = %s, boo=%s%n", yamlScratch.foo, yamlScratch.boo);
    }

    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    public static class ScratchModel {
        @JsonProperty("foo")
        StringHolder foo;

        @JsonProperty("boo")
        StringHolder boo;
    }

    @JsonIdentityInfo(generator = ObjectIdGenerators.None.class)
    public static class StringHolder {
        @JsonProperty("value")
        String value;

        @Override
        public String toString() {
            return value;
        }
    }
}

Output:

Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `YamlScratch$StringHolder` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('foo')
 at [Source: (StringReader); line: 3, column: 6] (through reference chain: YamlScratch$ScratchModel["boo"])
    at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:63)
    at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1329)
    at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1031)
    at com.fasterxml.jackson.databind.deser.ValueInstantiator._createFromStringFallbacks(ValueInstantiator.java:370)
    at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromString(StdValueInstantiator.java:314)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromString(BeanDeserializerBase.java:1351)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:170)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161)
    at com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:136)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:287)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4001)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2992)
    at YamlScratch.main(scratch_8.java:17)

Attached screenshot shows Jackson libraries and versions being referenced - short version is that everything is 2.9.1, but jackson-annotations for some reason is 2.9.0. This doesn't seem to stop any other Jackson functionality from working, so I suspect it is not the problem, but I'm highlighting it for you just in case.
Screen Shot 2019-02-15 at 12.28.35 PM.png

Tatu Saloranta

Feb 19, 2019, 1:27:31 AM
to jackson-user
Ok, yes, I can reproduce this, and no, it should not fail. There is now an issue for it:

https://github.com/FasterXML/jackson-dataformats-text/issues/123

-+ Tatu +-

Tatu Saloranta

Feb 21, 2019, 2:00:18 AM
to jackson-user
Ok, so, one thing I missed; this is wrong:

@JsonIdentityInfo(generator = ObjectIdGenerators.None.class)

as `None` here means "disable ObjectId handling". So even if you never
generate Object Ids, there has to be a handler. `StringIdGenerator` fits
the bill, I think.

There was another problem, but I was able to make issue #123 work (the fix
will be in 2.9.9 / 2.10.0) for this simple case. So it is possible
this particular usage might work with the new version.
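
Putting the correction above into the earlier minimal example gives something like the following. This is a sketch, assuming jackson-databind and jackson-dataformat-yaml 2.9.9 or later on the classpath (the version the fix is expected in, per the message above). The class name `AnchorScratch` and the `parse` helper are made up for this example, and it has not been checked against other versions or against the larger list-based document from the start of the thread.

```java
import com.fasterxml.jackson.annotation.JsonIdentityInfo;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.ObjectIdGenerators;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

import java.io.IOException;
import java.io.UncheckedIOException;

/** Minimal anchor/reference example with the generator changed from None. */
final class AnchorScratch {

    private static final String YAML_CONTENT =
            "foo: &foo\n" +
            "  value: bar\n" +
            "boo: *foo\n";

    static class ScratchModel {
        @JsonProperty("foo")
        StringHolder foo;

        @JsonProperty("boo")
        StringHolder boo;
    }

    // StringIdGenerator rather than None: None switches Object Id handling off.
    @JsonIdentityInfo(generator = ObjectIdGenerators.StringIdGenerator.class)
    static class StringHolder {
        @JsonProperty("value")
        String value;

        @Override
        public String toString() {
            return value;
        }
    }

    /** Hypothetical helper: parses the embedded YAML into the model. */
    static ScratchModel parse() {
        try {
            return new ObjectMapper(new YAMLFactory())
                    .readValue(YAML_CONTENT, ScratchModel.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ScratchModel model = parse();
        System.out.printf("foo = %s, boo = %s%n", model.foo, model.boo);
    }
}
```

Only the value type carries the annotation here; the property declarations do not repeat it, in line with the earlier remark that properties need no `@JsonIdentityInfo` when their value types already have it.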

-+ Tatu +-

John Passaro

Feb 21, 2019, 4:58:26 AM
to jackso...@googlegroups.com
Great!

That name "generator" is a bit misleading, "handler" as you just put it seems more apt but I imagine you're stuck with it? I assumed "generator" meant "for serialization" and I was not (usually am not) looking to serialize to YAML, but I don't remember reading the javadoc super closely so it's possible I missed this detail that a generator is needed even for deserialization.

I would love to test some more complicated examples sooner rather than later. Do you expect to release 2.9.9 in the next two days? If not, I will check out the appropriate branch in jackson-dataformats-text and add a test case as you did for #123; if there is a preferable method, please let me know.

I'm very grateful for your work on Jackson and especially on this issue. Thank you so much!


Tatu Saloranta

Feb 21, 2019, 8:04:04 PM
to jackson-user
On Thu, Feb 21, 2019 at 1:58 AM John Passaro <john.a....@gmail.com> wrote:
>
> Great!
>
> That name "generator" is a bit misleading, "handler" as you just put it seems more apt but I imagine you're stuck with it? I assumed "generator" meant "for serialization" and I was not (usually am not) looking to serialize to YAML, but I don't remember reading the javadoc super closely so it's possible I missed this detail that a generator is needed even for deserialization.

Yes, the naming is a bit misleading due to dual usage here -- the main use is
for id handling, but the secondary one is as a marker for "no, disable it".
One thing that is sometimes needed is the ability to suppress various
settings (usually via mix-ins), and so this value is a marker.
In hindsight it probably would have been better to have something like
`enabled` (or `disabled`) with a boolean value.
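
To make the two designs concrete, here is a small comparison in plain Java. It is purely illustrative: the annotations `MarkerStyle` and `FlagStyle`, the generator stand-ins `NoneMarker` and `StringIds`, and the `tracksIds` helper are all hypothetical and are not part of Jackson. The first annotation overloads the generator class as an off switch; the second keeps the generator and adds an explicit boolean.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Compares a "marker class" off switch with an explicit boolean; all names hypothetical. */
final class IdentityFlagSketch {

    /** Stand-in for a generator class that really means "handling disabled". */
    static final class NoneMarker {}

    /** Stand-in for a generator class that actually handles ids. */
    static final class StringIds {}

    /** Design 1: the generator doubles as the on/off switch. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface MarkerStyle {
        Class<?> generator();
    }

    /** Design 2: a separate, explicit flag. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FlagStyle {
        Class<?> generator() default StringIds.class;
        boolean enabled() default true;
    }

    @MarkerStyle(generator = NoneMarker.class)
    static final class DisabledByMarker {}

    @MarkerStyle(generator = StringIds.class)
    static final class EnabledByMarker {}

    @FlagStyle(enabled = false)
    static final class DisabledByFlag {}

    /** Returns true when the type asks for id tracking under either design. */
    static boolean tracksIds(Class<?> type) {
        MarkerStyle marker = type.getAnnotation(MarkerStyle.class);
        if (marker != null) {
            return marker.generator() != NoneMarker.class;
        }
        FlagStyle flag = type.getAnnotation(FlagStyle.class);
        return flag != null && flag.enabled();
    }

    public static void main(String[] args) {
        System.out.println(tracksIds(DisabledByMarker.class)); // false
        System.out.println(tracksIds(EnabledByMarker.class));  // true
        System.out.println(tracksIds(DisabledByFlag.class));   // false
    }
}
```

With the marker style, a reader has to know that one particular generator value means "off"; with the flag style, the intent is spelled out at the use site.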

> I would love to test some more complicated examples sooner rather than later. Do you expect to release 2.9.9 in this next two days? If not, I will check out the appropriate branch in jackson-dataformats-text and add a test case as you did for #123; if there is a preferable method, please let me know.

Unfortunately the full release process is a bit involved, and with 2.9.9
being quite a late patch release I don't think I will release 2.9.9
for a couple of weeks.
So using the `2.9` branch for builds makes sense.

>
> I'm very grateful for your work on Jackson and especially on this issue. Thank you so much!

Thank you for your help! Testing of Object Ids for YAML isn't very
good and I can't imagine it works as well as it could without
developers pushing boundaries.

I hope we can further improve handling for 3.0, and the work here helps in
finding places where backwards-incompatible changes (regarding internal
interfaces, not so much public API) are needed.

The concept of Native Object Ids was added for YAML, but it was retrofitted
the same way as the whole of Object Id handling was (over an original system
that had no such concept or support), and warts are the result.

-+ Tatu +-