detailed source mappings for scalars

Ben Liblit

May 30, 2023, 11:35:58 AM
to SnakeYAML
I am using SnakeYAML to parse documents whose string scalars have additional internal structure specific to my application. When processing those strings, I may need to show errors to the user. I'd like to describe error locations in detail, with start line/column and possibly end line/column. However, those source coordinates need to make sense in the original YAML document. For example, suppose "x" is not allowed in items[0] here:

items:
  - aaabbcxddd

Relative to the string value itself, the disallowed "x" appears on line 0, column 6. (I'll use 0-based coordinates throughout this message.) However, as embedded in the complete YAML document, the disallowed "x" appears on line 1, column 10. When reporting the problem to the YAML document's author, I want to use these coordinates: problem on line 1, column 10.

Naïvely, this feels easy enough: determine the starting coordinates of the string scalar, then add them to any coordinates that are relative to the scalar itself. However, YAML's many syntaxes for scalar strings make this much harder in general. I'd need to deal with perturbations of coordinates due to comments, multiple forms of quoting, multiple forms of folding, etc.
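The naive approach can be sketched in a few lines. This is only an illustration of the arithmetic described above, written in Python for brevity; `Pos` and `naive_document_pos` are hypothetical names, not part of SnakeYAML, and the sketch assumes a plain, single-line scalar with no quoting, escapes, or folding.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    line: int    # 0-based line number
    column: int  # 0-based column number


def naive_document_pos(scalar_start: Pos, relative: Pos) -> Pos:
    """Translate a position inside a scalar into document coordinates.

    Only valid for a plain, single-line scalar with no escapes, quoting,
    or folding (an assumption of this sketch).
    """
    if relative.line == 0:
        # Same line as the scalar start: the columns simply add up.
        return Pos(scalar_start.line, scalar_start.column + relative.column)
    # Later lines would need per-line indentation, which this sketch lacks.
    raise ValueError("multi-line scalars need a real source mapping")


# "aaabbcxddd" starts at line 1, column 4 of the example document,
# and the disallowed "x" is at line 0, column 6 within the string.
print(naive_document_pos(Pos(1, 4), Pos(0, 6)))
```

The function deliberately refuses anything beyond the first line, which is exactly where the approach stops being reliable.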

In the general case, it seems that I need a mapping of source coordinates from parsed scalar values back to their original YAML locations. Such a mapping would need to faithfully represent all of the transformations that can happen as we go from the original YAML to the information that YAML represents. For example:

multiline:
  once
  upon
  a
  time

The parsed value of this string scalar is "once upon a time" (without quotes). A precise source mapping would let me know that the location of letter "t" on line 0, column 12 within "once upon a time" maps back to line 4, column 2 in the source YAML.
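To make the idea of such a mapping concrete, here is a minimal sketch that folds the lines of a plain multi-line scalar and records where each resulting character came from. The function name `fold_plain_scalar` and its signature are hypothetical, and the sketch assumes no blank lines, comments, or quoting inside the scalar; real YAML folding has more cases than this.

```python
from typing import List, Tuple


def fold_plain_scalar(
    source_lines: List[str], first_line: int
) -> Tuple[str, List[Tuple[int, int]]]:
    """Fold the lines of a plain scalar and record each character's origin.

    source_lines are the raw document lines holding the scalar text, and
    first_line is the 0-based document line number of source_lines[0].
    Returns the folded value plus, for every character of that value, the
    (line, column) it came from in the document.
    """
    chars: List[str] = []
    origins: List[Tuple[int, int]] = []
    prev_end = (first_line, 0)
    for i, raw in enumerate(source_lines):
        line_no = first_line + i
        body = raw.rstrip()          # drop trailing whitespace and newline
        text = body.lstrip(" ")      # drop indentation
        indent = len(body) - len(text)
        if chars:
            # The line break folds into one space; attribute that space to
            # the position just past the previous line's text.
            chars.append(" ")
            origins.append(prev_end)
        for col, ch in enumerate(text, start=indent):
            chars.append(ch)
            origins.append((line_no, col))
        prev_end = (line_no, indent + len(text))
    return "".join(chars), origins


# The four continuation lines of the "multiline" example start on line 1.
value, origins = fold_plain_scalar(["  once", "  upon", "  a", "  time"], 1)
print(value, origins[value.index("t")])
```

Keeping one origin per decoded character is the simplest representation; a run-length encoding would be more compact but harder to read.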

Can SnakeYAML provide source mappings with this level of detail? Is this something that SnakeYAML already provides? If not, is it feasible for me to build such a mapping using existing SnakeYAML APIs? Or is this requirement fundamentally outside the bounds of what SnakeYAML can do?

Thanks for any hints,
Ben

Andrey Somov

May 30, 2023, 1:56:33 PM
to snakeya...@googlegroups.com
Hi Ben,
I am not sure I understand it (to be exact - I am sure I don't understand it).

1. "Is this something that SnakeYAML already provides?" No, it does not.
2. "If not, is it feasible for me to build such a mapping using existing SnakeYAML APIs?" This information is discarded during Node creation. Node must be extended to keep it.
3. Nothing is "fundamentally outside the bounds of what SnakeYAML can do". Give it a try.

Cheers,
Andrey

Ben Liblit

May 30, 2023, 5:58:52 PM
to SnakeYAML
Thanks for the speedy reply, Andrey. Perhaps I can ask my question in a different way.

Consider the following YAML source, and assume that I need to bring the user's attention to the "t" character that begins "time":

value: once upon a time

Using some combination of Constructors and TypeDescriptions, presumably I can get my hands on the ScalarNode that represents the scalar string value of the value property. Using the startMark in that ScalarNode, I can determine the starting line and column of this scalar string. I know that the "t" character is at offset 12 in the deserialized "once upon a time" string. So I add 12 to the startMark column and then I know the line and column of that "t" character.

Adding an offset doesn't work in general, though. Consider:

value:
    once
    upon
    a
    time

The deserialized string is still "once upon a time", exactly as before, and the "t" character of interest is still at offset 12 (or if you prefer, line 0 column 12) within this deserialized string. But I cannot simply add 12 to the column of the startMark of the ScalarNode that represents this string. That won't take me to the correct source location in the original YAML document. In fact, that wouldn't even be a valid column position on that line of the original YAML document at all. The correspondence between the deserialized string and the original YAML text is more complex now: characters are no longer in a direct 1:1 correspondence between the two. I'm not sure whether it's possible to get detailed information that describes that correspondence.
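The failure is easy to check mechanically. The sketch below assumes the scalar's start position is line 1, column 4 (where "once" begins); that value is hard-coded here for illustration rather than obtained from SnakeYAML.

```python
document = "value:\n    once\n    upon\n    a\n    time\n"
lines = document.splitlines()

start_line, start_column = 1, 4  # assumed start of the scalar text
offset = 12                      # index of "t" in "once upon a time"

# Naive translation: stay on the start line and add the offset.
naive_line, naive_column = start_line, start_column + offset

# A column is only meaningful if it falls inside that source line.
is_valid = naive_column < len(lines[naive_line])
print(naive_line, naive_column, is_valid)
```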

Similar problems would arise with many standard string deserialization features, such as quoting or interpretation of backslash escapes. For example:

value: "\t\t\t\t\t\t\t\t\t\t$"

I want to refer to the source line and column of the "$" character above. It's at offset 21 relative to the double quote that marks the start of the string in the YAML source text. But each "\t" deserializes as a single tab character, so the "$" is only at offset 10 in the deserialized string. Suppose I'm looking at the deserialized string, and I know that I want to refer to the original source location of the character at offset 10. How do I know that this was at offset 21 from the start of the string as it appeared in the YAML source text?
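One way to answer that question is to record the source offset of each character while decoding. The following is a minimal sketch, not SnakeYAML's scanner: `decode_double_quoted` and `SIMPLE_ESCAPES` are hypothetical names, only a handful of single-character escapes are handled, and the scalar is assumed to sit on one line. Offsets are 0-based and counted from the opening quote.

```python
from typing import List, Tuple

# Single-character escapes handled by this sketch (a small subset of YAML's).
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\", '"': '"', "0": "\0"}


def decode_double_quoted(source: str) -> Tuple[str, List[int]]:
    """Decode a one-line double-quoted scalar and record source offsets.

    source includes both surrounding quotes. Returns the decoded value and,
    for each decoded character, the 0-based offset in source (counted from
    the opening quote) where the text for that character begins.
    """
    if len(source) < 2 or source[0] != '"' or source[-1] != '"':
        raise ValueError("expected a double-quoted scalar")
    decoded: List[str] = []
    offsets: List[int] = []
    i, end = 1, len(source) - 1
    while i < end:
        offsets.append(i)
        if source[i] == "\\":
            if i + 1 >= end:
                raise ValueError("dangling backslash")
            key = source[i + 1]
            if key not in SIMPLE_ESCAPES:
                raise ValueError(f"escape \\{key} not handled by this sketch")
            decoded.append(SIMPLE_ESCAPES[key])
            i += 2  # an escape occupies two source characters
        else:
            decoded.append(source[i])
            i += 1
    return "".join(decoded), offsets


# Ten "\t" escapes followed by "$", as in the example above.
raw = '"' + "\\t" * 10 + '$"'
value, offsets = decode_double_quoted(raw)
dollar_offset = offsets[value.index("$")]
print(value.index("$"), dollar_offset)
```

Adding the returned offset to the column of the opening quote gives the document column, as long as the scalar does not span lines.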

Andrey

May 31, 2023, 1:48:33 AM
to SnakeYAML
The high level API will not help at all. You can study the low level API (check how the Composer works).
Currently, the line breaks in a folded scalar (https://yaml.org/spec/1.1/#id929764) are lost, because SnakeYAML does not use them and the spec does not require keeping them.
You will need to extend/improve Node.

By the way, feel free to check SnakeYAML Engine (https://bitbucket.org/snakeyaml/snakeyaml-engine/src/master/). Its low level API is better, it supports YAML 1.2

Cheers,
Andrey

Ben Liblit

May 31, 2023, 10:30:33 AM
to snakeya...@googlegroups.com
On Wed, May 31, 2023 at 1:48 AM Andrey <py4...@gmail.com> wrote:
> The high level API will not help at all. You can study the low level API (check how the Composer works)

OK, thank you for that recommendation to help me get started.

> By the way, feel free to check SnakeYAML Engine (https://bitbucket.org/snakeyaml/snakeyaml-engine/src/master/). Its low level API is better, it supports YAML 1.2

Will do! Do you know of any documentation or tutorials that focus on the differences between SnakeYAML and SnakeYAML Engine? That would help me estimate the work involved in porting my existing SnakeYAML code over.

Thanks,
Ben 

Andrey

Jun 1, 2023, 3:06:30 AM
to SnakeYAML
Well, SnakeYAML Engine has a new high level API. The implementation of both is almost the same. The low level API is very similar.
Engine supports YAML 1.2 and it does not create JavaBeans.

For your use case they are almost the same. 

Andrey
