Override scalar style for particular properties


Rob Fletcher

Oct 3, 2011, 7:32:31 PM
to SnakeYAML
Hi

I've been using SnakeYAML in my library Betamax: http://robfletcher.github.com/betamax.
I'm using YAML to represent HTTP request/response interactions. I have
been able to tune the representation mostly to how I want it but one
thing I'd like to do is to force the response body text to be
represented with the 'literal' scalar style. I can, of course,
override the default style on `DumperOptions` but that means all
scalars use the literal style and I really only want to use it for
that particular field. I've been playing around with my `Representer`
class but can't figure out how to force that one field to use a
different style. Is this something that's possible right now?

In case it helps my representer implementation is here:
https://github.com/robfletcher/betamax/blob/master/src/main/groovy/betamax/tape/yaml/TapeRepresenter.groovy#L29
and an example of the output I'm currently getting is here:
https://github.com/robfletcher/betamax/blob/master/src/test/resources/betamax/tapes/smoke_spec.yaml

Regards,
Rob Fletcher

Andrey

Oct 4, 2011, 4:42:34 AM
to snakeya...@googlegroups.com
Hi Rob,
Short and quick answer:
1) I think you face the same problem as described in issue 66

You can also see it in a test (try to uncomment the TODO at the very end):

2) The current logic to guess the proper style is already rather large. You can find it in Emitter: processScalar(), chooseScalarStyle(), and analyzeScalar().
The implementation is not perfect and it can be improved (but how?!)

3) To provide some flexibility we have introduced DumperOptions.calculateScalarStyle() (see issue 29: http://code.google.com/p/snakeyaml/issues/detail?id=29)

You should be able to influence the style (with your implementation of DumperOptions.calculateScalarStyle()), but you must dive into all the details about scalars.

--------------
Long and precise answer:
We need some time to analyse the issue before we can propose a good solution :)
Feel free to contribute.

-
Andrey

Jordan Angold

Oct 4, 2011, 8:33:58 AM
to SnakeYAML
Andrey describes several general solutions. Since you only want this
for one field, here's a solution -- with a caveat:

1. Create a new Represent object for the object that contains the
field.

2. In the implementation of that object, when you represent that
field, use representScalar ( Tag tag, String value, Character style ),
with the style argument taken from the DumperOptions.ScalarStyle enum.
In particular, you will want '|'.

See here: http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/representer/BaseRepresenter.java
and here: http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/DumperOptions.java

The only catch here is that you will have to write represent code for
every other element of that object. If that's a lot of work, the
alternative would be to wrap your String in another object -- MyString
-- that is represented by the Represent I describe above.

Good luck,
/Jordan



Robert Fletcher

Oct 4, 2011, 8:40:16 AM
to snakeya...@googlegroups.com
Thanks for the responses, guys. I tried the approach you describe,
Jordan, but I found that sometimes the style I passed to
representScalar would get overridden. I guess this is because of the
logic Andrey mentions around working out which style is appropriate
for a particular chunk of text. I'm going to experiment this evening
with Andrey's suggestions.

> --
> You received this message because you are subscribed to the Google Groups "SnakeYAML" group.
> To post to this group, send email to snakeya...@googlegroups.com.
> To unsubscribe from this group, send email to snakeyaml-cor...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/snakeyaml-core?hl=en.
>

Andrey

Oct 4, 2011, 9:43:41 AM
to snakeya...@googlegroups.com
The flexibility to define scalar styles when parsing a YAML document appears to be a nightmare for the dumper.
You can easily mix different styles. For instance you can combine flow and block:
- recorded: 2011-10-03T22:17:20.136Z
  request:
    method: GET
    headers: [Accept: '*/*', Accept-Encoding: 'gzip,deflate', Host: grails.org] #this is flow
    body: |- # this is block, literal with strip chomping
      bla bla
      bla blaaaaaaa

I have located the problem. Look in the Emitter for the comment:
        // Although the plain scalar writer supports breaks, we never emit
        // multiline plain scalars.

Changing this would require a major refactoring in the Emitter. We will see what we can do. At this point it is very important to collect many use cases.
It would be great if someone could provide a consistent set of requirements that we can use to program the Emitter to choose the proper scalar styles. The existing Emitter.analyzeScalar() can be the starting point.

-
Andrey

Andrey

Oct 4, 2011, 10:41:21 AM
to snakeya...@googlegroups.com
I am not sure it helps, but you can see an example here:

testWriteMultiLineLiteralWithStripChomping() should be similar to what you are trying to achieve.
-
Andrey

Jordan Angold

Oct 4, 2011, 11:01:27 AM
to SnakeYAML
Your guess is right; it looks like Emitter.analyzeScalar looked at the
response text and decided that block scalars (such as the style you
want) cannot represent the text verbatim, and therefore cannot be
used. In particular, the literal style cannot be used if:

1. The text ends in one or more spaces.
2. The text contains spaces followed by a line break (literal scalars
will not produce the original string if there are trailing spaces).
3. The text contains "special characters". In this context, special
characters means "non-printable characters" -- approximately.

Without the text, I don't know which of these applies, but I would
guess #1 or #2 as being more likely. The third one only seems likely
if you have turned off Unicode support (in DumperOptions) and
unescaped the HTML contents. Characters like &mdash; fall outside
ASCII, but are inside Unicode; their escaped representations fall
inside ASCII and would be supported.
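
To make the three conditions above concrete, here is a rough sketch in plain Java with no SnakeYAML dependency. The class and method names are made up for illustration, and "special characters" is approximated as any control character other than a line feed (so tabs count as special here). The real Emitter.analyzeScalar() is more involved, so treat this only as an approximation.

```java
class LiteralStyleCheck {
    /**
     * Approximates the three conditions listed above.
     * This is NOT the actual Emitter.analyzeScalar() logic.
     */
    static boolean canUseLiteral(String text) {
        // 1. The text ends in one or more spaces.
        if (text.endsWith(" ")) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // 2. A space immediately followed by a line break.
            if (c == ' ' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                return false;
            }
            // 3. "Special" characters, approximated here as control
            //    characters other than the line feed itself.
            if (c != '\n' && Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }
}
```

Running a response body through a check like this before dumping would show which of the three conditions is forcing the quoted style.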

/Jordan


Andrey

Oct 4, 2011, 11:11:44 AM
to snakeya...@googlegroups.com
I think in general Emitter is not as flexible as it might be.
Rob, feel free to create a ticket with your expectations. We will combine different expectations and it may end up as a real enhancement in the next version.

-
Andrey

Andrey

Oct 10, 2011, 12:02:56 PM
to snakeya...@googlegroups.com
Rob,
Take the latest source or try the latest SNAPSHOT.

It should now work as you expect.

Please give it a try and let us know.

-
Andrey

Robert Fletcher

Oct 11, 2011, 4:35:52 AM
to snakeya...@googlegroups.com
Awesome, thanks. I'll give it a try & get back to you.

Robert Fletcher

Oct 11, 2011, 5:13:55 AM
to snakeya...@googlegroups.com
I've updated my DumperOptions as follows:

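import org.yaml.snakeyaml.DumperOptions
import org.yaml.snakeyaml.emitter.ScalarAnalysis

def dumperOptions = new DumperOptions() {
    @Override
    DumperOptions.ScalarStyle calculateScalarStyle(ScalarAnalysis analysis, DumperOptions.ScalarStyle style) {
        // Prefer the literal style for multi-line values; otherwise keep the suggested style
        analysis.multiline ? DumperOptions.ScalarStyle.LITERAL : style
    }
}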

This is working for 2 / 3 of my samples but one (JSON data) is still
using quoted style.

I notice that the calculateScalarStyle method is deprecated now. Am I
using the right approach or is there a better way to handle it?

Thanks,
Rob

Andrey Somov

Oct 11, 2011, 5:25:50 AM
to snakeya...@googlegroups.com
The idea is that you should not need to configure anything at all. It
should just work.
Drop your DumperOptions and try again.
It would be perfect if you can contribute your test cases. Then we can
see what you exactly expect.
(Because "this is working for 2 / 3 of my samples" does not really say
anything to me :)

-
Andrey

Robert Fletcher

Oct 11, 2011, 8:28:57 AM
to snakeya...@googlegroups.com
I'm not clear on what "it should just work" means. What scalar style
should I expect if I don't specify any DumperOptions? I'd be happy to
contribute some test cases but I need to know what to expect by
default and how to customize output if what I want is not the default,
assuming what I'm trying to do is even possible which I'm not sure it
is. I don't expect the behaviour I'd like to be the default as I'm
trying to handle different fields in different ways which feels like a
non-generic case.

In the example I'm using (the YAML output I'm getting is here:
https://github.com/robfletcher/betamax/blob/master/src/test/resources/betamax/tapes/smoke_spec.yaml),
if I don't specify DumperOptions at all I get either double or
single quoted style for large scalar values. The plain style used for
the short values is perfect. If at all possible I'd like to be able to
somehow force the use of the literal style for the 'body' property as
this would be easier for people to edit manually. The code used to
generate it is reasonably complex. An HTTP request goes via a proxy
which serializes the response as YAML. The HTTP response data is
passed verbatim to SnakeYAML.

This is an end-to-end smoke test not a unit level test. I haven't
tested specific output at the unit level just whether the YAML
serialization / deserialization is working (which it is, perfectly).

Andrey

Oct 11, 2011, 8:39:33 AM
to snakeya...@googlegroups.com
It means that the literal scalar style for multiline should be chosen automatically without any additional configuration.

But as I already said, tabs (especially leading and trailing ones) make life significantly more complex.

I have looked into your test data. You have plenty of tabs:
Java platform.\">\t\n\t\n    <title>Grails
</a>\t\t\t\n\t</div>\n\n\t<div id=\"springSourceLogo
</a>\t\t\t\t\t\t\t\t\n                                \n                            </h3>\n

With tabs the YAML document can easily get out of control. Avoid tabs. Could you perhaps convert all the tabs to spaces on the fly?

Once you remove all the tabs, the YAML document should be dumped with the block literal style.
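
A minimal sketch of such an on-the-fly conversion, in plain Java. The class name, method names, and tab width are assumptions for illustration and are not part of SnakeYAML. Note that expanding a tab at the end of a line leaves trailing spaces, which Jordan's list earlier in the thread names as another obstacle to the literal style, so the sketch also includes an optional step that trims line ends. Both steps alter the recorded text.

```java
class TabTidy {
    /** Replaces every tab with `width` spaces (hypothetical helper). */
    static String expandTabs(String text, int width) {
        StringBuilder spaces = new StringBuilder();
        for (int i = 0; i < width; i++) {
            spaces.append(' ');
        }
        return text.replace("\t", spaces.toString());
    }

    /** Removes spaces and tabs that sit directly before a line feed or at the very end. */
    static String trimLineEnds(String text) {
        return text.replaceAll("[ \\t]+(?=\\n|$)", "");
    }
}
```

For example, `TabTidy.trimLineEnds(TabTidy.expandTabs(body, 4))` could be applied to the body string before it is handed to the dumper.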

-
Andrey

Robert Fletcher

Oct 11, 2011, 8:59:16 AM
to snakeya...@googlegroups.com
That data is an HTTP response from an external website. Whether it
contains tabs or not isn't under my control. I do plan to add an
option to tidy data before serializing to YAML but I'd strictly want
that to be an option.

I understand that the style is chosen automatically based on the
content but what I was asking is whether, in a case where more than
one style would be a valid option, I can select the one to use on a
property-by-property basis. It seems like this is not possible.

I can override calculateScalarStyle but since I can't tell from there
what property is being emitted I can only use a heuristic such as
whether the data is multiline or longer than x. Also since that method
is deprecated I guess at some point this technique will stop working
for me altogether.
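
The kind of content-only heuristic described above could look like the following sketch in plain Java. The names and the length threshold are hypothetical. Because it sees only the value and not the property name, it cannot single out the 'body' field, which is exactly the limitation in question.

```java
class StyleHeuristic {
    /**
     * Returns true when a value looks like a candidate for the literal style:
     * it spans several lines or is longer than maxInlineLength characters.
     */
    static boolean prefersLiteral(String value, int maxInlineLength) {
        boolean multiline = value.indexOf('\n') >= 0;
        boolean longValue = value.length() > maxInlineLength;
        return multiline || longValue;
    }
}
```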

Thanks for all the help.


> To view this discussion on the web visit
> https://groups.google.com/d/msg/snakeyaml-core/-/-KlWcpa-OmsJ.

Andrey

Oct 11, 2011, 9:58:29 AM
to snakeya...@googlegroups.com
(if you wish to use tabs, then the YAML document will look ugly regardless of the scalar style...)
Please check FlexibleScalarStyleTest. It shows an example of how to configure the scalar style based on the contents of the scalar itself. You should do it in the Representer. The hack in DumperOptions will be removed because it is applied only _after_ all the validity checks, which may lead to unpredictable output.

If you wish to apply the scalar style based on some other data, you may try to do the following. 
Solution I:
Split the process into 3 steps:
1) use Yaml.represent() to create the Node
2) Since you know the Node structure, you can walk the node tree and set the 'style' property on selected nodes based on the values of other nodes. This may be a challenge because ScalarNode is immutable. Since the final document is not complex, you may also try to create the Node yourself.
3) use Yaml.serialize(node) to create the result.

You can study Yaml.dumpAll() to see how it works now.

Solution II:
Override Representer.representJavaBean() or Representer.representJavaBeanProperty(). Then based on the value of the 'key', you can influence the scalar style for the 'value'.

-
Andrey



Andrey

Oct 12, 2011, 6:13:59 AM
to snakeya...@googlegroups.com
Since how to dump JavaBeans is a common question, I have created an example:

Any feedback is welcome !

-
Andrey

Andrey

Oct 17, 2011, 6:38:15 AM
to snakeya...@googlegroups.com