SnakeYaml-2.0 Proposal: Separate configuration from input/output

26 views
Skip to first unread message

DemonWasp

unread,
Mar 2, 2011, 10:38:09 PM3/2/11
to SnakeYAML
Summary:
Separate objects containing "configuration" from objects that handle
input or output (I/O). The new configuration objects should be
stateless, while I/O objects may be state-full. In general, users
should be able to use, modify and extend configuration objects, but
should be discouraged from extending input / output objects.


Reasoning / Benefits:
- configuration objects can be saved or loaded (to and from YAML,
naturally)
- reduce complexity involved in modifying SnakeYaml behaviour
- if done carefully, configurations may be shared among many threads
- reduce complexity in flow of control for construct / represent steps


Scope:
This change will require substantial changes to Constructor and
Representer. Some minor changes to the top-level API (Yaml,
JavaBeanLoader, JavaBeanDumper) will be required. Some changes to
tests to update those that rely on the older means of configuration
(extending Constructor / SafeConstructor) to use the new way
(YamlConfig).


Modifications Summary:
1. Introduce classes: YamlConfig (handles all configuration),
YamlLoaderConfig (configuration used only for load step),
YamlDumperConfig, ConstructorRegistry (configuration object to hold
all Construct instances and handle choosing the correct one),
RepresenterRegistry. For examples of the finished classes, see my
clone:
https://code.google.com/r/jordanangold-enhanced-usability/source/browse/#hg%2Fsrc%2Fmain%2Fjava%2Forg%2Fyaml%2Fsnakeyaml%2Fconfig

2. Fold the existing DumperOptions class into YamlDumperConfig or vice-
versa. Whichever way is chosen, we should choose a straightforward
naming convention.

3. Remove internal Construct classes from BaseConstructor,
SafeConstructor, Constructor (and for Representers) into
ConstructorRegistry.

See my clone:
https://code.google.com/r/jordanangold-enhanced-usability/source/browse/#hg%2Fsrc%2Fmain%2Fjava%2Forg%2Fyaml%2Fsnakeyaml%2Fconstructor
and https://code.google.com/r/jordanangold-enhanced-usability/source/browse/#hg%2Fsrc%2Fmain%2Fjava%2Forg%2Fyaml%2Fsnakeyaml%2Frepresenter

4. Merge BaseConstructor, SafeConstructor, Constructor into a single
class. Do the same for Representers.


I welcome thoughts, opinions, comments and questions on this proposal.

Thanks,
/Jordan

Andrey

unread,
Mar 11, 2011, 4:33:22 AM3/11/11
to snakeya...@googlegroups.com
Hi Jordan,
it took some time to think about your proposals.

In general I think the proposals are far to big to be discussed and implemented in one go. We have to split the task.

If users have to migrate to another API they must see the clear advantages. The most desirable advantage would be implementing YAML 1.2 specification. So if we introduce a breaking change we shall implement the new spec.
We may start version 2.0 but we shall begin with the new specification. Once it is implemented we can think what else can be improved.

I like the idea of supporting configuration. I think we can introduce a package (as you have done) and implement there whatever we need. We do not need to break the existing public API for this.

I did not quite catch proposals under 3) and 4). What are the advantages ?

I think you can create a clone with the configuration only and we can continue step-by-step.

-
Andrey

DemonWasp

unread,
Mar 12, 2011, 1:02:12 AM3/12/11
to SnakeYAML
Hi Andrey,

While this proposal could be subdivided into a few smaller changes, it
is one cohesive proposal, which is why I submitted it this way. For
example, it is possible to refactor just Representer or just
Constructor, but without refactoring both, the benefits are lost.

You are correct that the public-API change could be removed to a later
revision, assuming that the existing Yaml, JavaBeanLoader and
JavaBeanDumper objects were updated to use config objects.

I disagree that the YAML 1.2 specification is necessarily a needed or
wanted feature. As I understand it, this specification is still
incomplete. While it is possible to implement it while it is still a
draft, I don't see that the 1.2 specification brings any major changes
to improve the language or make users' lives easier.

I will expand on proposals 3 and 4 below:

Proposal 3:
Currently, the Constructor object contains both the current "state" of
construction, as well as a "configuration" of how Construction should
proceed. The state is held in many of the fields in BaseConstructor,
SafeConstructor and Constructor, and is specific to a single document.
Additionally, these objects contain collections of Construct instances
that describe a way of transforming the composed tree into Java
objects. Much of SnakeYaml's existing configurability comes from
extending Constructor to modify the contents of those collections (see
ClassConstructor, among many other examples in tests).

The proposal here is to separate these two halves. The configuration
(Construct instances and the getConstructor implementation) can be
separated from the objects representing the state of I/O. A new
object, "ConstructorRegistry" or "ConstructorManager" or whatever,
provides the getConstructor implementation.

The proposal also applies to BaseRepresenter, SafeRepresenter,
Representer, and Represent instances. The parallel is obvious.

The advantage is that once these objects are separated, it becomes
much easier to (de)serialize the configuration objects. Additionally,
once a configuration is created, it may be re-used over and over, in
many threads. This may reduce the amount of code required to configure
SnakeYaml (I have seen that this happens in the tests, though the real
world may vary).


Proposal 4:
By removing the distinction between BaseConstructor, SafeConstructor
and Constructor, the flow of control when objects are constructed
becomes simpler to deal with. The choice of Base / Safe / Standard
Constructor is instead handled by configuration objects. After
removing the Construct instances particular to each configuration, the
three classes very similar and offer no real distinction.


I will create a clone to make these changes as neatly as possible
soon. I will probably not have time until after the weekend, at the
earliest.


Thank you for taking the time to read and respond,
/Jordan

Andrey Somov

unread,
Mar 15, 2011, 8:23:34 AM3/15/11
to snakeya...@googlegroups.com
Hi Jordan,

> but without refactoring both, the benefits are lost.

What are the benefits ? Less code for users ? How much less ?
The benefits must be big enough for users to justify the change.


> I disagree that the YAML 1.2 specification is necessarily a needed or
> wanted feature. As I understand it, this specification is still
> incomplete. While it is possible to implement it while it is still a
> draft, I don't see that the 1.2 specification brings any major changes
> to improve the language or make users' lives easier.

YAML 1.2 specification is ready to be implemented. It may contain some
questions but should not be a problem. The current specification is
also not always consistent.
YAML 1.2 brings a very important feature - JSON compatibility. JSON is
a subset of YAML 1.2 (but not YAML 1.1)

In general, we are open for any change. Implement one proposal in a
clone and we can discuss the details (please do not forget to respect
the code conventions).

-
Andrey

DemonWasp

unread,
Mar 16, 2011, 1:39:34 PM3/16/11
to SnakeYAML
Replies in-line below.


On Mar 15, 8:23 am, Andrey Somov <py4...@gmail.com> wrote:
> (snip)
>
> What are the benefits ? Less code for users ? How much less ?
> The benefits must be big enough for users to justify the change.

Benefits:
1. Configurations can be serialized (saved, loaded, transferred, hand-
edited).

This is useful mostly out of convenience. However, one particularly
good use-case is the following:

- developer creates a YamlConfig as used by his application
- developer releases a serialized version of that YamlConfig
- a YAML editor can now load that file and use that configuration to
check the Yaml file, outside of that program


2. Custom Represent / Construct implementations become simpler to use.

Instead of extending Representer or Constructor, users can simply add
their Represent / Construct implementations into a config object,
which they can than pass around. Users are no longer restricted to the
"chain" of configurations implied by extending Representer, which
extends SafeRepresenter, which extends Representer.

This also has the happy side-effect of removing the ridiculous
"Representer is a SafeRepresenter" and "Constructor is a
SafeRepresenter" situation that currently exists in SnakeYaml. I know
this confused me when I first started examining SnakeYaml's source.


3. Configurations can be used by many threads.

Currently, each Representer / Constructor has its own set of
configuration objects. By separating these, they can be reused, both
from one load to the next, but also by many threads, simultaneously.
In particular, a "static final YamlConfig" may be created to handle
all loading.




> (snip)
>
> In general, we are open for any change. Implement one proposal in a
> clone and we can discuss the details (please do not forget to respect
> the code conventions).

I've implemented part of the proposal (the Representer side) in the
jordanangold-separate-config clone (http://code.google.com/r/
jordanangold-separate-config/source/browse).

Please review this and comment on it. Be aware that as this is
incomplete, most of the advantages are not present.


Thanks,
/Jordan

Andrey

unread,
Mar 17, 2011, 9:31:03 AM3/17/11
to snakeya...@googlegroups.com
Cool. Let us continue like that.

A few general comments:
1) I am sorry I have reformatted the code with the Eclipse Helios formatter. Can you please merge the changes from the master ?
I would strongly recommended you to install Eclipse Helios and import our formatter  from src/etc/Eclipse-format.xml
2) please run 'mvn clean site' before you commit. It serves 2 purposes. First you control the test coverage. For instance we can see that coverage for RepresenterRegistry is 60% while in average we have 97%.
Second, it will automatically add the license clause to the beginning of the new files. (you can, of cause, always run the license plugin manually)

I see better now what you mean by "separating config from state". I think we shall go further in this direction. 
1) I did not quite catch why we need YamlDumperConfig and YamlLoaderConfig. They have no state and almost no business logic. I think they may go with all the code going to YamlConfig. (Then RepresenterRegistry becomes an instance variable of YamlConfig)
2) Static methods in YamlConfig are in fact factory methods and it is better to rename them to createXXX()
Otherwise it looks like YamlConfig has some static state
3) Static state (static mutable variables) and static initialization must be avoided.(in the new JVM languages like Scala they even dropped the word 'static'). 
4) the change looks incomplete: Representer has more configuration then it is defined in YamlConfig. Everything defined in DumperOptions (Representer.defaultStyle , Representer.defaultFlowStyle ), Representer.classTags, Representer.propertyUtils is still in the Representer source. 
5) 'null' is also a scalar. I think it shall be in the same package as other scalars.

-
Andrey

DemonWasp

unread,
Mar 18, 2011, 8:00:29 AM3/18/11
to SnakeYAML


On Mar 17, 9:31 am, Andrey <py4...@gmail.com> wrote:
> 1) I am sorry I have reformatted the code with the Eclipse Helios formatter.
> Can you please merge the changes from the master ?
> I would strongly recommended you to install Eclipse Helios and import
> our formatter  from src/etc/Eclipse-format.xml

I thought I had been doing that -- I was using whatever the Eclipse
Helios formatter had picked up on its own, which to me looked very
similar to the existing code style. I will try pointing it to that
file manually.

Would I have to merge code formatting changes from master, or would it
be sufficient to apply the code formatter manually?

> 2) please run 'mvn clean site' before you commit. It serves 2 purposes.
> First you control the test coverage. For instance we can see that coverage
> for RepresenterRegistry is 60% while in average we have 97%.
> Second, it will automatically add the license clause to the beginning of the
> new files. (you can, of cause, always run the license plugin manually)

I had actually run mvn cobertura:cobertura, which I guess doesn't do
as much. I will try to do that in the future.

Additional test coverage is something I will work on, but it wasn't
necessary to get the point across.

> 1) I did not quite catch why we need YamlDumperConfig and YamlLoaderConfig.
> They have no state and almost no business logic. I think they may go with
> all the code going to YamlConfig. (Then RepresenterRegistry becomes an
> instance variable of YamlConfig)

We don't "need" them. However, without them, there may be a lot of
information and state in YamlConfig. You can look at my earlier clone
for an example, or you can consider that everything in
org.yaml.snakeyaml.DumperOptions now would need to become part of
YamlDumperConfig (or, vice-versa, depending on naming scheme).

By separating the configuration into three groups (loading only,
dumping only, both operations) makes it a little clearer in code what
the use of each of these variables is. There is no great harm in
combining them, but then there's no great need either.

> 2) Static methods in YamlConfig are in fact factory methods and it is better
> to rename them to createXXX()

Agreed. I will make this change.

> 3) Static state (static mutable variables) and static initialization must be
> avoided.(in the new JVM languages like Scala they even dropped the word
> 'static').

I'm not sure where you're seeing this in my code. I agree, but I
thought I'd been avoiding static mutable state anyway.

I am not a Scala expert, but I thought that the "object" keyword in
Scala defines a singleton that can have mutable state. Am I wrong?

> 4) the change looks incomplete: Representer has more configuration then it
> is defined in YamlConfig. Everything defined in DumperOptions
> (Representer.defaultStyle , Representer.defaultFlowStyle
> ), Representer.classTags, Representer.propertyUtils is still in
> the Representer source.

You're right. I have not yet merged the configuration in DumperOptions
into YamlDumperConfig. Doing so would have required that I expand the
scope of the change, which I was trying to keep as small as possible.
This is definitely in the plan, but it wasn't necessary to demonstrate
the idea.

> 5) 'null' is also a scalar. I think it shall be in the same package as other
> scalars.

Certainly. The reason it was separate initially was because it's
treated separately in how Representers are chosen.


Thanks for considering this,
/Jordan
Reply all
Reply to author
Forward
0 new messages