Just wanted to clarify a few points on the new Okapi configuration file discussion. Based on our current understanding, there will be two types of configs:
1. Schema File (one file to rule them all).
The current options for the schema file are ProtoBuffers (PB) or
JSON Schema. Personally I'm leaning toward ProtoBuffers as we
already use it and adding another layer with JSON schema will mean
learning yet another technology. PB has an import feature which
will allow us to reduce duplication across modules (filters and
steps). PB also works across many languages and is well supported
by Google.
2. User oriented config file.
This is probably the most controversial. This is a file that a
non-programmer will edit. Should have these minimal features.
YAML is one option, though we would need to limit the syntax to prevent strange variations that might confuse the user. I think using YAML would make migration easier for the filters that already use it. JSON could be used, but in order to preserve comments we would have to write our own parser and writer (not a big ask).
I'm leaning toward YAML as it is cleaner and designed for humans
vs machines. Converting the YAML to JSON, which could then be
loaded and validated by PB, would be simple to do. As a
programmer, I'm not strongly opposed to using JSON directly - but
I think it will be more frustrating for non-programmers as
you have to get every quote and curly brace correct. Visually it
is more cluttered.
Thoughts?
Jim
--
You received this message because you are subscribed to the Google Groups "okapi-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/okapi-devel/f68f6e96-86b2-4b28-a4d2-5871b21b8e3a%40gmail.com.
Yeah, that gets closer to JSON-like formats like HOCON. Starting
to lean toward TOML again. It wouldn't be hard to have the
protobuf code serialize/deserialize to TOML.
We produce JSON (JSON Lines, actually) for extraction and I've been having to read that format. It's not easy, even with editor support.
Jim
How about TOML?
Jim
Mihai, I do agree with you on this. This is an opportunity for us to remove YAML config from Okapi. Unfortunately, we still need to support YamlFilter, which could be a security risk (more on this in the near future).
However, I still believe that using a clean, readable and supported config format would be a big win for the users, with TOML/HOCON being the main candidates. But I'm open to other solutions, even if we have to come up with our own format.
One clarification: I don't see a reason why we can't configure Okapi with direct ProtoBuf output as well as a human-readable config. We get the best of both worlds. It is simply a matter of adding some wrapper code around ProtoBuf to read the human config format.
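To make the wrapper idea a bit more concrete, here is a minimal pure-JDK sketch. It is only an illustration under stated assumptions, not Okapi or ProtoBuf code: the `Parameters` class stands in for a generated ProtoBuf message, the property names are hypothetical, and the flat `key = value` syntax is just a small TOML-like subset (comments, strings, booleans, integers).

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.HashMap;
import java.util.Map;

class HumanConfigReader {
    // Hypothetical typed parameters object; stands in for a generated ProtoBuf message.
    public static class Parameters {
        private String encoding = "UTF-8";
        private boolean extractComments;
        private int maxSegmentLength;

        public String getEncoding() { return encoding; }
        public void setEncoding(String encoding) { this.encoding = encoding; }
        public boolean isExtractComments() { return extractComments; }
        public void setExtractComments(boolean extractComments) { this.extractComments = extractComments; }
        public int getMaxSegmentLength() { return maxSegmentLength; }
        public void setMaxSegmentLength(int maxSegmentLength) { this.maxSegmentLength = maxSegmentLength; }
    }

    // Reads "key = value" lines and applies them to the target through its bean setters.
    static void load(String text, Object target) {
        try {
            Map<String, PropertyDescriptor> props = new HashMap<>();
            for (PropertyDescriptor p : Introspector
                    .getBeanInfo(target.getClass(), Object.class).getPropertyDescriptors()) {
                props.put(p.getName(), p);
            }
            int lineNo = 0;
            for (String raw : text.split("\n")) {
                lineNo++;
                String line = raw.trim();
                if (line.isEmpty() || line.startsWith("#")) continue; // blank line or comment
                int eq = line.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("Line " + lineNo + ": expected key = value");
                }
                String key = line.substring(0, eq).trim();
                String value = line.substring(eq + 1).trim();
                PropertyDescriptor p = props.get(key);
                if (p == null || p.getWriteMethod() == null) {
                    // Unknown keys are rejected so that typos surface early.
                    throw new IllegalArgumentException("Line " + lineNo + ": unknown key '" + key + "'");
                }
                p.getWriteMethod().invoke(target, convert(value, p.getPropertyType()));
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not apply configuration", e);
        }
    }

    // Converts the raw text to the property type; only three types are handled in this sketch.
    static Object convert(String value, Class<?> type) {
        if (type == boolean.class) {
            if (value.equals("true")) return Boolean.TRUE;
            if (value.equals("false")) return Boolean.FALSE;
            throw new IllegalArgumentException("Not a boolean: " + value);
        }
        if (type == int.class) return Integer.parseInt(value);
        if (type == String.class) {
            boolean quoted = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
            return quoted ? value.substring(1, value.length() - 1) : value;
        }
        throw new IllegalArgumentException("Unsupported property type: " + type.getName());
    }

    public static void main(String[] args) {
        Parameters params = new Parameters();
        load("# user-edited file\nencoding = \"UTF-16\"\nextractComments = true\n", params);
        System.out.println(params.getEncoding() + " " + params.isExtractComments());
    }
}
```

With real ProtoBuf the same loop would presumably call the generated builder's setters instead of bean setters; the point is just that the human-facing syntax and the typed schema can stay separate.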
Hopefully this will resolve most of the concerns.
Jim
PB does have the advantage of allowing non-Java code to manipulate Okapi Parameters. That is desirable, but maybe could be done outside of Okapi? Check out these projects:
https://github.com/protostuff/protostuff
https://github.com/protostuff/protostuff-compiler
These projects are basically trying to solve the same problem we
are. Not suggesting we should use them in Okapi.
But if we did have proper, consistent Java Beans we could use
these tools during deployment to generate other formats like PB
and even HTML documentation. We could also have an optional sister
project that could provide APIs to make all this easier.
I think the way forward is to use pure Java Beans. Basically a
refactor of IParameters to be more consistent and support more
datatypes and add comment and doc fields (as Mihai explained). UI
is something that requires more thought. But once we can convert
our Parameters to various formats maybe one format could be JSON
Forms?
So the basic strategy would be to keep everything simple (pure
JDK only) in Okapi, but use or implement other tools for
conversion, doc creation, etc. Possible formats: ProtoBuf, JSON,
TOML, etc.
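As a rough sketch of what "pure Java Beans plus external conversion" could look like, the following uses only `java.beans` introspection to dump a bean as TOML, with doc fields emitted as comments. Everything named here is an assumption for illustration: `ParamDoc`, `SampleParameters` and its properties are hypothetical, not the existing IParameters API, and only strings, numbers and booleans are handled.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

class BeanTomlExporter {
    // Hypothetical doc annotation carrying the "comment and doc fields" for a property.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface ParamDoc { String value(); }

    // Hypothetical parameters bean; property names are illustrative only.
    public static class SampleParameters {
        private String encoding = "UTF-8";
        private boolean extractComments = true;
        private int maxSegmentLength = 500;

        @ParamDoc("Character encoding of the input document")
        public String getEncoding() { return encoding; }
        public void setEncoding(String encoding) { this.encoding = encoding; }

        @ParamDoc("Whether comments are extracted")
        public boolean isExtractComments() { return extractComments; }
        public void setExtractComments(boolean extractComments) { this.extractComments = extractComments; }

        @ParamDoc("Maximum segment length in characters")
        public int getMaxSegmentLength() { return maxSegmentLength; }
        public void setMaxSegmentLength(int maxSegmentLength) { this.maxSegmentLength = maxSegmentLength; }
    }

    // Writes each readable property as "name = value", preceded by its doc comment if present.
    static String toToml(Object bean) {
        try {
            PropertyDescriptor[] props = Introspector
                    .getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors().clone();
            // Sort by name so the output order is stable.
            Arrays.sort(props, Comparator.comparing(PropertyDescriptor::getName));
            StringBuilder out = new StringBuilder();
            for (PropertyDescriptor p : props) {
                Method getter = p.getReadMethod();
                if (getter == null) continue;
                Object value = getter.invoke(bean);
                if (value == null) continue; // TOML has no null, so unset values are skipped
                ParamDoc doc = getter.getAnnotation(ParamDoc.class);
                if (doc != null) out.append("# ").append(doc.value()).append('\n');
                out.append(p.getName()).append(" = ").append(literal(value)).append('\n');
            }
            return out.toString();
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Could not export bean", e);
        }
    }

    // Numbers and booleans are written bare; anything else becomes a quoted, escaped string.
    static String literal(Object value) {
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        return "\"" + value.toString().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.print(toToml(new SampleParameters()));
    }
}
```

The same introspection loop could feed other writers (JSON, HTML docs), which is the appeal of keeping the beans themselves free of any format-specific dependency.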
Jim
Here is an example of how documentation is generated with these projects:
https://protostuff.github.io/samples/protostuff-compiler/html/#/
Not bad; someone with CSS experience could enhance it if needed.
Jim