Where should I start if I want to add a parser for another markup language to Jackson?

Xeno Amess

May 16, 2019, 11:28:38 AM
to jackson-user
As the title says.
I'm currently studying https://github.com/FasterXML/jackson-dataformat-xml and hoping to figure out how to do this.
If anybody has a guide on this, I'd appreciate it.

Tatu Saloranta

May 16, 2019, 11:41:47 AM
to jackson-user
The main idea behind support for different formats is that most (and
ideally all) of the work should be done by implementing the streaming API --
subtypes of `JsonParser` for reading, `JsonGenerator` for writing, and
`JsonFactory` for creating parser/generator instances -- and once that
is done, things should work.
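
To illustrate how the pieces plug together, here is a minimal sketch, assuming jackson-core and jackson-databind 2.x are on the classpath. The factory is handed to `ObjectMapper`, which then reads through whatever parser that factory creates. Plain `JsonFactory` stands in for a format-specific subclass, and the class and method names (`FactoryWiring`, `readAsMap`, `demo`) are only illustrative.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

class FactoryWiring {
    // Databinding only sees the factory; a custom format would pass its own
    // JsonFactory subclass here instead of the plain JSON one.
    static Map<?, ?> readAsMap(JsonFactory factory, String content) {
        ObjectMapper mapper = new ObjectMapper(factory);
        try {
            return mapper.readValue(content, Map.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience entry point using the stand-in JSON factory.
    static String demo() {
        return readAsMap(new JsonFactory(), "{\"name\":\"jackson\",\"major\":2}").toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The existing text and binary backends follow the same pattern with their own factory classes, so their sources are probably the best reference for what a real factory needs to override.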

The XML format is a bit of a special case, as it also requires overriding
some of the databinding support. That is why it is probably a good idea
to look at one of the other implementations as an example instead.

As to the streaming API: `JsonParser` takes content from an `InputStream` or
`Reader` (and some other less common sources) and tokenizes it,
exposing individual tokens as `JsonToken`s (which just indicate the type
and contain no data). The parser has accessors for the actual contents of
tokens (like `parser.getText()`).
For types beyond what JSON has, there is also a sort of opaque container
(`JsonToken.VALUE_EMBEDDED_OBJECT`) which can be used to encapsulate
things like binary data.
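
The token/accessor split described above can be seen with the stock JSON parser. This is a small sketch, assuming jackson-core 2.x on the classpath; `TokenDump` is an illustrative name, and a custom format's parser would be expected to produce the same kind of token stream from its own syntax.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenDump {
    // Returns one "TOKEN_TYPE:text" entry per token in the input.
    static List<String> tokens(String content) {
        List<String> out = new ArrayList<>();
        try (JsonParser parser = new JsonFactory().createParser(content)) {
            JsonToken token;
            // nextToken() returns null once the input is exhausted
            while ((token = parser.nextToken()) != null) {
                // The token only says what kind of event this is;
                // the content is fetched from the parser itself.
                out.add(token + ":" + parser.getText());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        tokens("{\"id\":7,\"tags\":[\"a\"]}").forEach(System.out::println);
    }
}
```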

`JsonGenerator`, on the other hand, has matching `writeXxx()` methods
that are used to add content; they correspond to the token types
(`JsonToken` itself is not used; only the method names match the token types).
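
For the writing side, a comparable sketch (again assuming jackson-core 2.x; `GeneratorDemo` is an illustrative name) shows how each `writeXxx()` call lines up with a token type. A format-specific generator would implement these same methods and emit its own syntax instead of JSON.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class GeneratorDemo {
    static String write() {
        StringWriter out = new StringWriter();
        try (JsonGenerator gen = new JsonFactory().createGenerator(out)) {
            gen.writeStartObject();      // counterpart of START_OBJECT
            gen.writeFieldName("id");    // counterpart of FIELD_NAME
            gen.writeNumber(7);          // counterpart of VALUE_NUMBER_INT
            gen.writeFieldName("tags");
            gen.writeStartArray();       // counterpart of START_ARRAY
            gen.writeString("a");        // counterpart of VALUE_STRING
            gen.writeEndArray();         // counterpart of END_ARRAY
            gen.writeEndObject();        // counterpart of END_OBJECT
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the generator (via try-with-resources) flushes the output.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write()); // prints {"id":7,"tags":["a"]}
    }
}
```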

There are a few special methods for supporting more advanced concepts
like Type Ids (needed for polymorphic type handling -- YAML, Avro and
Ion at least support these natively) and Object Ids (YAML has
anchors/references), which need some care to interact properly with
`jackson-databind`.
There are also some specific capabilities that a
parser/generator/factory may indicate with respect to handling, so looking at
the `JsonParser`/`JsonGenerator` abstract classes (and especially their Javadocs)
may help in figuring out how to customize different aspects.
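
As a rough illustration of such capability checks, the sketch below (assuming jackson-core 2.x; `CapabilityProbe` is an illustrative name) queries a few of the introspection methods on the plain JSON factory, parser and generator. JSON has no native Type Ids, Object Ids or binary values, so these should all report `false`; a backend for a format that does support them would override the corresponding methods.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class CapabilityProbe {
    static String describe() {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser("{}");
             JsonGenerator gen = factory.createGenerator(new StringWriter())) {
            // Each call asks whether the format has native support for a feature.
            return "binaryNatively=" + factory.canHandleBinaryNatively()
                    + " readTypeId=" + parser.canReadTypeId()
                    + " readObjectId=" + parser.canReadObjectId()
                    + " writeTypeId=" + gen.canWriteTypeId()
                    + " writeObjectId=" + gen.canWriteObjectId();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```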

Depending on the kind of format you are thinking of, you may want to have a look at:

* https://github.com/FasterXML/jackson-dataformats-binary: backends
for multiple binary formats (avro, cbor, ion, protobuf, smile)
* https://github.com/FasterXML/jackson-dataformats-text: backends for
textual formats (csv, java properties, yaml)

There are also other format backend implementations not maintained by
the Jackson team, which may be helpful.
It sounds like you are specifically thinking of support for a textual
(markup) format: this may be more challenging, as text formats tend to
vary more than binary ones. CSV, Properties and YAML are all
implemented quite differently: YAML uses SnakeYAML for the actual
decoding/encoding, CSV is a columnar format and requires use of a
`FormatSchema` (to map column positions to names), and Properties
requires creation of a virtual path from dot-separated names.
So much of this depends on the specific format and its requirements.
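
To make the "virtual path from dot-separated names" idea concrete, here is a standalone sketch using only the JDK. It is not how the Properties backend is actually implemented; the `DottedPaths` class and its methods are hypothetical and only show the kind of flat-to-nested mapping such a parser has to expose as object tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DottedPaths {
    // Turns {"server.host": "localhost"} into {"server": {"host": "localhost"}}.
    // Simplification: a key used both as a leaf and as a prefix (e.g. "a" and
    // "a.b") is not handled and would fail with a ClassCastException.
    @SuppressWarnings("unchecked")
    static Map<String, Object> nest(Map<String, String> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flat.entrySet()) {
            String[] path = entry.getKey().split("\\.");
            Map<String, Object> node = root;
            // Walk (or create) one nested map per path segment except the last.
            for (int i = 0; i < path.length - 1; i++) {
                node = (Map<String, Object>) node.computeIfAbsent(
                        path[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(path[path.length - 1], entry.getValue());
        }
        return root;
    }

    static Map<String, Object> demo() {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("server.host", "localhost");
        flat.put("server.port", "8080");
        flat.put("name", "demo");
        return nest(flat);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {server={host=localhost, port=8080}, name=demo}
    }
}
```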

-+ Tatu +-

P.S. On versions: you probably want to have a look at the `2.10` branches
of the projects -- `master` is for 3.0.0, which will not be released for
quite a while, so all users are on 2.x. The API does change quite a bit
between 2.x and 3.x (not so much logically, but with respect to construction
of parser/generator instances). But conversion from 2.x to 3.0 is
relatively easy.