I debated posting here, or straight to GitHub. If that's the better place, I can move the thread there.
I have long wanted proper streaming support in the `encoding/json` library. Lately I’ve been doing some digging to understand the current state of things, and I think I’ve come to grips with most of it.
A number of previous issues relate to this topic: https://github.com/golang/go/issues/7872, https://github.com/golang/go/issues/11046, https://github.com/golang/go/issues/12001, https://github.com/golang/go/issues/14140
I have read through each of these issues, and believe I have a fair understanding of the problems associated with streaming JSON input/output. If I'm overlooking something please enlighten me.
In a nutshell: The library implicitly guarantees that marshaling will never write an incomplete JSON object due to an error, and that during unmarshaling, it will never pass an incomplete JSON message to `UnmarshalJSON`.
Work toward this was done about 3 years ago, on this CL: https://go-review.googlesource.com/c/go/+/13818/
Workt was eventually abandoned, apparently when the author was unsure how to make the new behavior opt-in. I believe this proposal will solve that issue.
The problem to be solved
Dealing with large JSON structures is inefficient, due to the internal buffering done by `encoding/json`. `json.NewEncoder` and `json.NewDecoder` appear to offer streaming benefits, but this is mostly an idiomatic advantage, not a performance one, as internal buffering still takes place. Particular aspects of the broader problem are already addressed on the above mentioned links, and benefits of streaming should be easy to imagine, so I won't bore people with details unless someone asks.
A naïve solution
I believe a simple solution (simple from the perspective of a consumer of the library--the internal changes are not so simple) would be to add two interfaces:
type StreamMarshaler interface {
MarshalJSONStream(io.Writer) error
}
type StreamUnmarshaler interface {
UnmarshalJSONStream(io.Reader) error
}
During (un)marshaling, where `encoding/json` looks for `json.Marshaler` and `json.Unmarshaler` respectively, it will now look for (and possibly prefer) the new interfaces instead. Wrapping either the old or new interfaces to work as the other is a trivial matter.
With this change, and the requisite internal changes, it would be possible to begin streaming large JSON data to a server immediately, from within a `MarshalJSONStream()` implementation, for instance.
The drawback is that it violates the above mentioned promise of complete reads and writes, even with errors.
Opt-in
To accommodate this requirement, I believe it would be possible to expose the streaming functionality _only_ with the `json.Encoder` and `json.Decoder` implementations, and only when `EnablePartial*` (name TBD) is enabled. So further, the following two functions would be added to the public API:
func (*Encoder) EnablePartialWrites(on bool)
func (*Decoder) EnablePartialReads(on bool)
The default behavior, even when a type implements one of the new `Stream*` interfaces, will be to operate on an entire JSON object at once. That is to say, the Encoder will internally buffer `MarshalJSONStream`'s output, and process any error before continuing, and a decoder will read an entire JSON object into a buffer, then pass it to `UnmarshalJSONStream` only if there are no errors.
However, when `EnablePartial*` is enabled, the library will bypass this internal buffering, allowing for immediate streaming to/from the source/destination.
Enabling streaming with the `EnablePartial*` toggle could be enough to already experience a benefit for many users, even without the use of the additional interfaces above.
Toggling `EnablePartial*` on will, of course, enable streaming for all types, not just those which implement the new interface above, so this could be considered a separate part of the proposal. In my opinion, this alone would be worth implementing, even if the new interface types above are done later or never.
Internals
CL 13818 can serve as very informative for this part of the discussion. I've also done some digging in the `encoding/json` package (as of 1.12) recently, for more current context. A large number of internal changes will be necessary to allow for this. I started playing around with a few internals, and I believe this is doable, but will mean a lot of code churn, so will need to be done carefully, in small steps with good code review.
As an exercise, I have successfully rewritten`indent()` to work with streams, rather than on byte slices, and began doing the same with `compact()`. The `encodeState` type would need to work with a standard `io.Writer` rather than specifically a `bytes.Buffer`. This seems to be a bigger change, but not technically difficult. I know there are other changes needed--I haven't done a complete audit of the code.
An open question is how these changes might impact performance. My benchmarks after changing `indent()` showed no change in performance, but it wasn't a particularly rigorous test.
With the internals rewritten to support streams, then it's just a matter of doing the internal buffering at the appropriate place, such as at API boundaries (i.e. in `Marshal()` and `Unmarshal()`), rather than as a bulit-in fundamental concept. Then, as described above, turning off that buffering when properly configured above.
Final comments
To be clear, I am interested in working on this. I’m not just trying to throw out a “nice to have, now would somebody do this for me?” type of proposal. But I want to make sure I fully understand the history and context of this situation before I start too far down this rabbit hole.
I'm curious to hear the opinions of others who have been around longer. Perhaps such a proposal was already discussed (and possibly rejected?) in greater length than I can find in the above linked tickets. If so, please point me to the relevant conversation(s).
I am aware of several third-party libraries that offer some support like this, but most have various drawbacks (relying on code generation, or over-complex APIs). I would love to see this kind of support in the standard library. And one last aside: CL 13818 also added support for marshaling channels. That may or may not be a good idea (my personal feeling: probably not), but that can be addressed separately.
Instead of modifying the existing Encoder/Decoder, wouldn't it be
better to do this as a separate encoder/decoder?
>
>
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.
On Fri, Aug 9, 2019 at 8:53 AM Jonathan Hall <fli...@flimzy.com> wrote:
>
> Can you say more? Better in which way?
Better in the way that it wouldn't change existing code.
Also, I think
the use cases for existing and proposed json encoders/decoders are
different enough to justify a separate implementation.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/df688733-a2e9-4bc8-aa7b-09267827007a%40googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.
I debated posting here, or straight to GitHub. If that's the better place, I can move the thread there. I have long wanted proper streaming support in the `encoding/json` library. Lately I’ve been doing some digging to understand the current state of things, and I think I’ve come to grips with most of it.
A number of previous issues relate to this topic: https://github.com/golang/go/issues/7872, https://github.com/golang/go/issues/11046, https://github.com/golang/go/issues/12001, https://github.com/golang/go/issues/14140
I have read through each of these issues, and believe I have a fair understanding of the problems associated with streaming JSON input/output. If I'm overlooking something please enlighten me.
In a nutshell: The library implicitly guarantees that marshaling will never write an incomplete JSON object due to an error, and that during unmarshaling, it will never pass an incomplete JSON message to `UnmarshalJSON`.
Work toward this was done about 3 years ago, on this CL: https://go-review.googlesource.com/c/go/+/13818/
Workt was eventually abandoned, apparently when the author was unsure how to make the new behavior opt-in. I believe this proposal will solve that issue.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/718e2b9e-39e2-44f2-9308-8c94e31afbff%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/718e2b9e-39e2-44f2-9308-8c94e31afbff%40googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/718e2b9e-39e2-44f2-9308-8c94e31afbff%40googlegroups.com.
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golan...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/924142684.7377.1565381399326%40wamui-agami.atl.sa.earthlink.net.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/4d901ba9-5724-45cd-b7f7-24fccd9ef5db%40googlegroups.com.