I don't think a document format is appropriate for streaming results;
at the very least, the entire response should not be a single document. For streaming,
I would want to see one result per line, with a literal \r\n between
each. Put the header with columns at the beginning and the footer at
the end with a double \r\n separating them from the data, the same way
an HTTP response separates header and body content:
{"columns": ["col1", "col2"]}\r\n
\r\n
["valA","valX"]\r\n
["valB","valY"]\r\n
["valC","valZ"]\r\n
\r\n
{"error":null,"row_count":3,"time":1.23}
An error mid-stream is the same format, but with an error indicated in
the footer, and the row count being the number of rows returned before
the error was encountered:
{"columns": ["col1", "col2"]}\r\n
\r\n
["valA","valX"]\r\n
["valB","valY"]\r\n
\r\n
{"error":{"code":123,"message":"Something bad happened"},"row_count":2,"time":1.23}
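As a rough sketch of the server side, assuming Python and a hypothetical stream_response generator (not part of any existing Neo4j API), the footer can simply record whatever happened while the rows were being written. The error code 123 is the placeholder from the example above, not a real code:

```python
import json
import time


def stream_response(columns, rows):
    """Yield header, rows and footer as CRLF-delimited JSON lines.

    `rows` is any iterable of row lists; if it raises part-way through,
    the error is reported in the footer instead of aborting the response.
    """
    started = time.monotonic()
    # Header, followed by the blank line that separates it from the data.
    yield json.dumps({"columns": columns}) + "\r\n"
    yield "\r\n"

    error = None
    row_count = 0
    try:
        for row in rows:
            yield json.dumps(row) + "\r\n"
            row_count += 1
    except Exception as exc:
        # Placeholder code 123, taken from the example; an assumption.
        error = {"code": 123, "message": str(exc)}

    # Blank line, then the footer with the count of rows actually sent.
    yield "\r\n"
    yield json.dumps({
        "error": error,
        "row_count": row_count,
        "time": round(time.monotonic() - started, 2),
    })
```

The success case and the failure case go through the same code path; only the contents of the footer differ.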
If we're streaming over HTTP, this format takes advantage of a user's
existing knowledge of how HTTP responses are formatted. It also
explicitly demarcates the header, data and footer sections of the
response; no checking if a row is an object or an array. It does not
rely on a user's language/framework of choice having a JSON parser
which can handle incomplete JSON documents, because each line is a
fully-formed JSON document. Client code is simpler because the success
case and the mid-stream error case are in the same format.
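To illustrate that last point, here is a minimal client sketch in Python. parse_stream and on_row are hypothetical names, and it assumes the client can iterate over the response one line at a time:

```python
import json


def parse_stream(lines, on_row):
    """Parse header, rows and footer; call on_row(row) as each row arrives.

    `lines` is any iterable of text lines, with or without trailing CRLF.
    Returns (header, footer); the caller checks footer["error"] afterwards.
    """
    it = (line.rstrip("\r\n") for line in lines)

    first = next(it, None)
    if first is None:
        raise ValueError("empty response")
    header = json.loads(first)

    if next(it, None) != "":
        raise ValueError("expected a blank line after the header")

    # Every non-blank line up to the next blank line is one complete row.
    for line in it:
        if line == "":
            break
        on_row(json.loads(line))

    last = next(it, None)
    if last is None:
        raise ValueError("response ended before the footer")
    footer = json.loads(last)
    return header, footer
```

Usage would look something like this, with the same handling whether or not the query failed part-way through:

```python
rows = []
header, footer = parse_stream(response_lines, rows.append)  # response_lines: hypothetical
if footer["error"] is not None:
    print("failed after", footer["row_count"], "rows:", footer["error"]["message"])
```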
Just my thoughts.
-- Josh
On May 8, 10:26 am, Nigel Small <ni...@nigelsmall.net> wrote:
> Why do we want to constrain ourselves to a pure JSON response? Because
> that's the way we've done it until now? We haven't had streaming results
> until now.
>
> Is a document format appropriate for streaming results? I honestly don't
> believe so: one is static by nature, the other is dynamic.
>
> If we were to design the response format without any knowledge of the
> current implementation, how would we go about it?
>
> >> On 7 May 2012 09:56, Michael Hunger <michael.hun...@neotechnology.com> wrote:
>
> >>> Great discussion,
>
> >>> thanks for all the input.
>
> >>> There is only one breaking change when it comes to streaming (and
> >>> passing stream=true), exceptions and other errors _might_ occur only after
> >>> the fact, i.e. when the first data was already streamed so they won't be
> >>> reflected in the header.
>
> >>> This is especially true for cypher, the batch-rest-api and traversals,
> >>> not so much for other calls.
>
> >>> For the batch-rest-API the commands that failed will abort the operation
> >>> and contain the status code and error messages as part of its result
> >>> payload.
>
> >>> Regarding a better streaming friendly format.
>
> >>> I would like to change the streaming cypher format into a stream of
> >>> fully formed json objects,
> >>> first. header (contains the columns and perhaps query and parameters)
> >>> then. times row (with the data, or an error object that aborts the query)
> >>> last. footer (total rows, time taken, other metadata)
>
> >>> It is what you'd get in the batch-rest API by leaving off the first and
> >>> last "[" "]".
>
> >>> This should be much easier to consume in a streaming way, see also my
> >>> test-client impl in the streaming-cypher experiment server-extension:
> >>> (https://github.com/neo4j-contrib/streaming-cypher/blob/master/src/mai...)
> >>>> https://github.com/nigelsmall/py2neo/blob/7195b2463460b8980e02dd41b65...
>
> >>>> I'm considering rebuilding a streaming JSON parser outside of the main
> >>>> code but haven't had the time so far. I would certainly prefer not only to
> >>>> be able to decode after the whole thing is received otherwise I'm missing a
> >>>> potential benefit, performance-wise.
>
> >>>> On top of this, the entire interface to Cypher execution has changed in
> >>>> py2neo 1.2. There are now callbacks in place, the main one of which is
> >>>> called each time a new row has been received from a query. This allows the
> >>>> application to begin to use the response before it has completely arrived.
> >>>> There's another callback for the metadata (currently only columns) which
> >>>> unfortunately always seems to kick off *after* the rows have been received
> >>>> since the column data follows the row data in the response. It would be
> >>>> nice to have the columns arrive first so that tabulated output could be
> >>>> produced in order (for example).
>
> >>>> Nige
>
> >>>> On 6 May 2012 23:49, Aseem Kishore <
aseem.kish...@gmail.com> wrote:
>
> >>>>> Great, thanks Josh!
>
> >>>>> You might be able to find a PHP library that can parse JSON streams. I
> >>>>> haven't used any myself, but there certainly exist several out there across
> >>>>> many platforms.
>
> >>>>> Besides searching Google, this SO question has lots of input:
> >>>>>
http://stackoverflow.com/questions/444380/is-there-a-streaming-api-fo...
>
> >>>>> Cheers,
> >>>>> Aseem