Question for other Neo4j REST client library authors and people
familiar with the REST server internals:
I'm really liking the performance increase of streaming Cypher. I'd
like to make it the default (and only method) for Cypher results in
Neo4jPHP.
Unfortunately, I made some design choices in Neo4jPHP that make it
difficult to tune my request headers on a per-request basis. I did
some initial tests against 1.5, 1.6, 1.7 and 1.8 Neo4j instances, and
none of them seem to mind passing the
"Accept: application/json;stream=true" header to all the endpoints; it
looks like they will just ignore the "stream=true" part.
So the questions are:
1) Despite being a bit semantically incorrect, should I just pass
"stream=true" in the Accept header to every endpoint, regardless of
server version?
2) Are there any compatibility, functionality or performance concerns
I'm not seeing?
3) For library authors: do you give your users a choice to use
streaming or not? Or is that an implementation detail that the library
hides from the user?
Obviously, the question is open to anybody, but I'm particularly
interested in how other library authors handled this.
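A minimal sketch of the idea behind question (1), written in Python for illustration: a single request builder that attaches the streaming Accept header to every call, whatever the endpoint. The helper name `build_request`, the localhost URLs and the sample query are assumptions made for this example; none of this is Neo4jPHP code.

```python
import json
import urllib.request

# The header value discussed above; servers that do not stream reportedly ignore it.
STREAMING_ACCEPT = "application/json;stream=true"


def build_request(url, payload=None):
    """Build (but do not send) a request that always asks for streamed JSON."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Accept": STREAMING_ACCEPT}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers)


# Assumed endpoint URLs for a local server; adjust to your installation.
req = build_request(
    "http://localhost:7474/db/data/cypher",
    {"query": "START n=node(0) RETURN n", "params": {}},
)
root = build_request("http://localhost:7474/db/data/")
```

Because the header is set in one place, the library never has to decide per request whether streaming applies.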
I'm the maintainer of the Node.js client library.
I haven't had a chance to check out streaming Cypher yet, but it's great to
hear that it's nothing but positive. =)
I'd love answers to (1) and (2) also.
I can't answer (3) myself without understanding streaming Cypher a bit
more, so here are some additional questions from my side:
(4) Is it accurate to say that streaming Cypher just means that Neo4j
returns JSON as a stream instead of all at once?
(5) Is it also accurate to say that a client library that waits for the
entire HTTP response to finish before parsing the JSON will continue to
work just fine?
If (5) is indeed the case, I don't see any reason the user should have to
be concerned with streaming Cypher; my answer to (3) then would be no,
that's an implementation detail (a perf optimization).
And just to understand this further:
(6) Can client libraries theoretically take advantage of streaming Cypher
even more by also parsing JSON results (and potentially returning them to
clients) as they stream in rather than all at the end?
If (6) is also the case, then it seems client libraries should have this as
an enhancement option -- let the developer handle result rows as they come
in rather than at the end.
Thanks guys,
Aseem
On Sun, May 6, 2012 at 11:55 AM, Josh Adell <josh.ad...@gmail.com> wrote:
> [...]
I believe the streaming method will eventually dominate and render the
previous method redundant. So, except for reasons of backward
compatibility, I can't see any reason why there would be a need for a
slower, less flexible method in the long term. To that end, I don't believe
a choice should be given to library users. This allows us to maintain a
consistent interface over the longer term while avoiding extra
complexity that the user probably wouldn't care about, possibly wouldn't
understand and would hardly ever need. There also seems to be
little harm in passing "stream=true" with every request; it's certainly my
plan to do so.
I've not found any compatibility issues either; so far, everything has been
remarkably smooth. I am, however, looking at some performance stats, although
nothing so far has indicated that streaming is anything but a vast
improvement :-)
Nige
On 6 May 2012 16:55, Josh Adell <josh.ad...@gmail.com> wrote:
> [...]
Hey Aseem,
As far as #5 goes, that's the way Neo4jPHP currently works in my
experiments (i.e. wait for the entire stream to finish, then parse the
results). I'm not even aware of a PHP library that does true streaming
HTTP, so Neo4jPHP will continue to do that for the foreseeable
future. The performance gain is entirely on the server side, and it's
a vast improvement (returning ~10000 rows in ~1 second with streaming
versus ~3.5 seconds without).
I would love to see #6 be a reality, but you really have to trust that
the server will send well-formed JSON if you want to start parsing it
before the full JSON document is received. For me, that would also
mean writing my own JSON parser, as PHP's built-in parser expects a
fully-formed document to begin with.
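The same limitation can be shown with Python's built-in parser, as a rough sketch: it succeeds once every chunk has been joined, but rejects a document that has only partly arrived. The chunk contents and the response shape (rows followed by columns) are made up for illustration.

```python
import json

# A response body as it might arrive in three network chunks (invented data).
chunks = ['{"data": [[1], ', '[2]], "columns"', ': ["n"]}']


def parse_when_complete(parts):
    """Wait-for-everything strategy: join all chunks, then parse once."""
    return json.loads("".join(parts))


def try_parse_partial(text):
    """Return the parsed value, or None if the text is not yet valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


result = parse_when_complete(chunks)
partial = try_parse_partial(chunks[0])
```

The whole-document strategy keeps working unchanged when the server streams; only a client that wants rows early needs a different parser.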
Anyway, great additional questions! Thanks.
-- Josh
On May 6, 2:18 pm, Aseem Kishore <aseem.kish...@gmail.com> wrote:
> I'm the maintainer of the Node.js client library.
> [...]
Nigel,
That's what I was leaning towards; I'm just curious how others handled
it. Does py2neo wait to receive the entire JSON document before
parsing, or is it parsing a partial JSON document as it streams in? If
the latter, did you write your own parser for that, or is there
already a Python library that parses partial JSON?
-- Josh
On May 6, 6:14 pm, Nigel Small <ni...@nigelsmall.net> wrote:
> [...]
Great, thanks Josh!
You might be able to find a PHP library that can parse JSON streams. I
haven't used any myself, but several certainly exist across
many platforms.
On Sun, May 6, 2012 at 6:35 PM, Josh Adell <josh.ad...@gmail.com> wrote:
> Hey Aseem,
> [...]
Finding client libs that supported streaming was my biggest challenge. For
HTTP, I settled on tornado, which allows a streaming callback that is called
each time a new chunk is received. JSON was more of a problem though. I've had
to put together some code which decodes one line at a time, incrementally
building up the complete document. It's a bit messy and relies on the
content being pretty-printed but seems to do the job. The code is at:
I'm considering rebuilding a streaming JSON parser outside of the main code
but haven't had the time so far. I would certainly prefer not to be limited
to decoding only after the whole thing has been received; otherwise I'm
missing a potential benefit, performance-wise.
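To illustrate the general idea of line-at-a-time decoding (this is not py2neo's actual code), here is a small Python sketch. It assumes the server pretty-prints the response so that every result row sits on a line of its own; the sample layout and the function name `rows_from_pretty_lines` are invented for the example.

```python
import json


def rows_from_pretty_lines(lines):
    """Yield one decoded row for each line that holds a complete JSON array."""
    for line in lines:
        candidate = line.strip().rstrip(",")
        if not candidate.startswith("["):
            continue  # structural lines such as '{' or '"data" : ['
        try:
            yield json.loads(candidate)
        except json.JSONDecodeError:
            continue  # not a complete array on this line


# Assumed pretty-printed layout; the real server output may differ.
sample = """{
  "data" : [
    [ 1, "a" ],
    [ 2, "b" ]
  ],
  "columns" : [ "id", "name" ]
}"""

rows = list(rows_from_pretty_lines(sample.splitlines()))
```

The sketch shows why the approach is fragile: a row split across two lines, or a compact response on a single line, would yield nothing.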
On top of this, the entire interface to Cypher execution has changed in
py2neo 1.2. There are now callbacks in place, the main one of which is
called each time a new row has been received from a query. This allows the
application to begin to use the response before it has completely arrived.
There's another callback for the metadata (currently only columns) which
unfortunately always seems to kick off *after* the rows have been received
since the column data follows the row data in the response. It would be
nice to have the columns arrive first so that tabulated output could be
produced in order (for example).
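A rough Python sketch of the ordering problem described above. The names `dispatch` and `TableBuilder` and the event tuples are hypothetical and are not the py2neo 1.2 interface; the point is only that when the column metadata arrives last, a consumer that wants a header line first has to buffer every row.

```python
def dispatch(events, on_row, on_columns):
    """Call on_row for each row event and on_columns for the metadata event."""
    for kind, payload in events:
        if kind == "row":
            on_row(payload)
        elif kind == "columns":
            on_columns(payload)


class TableBuilder:
    """Buffers rows until the column names are known, then renders a table."""

    def __init__(self):
        self.pending_rows = []
        self.lines = []

    def on_row(self, row):
        # Columns have not arrived yet, so the row can only be stored.
        self.pending_rows.append(row)

    def on_columns(self, columns):
        self.lines.append(" | ".join(columns))
        for row in self.pending_rows:
            self.lines.append(" | ".join(str(value) for value in row))


# Events in the order described above: rows first, columns last.
events = [("row", [1, "a"]), ("row", [2, "b"]), ("columns", ["id", "name"])]
table = TableBuilder()
dispatch(events, table.on_row, table.on_columns)
```

If the columns came first, `on_row` could emit each line immediately and the buffer would not be needed.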
Nige
On 6 May 2012 23:49, Aseem Kishore <aseem.kish...@gmail.com> wrote:
> [...]
There is only one breaking change when it comes to streaming (and passing stream=true): exceptions and other errors _might_ occur only after the fact, i.e. when the first data has already been streamed, so they won't be reflected in the header.
This is especially true for Cypher, the batch REST API and traversals, not so much for other calls.
For the batch REST API, a command that fails will abort the operation, and its result payload will contain the status code and error message.
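In practice this means a client cannot rely on the HTTP status alone and has to look inside the payload. A hedged Python sketch follows; the field names `id`, `status` and `message` are assumptions for illustration, so check the actual batch response format before relying on them.

```python
def first_failure(results):
    """Return the first batch result whose own status signals an error, else None."""
    for item in results:
        if item.get("status", 200) >= 400:
            return item
    return None


# Hypothetical payload of a batch response that itself arrived with HTTP 200.
batch_results = [
    {"id": 0, "status": 201},
    {"id": 1, "status": 400, "message": "Invalid property value"},
]
failed = first_failure(batch_results)
```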
Regarding a better, streaming-friendly format:
I would like to change the streaming Cypher format into a stream of fully formed JSON objects:
- first: a header (contains the columns and perhaps the query and parameters)
- then: n times a row (with the data, or an error object that aborts the query)
- last: a footer (total rows, time taken, other metadata)
It is what you'd get in the batch REST API by leaving off the first and last "[" and "]".
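A stream of self-contained objects is easy to consume incrementally with a standard parser. The Python sketch below uses `json.JSONDecoder.raw_decode` on a growing buffer; the key names (`columns`, `row`, `footer`) are invented, since the format above is only a proposal, and separators between objects are assumed to be whitespace or commas.

```python
import json


class ObjectStreamParser:
    """Collects chunks and hands back each top-level JSON object once complete."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk and return every object that is now fully received."""
        self._buffer += chunk
        objects = []
        while True:
            # Skip whitespace and commas between objects (assumed separators).
            self._buffer = self._buffer.lstrip(" \t\r\n,")
            if not self._buffer:
                break
            try:
                obj, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                # Treated as "incomplete, wait for more data"; truly malformed
                # input would need a size or time limit in real code.
                break
            objects.append(obj)
            self._buffer = self._buffer[end:]
        return objects


parser = ObjectStreamParser()
received = []
for chunk in ['{"columns": ["n"]}\n{"ro',
              'w": [1]}\n{"row": [2]}\n{"foo',
              'ter": {"total_rows": 2}}\n']:
    received.extend(parser.feed(chunk))
```

Each object becomes available as soon as its closing brace arrives, so an error object in the middle of the stream can be acted on immediately.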
For the changes in the batch REST API (which will be merged this week), the performance gain is:
4 sec for creating 30k nodes with streaming (and almost no memory usage)
14 sec for creating 30k nodes w/o streaming (and lots of memory used)
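For scale, here is the plain arithmetic on the rounded figures quoted in this thread (the batch numbers above and the earlier Cypher measurement of ~10000 rows in ~1 second versus ~3.5 seconds). The results are only as precise as those rounded inputs.

```python
def speedup(seconds_without, seconds_with):
    """Ratio of the non-streaming time to the streaming time."""
    return seconds_without / seconds_with


batch_speedup = speedup(14, 4)          # creating 30k nodes via the batch API
batch_nodes_per_second = 30000 / 4      # streaming throughput
cypher_speedup = speedup(3.5, 1)        # returning ~10000 rows via Cypher
```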
All these changes don't yet contain the compact format, which would add another performance gain, but we're not sure yet how to request that compact format:
- either with a different URI or query parameter (different representation)
- an additional or extended header field
- .... ?
The application/json;stream=true type is probably also preliminary, as it is not the correct way to indicate streaming-capable clients.
> Nigel,
> The order thing sounds like a good improvement issue. Great discussion!
> On May 7, 2012 1:26 AM, "Nigel Small" <ni...@nigelsmall.net> wrote:
> > [...]
I can see the issue with errors and agree that the only way to dynamically
produce an error part way through the output would be to ensure that a
series of objects were passed instead of parts of a bigger object as it is
today. A couple of questions though:
1. Does each row (of data) need to be a JSON object? Would a JSON array not
make more sense?
2. The header row isn't an issue but how do you delimit the footer row?
There clearly cannot be a row count up-front due to the nature of the query
results but how do we know that we have the footer and not just another row
of data?
On the other hand, we could use an object for the header and footer and an
array for each data row. That could give us something like:
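A minimal sketch of how a client could tell the three kinds of value apart under such a layout. It assumes one JSON value per line, with an object for the header and footer and an array for each data row. The line delimiter, the `classify` helper and the sample keys (`columns`, `rows`) are illustrative assumptions, not anything the server sends today.

```python
import json


def classify(lines):
    """Yield (kind, value) pairs from a line-delimited stream of JSON values.

    Assumption: arrays are data rows, the first object is the header and
    any later object is the footer.
    """
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between values
        value = json.loads(line)
        if isinstance(value, list):
            yield ("row", value)
        elif not seen_header:
            seen_header = True
            yield ("header", value)
        else:
            yield ("footer", value)


# Hypothetical sample stream, one JSON value per line.
sample = [
    '{"columns": ["name", "age"]}',
    '["Alice", 33]',
    '["Bob", 44]',
    '{"rows": 2}',
]

for kind, value in classify(sample):
    print(kind, value)
```

Because rows are yielded one at a time, a caller can act on each row before the footer arrives. An error object sent mid-stream would look like a footer under this rule, so a real format would probably need a more explicit marker.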
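On 7 May 2012 09:56, Michael Hunger <michael.hun...@neotechnology.com> wrote:
> Great discussion,
> thanks for all the input.
> There is only one breaking change when it comes to streaming (and passing
> stream=true): exceptions and other errors _might_ occur only after the
> fact, i.e. when the first data has already been streamed, so they won't be
> reflected in the header.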
> This is especially true for cypher, the batch-rest-api and traversals, not
> so much for other calls.
> For the batch-rest-API the commands that failed will abort the operation
> and contain the status code and error messages as part of its result
> payload.
> Regarding a better streaming friendly format.
> I would like to change the streaming cypher format into a stream of fully
> formed json objects,
> first. header (contains the columns and perhaps query and parameters)
> then. times row (with the data, or an error object that aborts the query)
> last. footer (total rows, time taken, other metadata)
> It is what you'd get in the batch-rest API by leaving off the first and
> last "[" "]".
> For the changes in the batch-rest-API (which will be merged in this week)
> the performance gain is:
> 4 sec for creating 30k nodes with streaming (and almost no memory usage)
> 14 sec for creating 30k nodes w/o streaming (and lots of memory used)
> All these changes don't yet contain the compact format which would add
> another performance gain, but we're not sure yet how to request that
> compact format,
> - either with a different URI or query parameter (different
> representation)
> - an additional or extended header field
> - .... ?
> - the application/json;stream=true is probably also preliminary as it is
> not the correct way to indicate streaming-able clients
> Michael
> On 07.05.2012 at 07:27, Peter Neubauer wrote:
> Nigel,
> The order thing sounds like a good improvement issue. Great discussion!
> On May 7, 2012 1:26 AM, "Nigel Small" <ni...@nigelsmall.net> wrote:
>> Finding client libs that supported streaming was my biggest challenge.
>> For HTTP, I settled with tornado which allows a streaming callback, called
>> each time a new chunk is received. JSON was more of a problem though. I've
>> had to put together some code which decodes one line at a time,
>> incrementally building up the complete document. It's a bit messy and
>> relies on the content being pretty-printed but seems to do the job. The
>> code is at:
>> I'm considering rebuilding a streaming JSON parser outside of the main
>> code but haven't had the time so far. I would certainly prefer not to be
>> limited to decoding only after the whole thing is received; otherwise I'm
>> missing a potential benefit, performance-wise.
>> On top of this, the entire interface to Cypher execution has changed in
>> py2neo 1.2. There are now callbacks in place, the main one of which is
>> called each time a new row has been received from a query. This allows the
>> application to begin to use the response before it has completely arrived.
>> There's another callback for the metadata (currently only columns) which
>> unfortunately always seems to kick off *after* the rows have been received
>> since the column data follows the row data in the response. It would be
>> nice to have the columns arrive first so that tabulated output could be
>> produced in order (for example).
>> Nige
>> On 6 May 2012 23:49, Aseem Kishore <aseem.kish...@gmail.com> wrote:
>>> Great, thanks Josh!
>>> You might be able to find a PHP library that can parse JSON streams. I
>>> haven't used any myself, but there certainly exist several out there across
>>> many platforms.
Right, by "object" I meant any JSON construct: objects, arrays, or even strings, numbers and booleans.
We could tell that a value is the footer by:
- it being an object instead of an array
- having a dedicated key in it that specifies the type (e.g. type: footer, similarly type: header)
- or having an "EOF" string after the footer to denote the end of the stream
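A rough sketch of how a client might decode such a stream as chunks arrive and apply the markers listed above. The `StreamDecoder` class, the `kind` helper and the `type` / "EOF" conventions are hypothetical illustrations of the options in this message, not an existing server format. It relies on `json.JSONDecoder.raw_decode` from the Python standard library and assumes every top-level value is an object, array or string.

```python
import json


class StreamDecoder:
    """Decode concatenated JSON values from arbitrarily split text chunks."""

    def __init__(self):
        self._buffer = ""
        self._decoder = json.JSONDecoder()

    def feed(self, chunk):
        """Add a chunk and return every value that is now complete."""
        self._buffer += chunk
        values = []
        while True:
            # raw_decode does not skip leading whitespace itself.
            self._buffer = self._buffer.lstrip()
            if not self._buffer:
                break
            try:
                value, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                # Treated as "value not complete yet"; malformed input
                # would simply stay in the buffer in this sketch.
                break
            values.append(value)
            self._buffer = self._buffer[end:]
        return values


def kind(value):
    """Classify a decoded value using the hypothetical markers."""
    if value == "EOF":
        return "eof"
    if isinstance(value, dict):
        return value.get("type", "object")
    return "row"


decoder = StreamDecoder()
chunks = [
    '{"type": "hea',
    'der", "columns": ["n"]}\n["Al',
    'ice"]\n{"type": "footer", "rows": 1}',
    '\n"EOF"',
]
for chunk in chunks:
    for value in decoder.feed(chunk):
        print(kind(value), value)
```

With an explicit type key, the client does not have to rely on position to recognize the footer, and an error object could carry its own type in the same way.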
Sorry, haven't caught up on the full thread, just the last part, but why
won't the current JSON format work just fine? It delimits "header" info
(e.g. columns) from the rows (called "data"), so you could just add further
keys for "footer" info (e.g. errors, time, etc.).
{
columns: [ ... ],
data: [ ... ],
error: ...
}
Where the data is still a streaming array of arrays.
One important thing IMHO would be for the entire response to still be valid
JSON. What do you guys think -- do you agree with that goal?
If commas separating the rows is a concern, you can easily address it w/
Isaac Schlueter's comma-first style when streaming the JSON back. E.g.
here's what the rows would look like:
data:
// first row processed...
[ [...]
// then second row...
, [...]
// then third row...
, [...]
// no more rows left
]
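A small sketch of the comma-first idea from the producing side, assuming the single-document layout above with an optional trailing `error` key. The `stream_result` function and its arguments are hypothetical and only illustrate that each row can be written as soon as it is available while the concatenated output remains one valid JSON document.

```python
import json


def stream_result(columns, rows):
    """Yield text chunks that concatenate to a single valid JSON document."""
    yield '{"columns": ' + json.dumps(columns) + ',\n "data":\n'
    first = True
    error = None
    try:
        for row in rows:
            # Comma-first: the separator is written before each later row,
            # so nothing has to be known about the rows that follow.
            yield ("  [ " if first else "  , ") + json.dumps(row) + "\n"
            first = False
    except Exception as exc:  # a failure while producing rows
        error = str(exc)
    if first:
        yield "  [\n"  # no rows were written, so open the array here
    yield "  ]"
    if error is not None:
        yield ',\n "error": ' + json.dumps(error)
    yield "\n}\n"


def failing_rows():
    """Hypothetical row source that fails after the first row."""
    yield ["Alice"]
    raise RuntimeError("boom")


text = "".join(stream_result(["name"], failing_rows()))
print(text)
print(json.loads(text))
```

A client that waits for the whole body can still hand it to an ordinary JSON parser, and an error raised partway through shows up as a key after the rows that were already sent.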
Why do we want to constrain ourselves to a pure JSON response? Because
that's the way we've done it until now? We haven't had streaming results
until now.
Is a document format appropriate for streaming results? I honestly don't
believe so: one is static by nature, the other is dynamic.
If we were to design the response format without any knowledge of the
current implementation, how would we go about it?
On 8 May 2012 15:04, Aseem Kishore <aseem.kish...@gmail.com> wrote:
> Sorry, haven't caught up on the full thread, just the last part, but why
> won't the current JSON format work just fine? It delimits "header" info
> (e.g. columns) from the rows (called "data"), so you could just add further
> keys for "footer" info (e.g. errors, time, etc.).
> Where the data is still a streaming array of arrays.
> One important thing IMHO would be for the entire response to still be
> valid JSON. What do you guys think -- do you agree with that goal?
> If commas separating the rows is a concern, you can easily address it w/
> Isaac Schlueter's comma-first style when streaming the JSON back. E.g.
> here's what the rows would look like:
> data:
> // first row processed...
> [ [...]
> // then second row...
> , [...]
> // then third row...
> , [...]
> // no more rows left
> ]
> Cheers,
> Aseem
> On Mon, May 7, 2012 at 12:22 PM, Michael Hunger <
> michael.hun...@neotechnology.com> wrote:
>> Right I implied objects or arrays or other json constructs (like strings,
>> numbers, booleans) when saying "object"
>> we could know that this is the footer by:
>> - it being an object instead of an array
>> - having a dedicated key in there that specifies the type: (e.g. type:
>> footer , similarly type:header)
>> - or having an "EOF" string denoting the end of the stream after the
>> footer
>> Michael
>> On 07.05.2012 at 17:44, Nigel Small wrote:
>> I can see the issue with errors and agree that the only way to
>> dynamically produce an error part way through the output would be to ensure
>> that a series of objects were passed instead of parts of a bigger object as
>> it is today. A couple of questions though:
>> 1. Does each row (of data) need to be a JSON object? Would a JSON array
>> not make more sense?
>> 2. The header row isn't an issue but how do you delimit the footer row?
>> There clearly cannot be a row count up-front due to the nature of the query
>> results but how do we know that we have the footer and not just another row
>> of data?
>> On the other hand, we could use an object for the header and footer and
>> an array for each data row. That could give us something like:
>> On 7 May 2012 09:56, Michael Hunger <michael.hun...@neotechnology.com>wrote:
>>> Great discussion,
>>> thanks for all the input.
>>> There is only one breaking change when it comes to streaming (and
>>> passing stream=true), exceptions and other errors _might_ occur only after
>>> the fact, i.e. when the first data was already streamed so they won't be
>>> reflected in the header.
>>> This is especially true for cypher, the batch-rest-api and traversals,
>>> not so much for other calls.
>>> For the batch-rest-API the commands that failed will abort the operation
>>> and contain the status code and error messages as part of its result
>>> payload.
>>> Regarding a better streaming friendly format.
>>> I would like to change the streaming cypher format into a stream of
>>> fully formed json objects,
>>> first. header (contains the columns and perhaps query and parameters)
>>> then. times row (with the data, or an error object that aborts the query)
>>> last. footer (total rows, time taken, other metadata)
>>> It is what you'd get in the batch-rest API by leaving of the first and
>>> last "[" "]".
>>> For the changes in the batch-rest-API (which will be merged in this
>>> week) the performance gain is:
>>> 4 sec for creating 30k nodes with streaming (and almost no memory usage)
>>> 14 sec for creating 30k nodes w/o streaming (and lots of memory used)
>>> All these changes don't yet contain the compact format which would add
>>> another performance gain, but we're not sure yet how to request that
>>> compact format,
>>> - either with a different URI or query parameter (different
>>> representation)
>>> - an additional or extended header field
>>> - .... ?
>>> - the application/json;stream=true is probably also preliminary as it is
>>> not the correct way to indicate streaming-able clients
>>> Michael
>>> Am 07.05.2012 um 07:27 schrieb Peter Neubauer:
>>> Nigel,
>>> The order thing sounds like a good improvement issue. Great discussion!
>>> On May 7, 2012 1:26 AM, "Nigel Small" <ni...@nigelsmall.net> wrote:
>>>> Finding client libs that supported streaming was my biggest challenge.
>>>> For HTTP, I settled with tornado which allows a streaming callback, called
>>>> each time a new chunk is received. JSON was more of a problem though. I've
>>>> had to put together some code which decodes one line at a time,
>>>> incrementally building up the complete document. It's a bit messy and
>>>> relies on the content being pretty-printed but seems to do the job. The
>>>> code is at:
>>>> I'm considering rebuilding a streaming JSON parser outside of the main
>>>> code but haven't had the time so far. I would certainly prefer not only to
>>>> be able to decode after the whole thing is received otherwise I'm missing a
>>>> potential benefit, performance-wise.
>>>> On top of this, the entire interface to Cypher execution has changed in
>>>> py2neo 1.2. There are now callbacks in place, the main one of which is
>>>> called each time a new row has been received from a query. This allows the
>>>> application to begin to use the response before it has completely arrived.
>>>> There's another callback for the metadata (currently only columns) which
>>>> unfortunately always seems to kick off *after* the rows have been received
>>>> since the column data follows the row data in the response. It would be
>>>> nice to have the columns arrive first so that tabulated output could be
>>>> produced in order (for example).
>>>> Nige
>>>> On 6 May 2012 23:49, Aseem Kishore <aseem.kish...@gmail.com> wrote:
>>>>> Great, thanks Josh!
>>>>> You might be able to find a PHP library that can parse JSON streams. I
>>>>> haven't used any myself, but there certainly exist several out there across
>>>>> many platforms.
>>>>> On Sun, May 6, 2012 at 6:35 PM, Josh Adell <josh.ad...@gmail.com> wrote:
>>>>>> Hey Aseem,
>>>>>> As far as #5 goes, that's the way Neo4jPHP currently works in my
>>>>>> experiments (i. e. wait for the entire stream to finish and parse the
>>>>>> results.) I'm not even aware of a PHP library that does true streaming
>>>>>> HTTP, so Neo4jPHP will continue to do that for the foreseeable
>>>>>> future. The performance gain is entirely on the server side, and it's
>>>>>> a vast improvement (returning ~10000 rows in ~1 second with streaming
>>>>>> and ~3.5 seconds without streaming.)
>>>>>> I would love to see #6 be a reality, but you really have to trust that
>>>>>> the server will send well-formed JSON if you want to start parsing it
>>>>>> before the full JSON document is received. For me, that would also
>>>>>> mean writing my own JSON parser, as PHP's built-in parser expects a
>>>>>> fully-formed document to begin with.
>>>>>> Anyway, great additional questions! Thanks.
>>>>>> -- Josh
Nah, I was just saying it because that way it works with existing tools.
Ideally, the content-type doesn't even need changing -- as far as HTTP is
concerned, it doesn't matter whether the response is streamed or sent all
at once.
I also say this because the bottleneck wasn't client libraries parsing one
chunk of JSON, it was Neo4j building up all the results in memory before
serializing them to JSON. That's fixed; it doesn't harm Neo4j to send back
valid JSON still.
Not a big deal, just something I think could be worth maintaining since it
doesn't need to have a high cost.
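
To make the "stream it, but keep it valid JSON" point concrete, here is a minimal sketch in Python. It is not how the Neo4j server does it; the function name `stream_result` is made up for illustration, the "columns" and "data" keys are the ones mentioned in this thread, and sending the columns first is an assumption. Each row is written as soon as it is available, yet the concatenated chunks still parse with an ordinary whole-document JSON parser:

```python
import json
from typing import Iterable, Iterator, List


def stream_result(columns: List[str], rows: Iterable[list]) -> Iterator[str]:
    """Yield a result as text chunks, one chunk per row (hypothetical helper)."""
    yield '{"columns": ' + json.dumps(columns) + ', "data": ['
    first = True
    for row in rows:
        # Comma-first: the separator goes in front of every row except the
        # first, so the writer never needs to know whether another row follows.
        yield ("" if first else ", ") + json.dumps(row)
        first = False
    yield "]}"


# A client that waits for the whole body can still use a plain JSON parser.
chunks = list(stream_result(["name", "age"], [["Alice", 30], ["Bob", 25]]))
document = json.loads("".join(chunks))
```

The server side only ever holds one row in memory, while a client that does no streaming at all is unaffected.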
On Tue, May 8, 2012 at 10:26 AM, Nigel Small <ni...@nigelsmall.net> wrote:
> Why do we want to constrain ourselves to a pure JSON response? Because
> that's the way we've done it until now? We haven't had streaming results
> until now.
> Is a document format appropriate for streaming results? I honestly don't
> believe so: one is static by nature, the other is dynamic.
> If we were to design the response format without any knowledge of the
> current implementation, how would we go about it?
> On 8 May 2012 15:04, Aseem Kishore <aseem.kish...@gmail.com> wrote:
>> Sorry, haven't caught up on the full thread, just the last part, but why
>> won't the current JSON format work just fine? It delimits "header" info
>> (e.g. columns) from the rows (called "data"), so you could just add further
>> keys for "footer" info (e.g. errors, time, etc.).
>> Where the data is still a streaming array of arrays.
>> One important thing IMHO would be for the entire response to still be
>> valid JSON. What do you guys think -- do you agree with that goal?
>> If commas separating the rows is a concern, you can easily address it w/
>> Isaac Schlueter's comma-first style when streaming the JSON back. E.g.
>> here's what the rows would look like:
>> data:
>> // first row processed...
>> [ [...]
>> // then second row...
>> , [...]
>> // then third row...
>> , [...]
>> // no more rows left
>> ]
>> Cheers,
>> Aseem
>> On Mon, May 7, 2012 at 12:22 PM, Michael Hunger <
>> michael.hun...@neotechnology.com> wrote:
>>> Right I implied objects or arrays or other json constructs (like
>>> strings, numbers, booleans) when saying "object"
>>> we could know that this is the footer by:
>>> - it being an object instead of an array
>>> - having a dedicated key in there that specifies the type: (e.g. type:
>>> footer , similarly type:header)
>>> - or having an "EOF" string denoting the end of the stream after the
>>> footer
>>> Michael
>>> On 07.05.2012 at 17:44, Nigel Small wrote:
>>> I can see the issue with errors and agree that the only way to
>>> dynamically produce an error part way through the output would be to ensure
>>> that a series of objects were passed instead of parts of a bigger object as
>>> it is today. A couple of questions though:
>>> 1. Does each row (of data) need to be a JSON object? Would a JSON array
>>> not make more sense?
>>> 2. The header row isn't an issue but how do you delimit the footer row?
>>> There clearly cannot be a row count up-front due to the nature of the query
>>> results but how do we know that we have the footer and not just another row
>>> of data?
>>> On the other hand, we could use an object for the header and footer and
>>> an array for each data row. That could give us something like:
I don't think a document format is appropriate for streaming results;
at the least, the entire response should not be a single document. For streaming,
I would want to see one result per line, with a literal \r\n between
each. Put the header with columns at the beginning and the footer at
the end with a double \r\n separating them from the data, the same way
an HTTP response separates header and body content:
An error mid-stream is the same format, but with an error indicated in
the footer, and the row count being the number of rows returned before
the error was encountered:
If we're streaming over HTTP, this format takes advantage of a user's
existing knowledge of how HTTP responses are formatted. It also
explicitly demarcates the header, data and footer sections of the
response; no checking if a row is an object or an array. It does not
rely on a user's language/framework of choice having a JSON parser
which can handle incomplete JSON documents, because each line is a
fully-formed JSON document. Client code is simpler because the success
case and the mid-stream error case are in the same format.
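
As a rough sketch of what client code for such a format might look like (Python, purely illustrative): the field names "columns", "rows" and "error" and the function name `parse_sectioned` are assumptions made up for this example, not part of any agreed format. Each non-blank line is parsed on its own as a complete JSON document, and blank lines switch from header to data to footer:

```python
import json
from typing import Iterable, List, Tuple


def parse_sectioned(lines: Iterable[str]) -> Tuple[dict, List[list], dict]:
    """Split header / rows / footer on blank lines (hypothetical helper)."""
    sections: List[list] = [[]]
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept CRLF or bare LF line endings
        if line == "":
            sections.append([])  # a blank line closes the current section
        else:
            sections[-1].append(json.loads(line))  # one full document per line
    if len(sections) != 3 or len(sections[0]) != 1 or len(sections[2]) != 1:
        raise ValueError("expected header, blank line, rows, blank line, footer")
    return sections[0][0], sections[1], sections[2][0]


# Made-up response body: header, two rows, footer.
body = (
    '{"columns": ["name", "age"]}\r\n'
    "\r\n"
    '["Alice", 30]\r\n'
    '["Bob", 25]\r\n'
    "\r\n"
    '{"rows": 2, "error": null}\r\n'
)
header, rows, footer = parse_sectioned(body.splitlines(keepends=True))
```

A mid-stream failure would take the same code path; only the footer contents would differ (a non-null "error" and a smaller row count in this made-up layout).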
Just my thoughts.
-- Josh
> >> On 7 May 2012 09:56, Michael Hunger <michael.hun...@neotechnology.com> wrote:
> >>> Great discussion,
> >>> thanks for all the input.
> >>> There is only one breaking change when it comes to streaming (and
> >>> passing stream=true), exceptions and other errors _might_ occur only after
> >>> the fact, i.e. when the first data was already streamed so they won't be
> >>> reflected in the header.
> >>> This is especially true for cypher, the batch-rest-api and traversals,
> >>> not so much for other calls.
> >>> For the batch-rest-API the commands that failed will abort the operation
> >>> and contain the status code and error messages as part of its result
> >>> payload.
> >>> Regarding a better streaming friendly format.
> >>> I would like to change the streaming cypher format into a stream of
> >>> fully formed json objects,
> >>> first. header (contains the columns and perhaps query and parameters)
> >>> then. times row (with the data, or an error object that aborts the query)
> >>> last. footer (total rows, time taken, other metadata)
> > >>> It is what you'd get in the batch-rest API by leaving off the first and
> > >>> last "[" "]".
> >>> This should be much easier to consume in a streaming way, see also my
> >>> test-client impl in the streaming-cypher experiment server-extension:
> >>> (
> >>>https://github.com/neo4j-contrib/streaming-cypher/blob/master/src/mai...
> >>> )
> >>> esp. the callback interface.
The blank lines are an excellent idea - consistent with HTTP and no
requirement to sniff the type of line being read. We would probably be best
assigning this a content-type which explicitly needed "Accept"ing ...
"application/vnd.neo.cypher-results" or something like that. No reason that
the existing JSON format couldn't remain the default.
At the risk of bikeshedding: "\r\n", "\r" or "\n"?
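
For illustration, the opt-in negotiation could be sketched like this in Python. The media type is only the name floated above, not a registered type, and `choose_content_type` is a hypothetical helper. The sketch deliberately ignores q-values and wildcards, so anything short of an explicit mention falls back to plain JSON:

```python
STREAM_TYPE = "application/vnd.neo.cypher-results"  # name floated in this thread only
DEFAULT_TYPE = "application/json"


def choose_content_type(accept_header: str) -> str:
    """Pick the streaming format only when the client lists it explicitly."""
    # Strip parameters such as ";q=0.5" and compare the bare media types.
    offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return STREAM_TYPE if STREAM_TYPE in offered else DEFAULT_TYPE
```

With this rule, existing clients that send "application/json" or "*/*" keep getting the current format, which matches the idea of leaving the existing JSON as the default.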
Custom Accept and Content-Type headers are probably a good idea. As
for the newline, the HTTP 1.1 spec
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2) mandates
CRLF for every element except the entity body. My vote is for CRLF
within and between the response sections, for consistency, but either
CRLF or just LF is acceptable.
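
A client can be lenient about this fairly cheaply. Below is a small Python sketch (the class name `LineBuffer` is made up) of a buffer that takes chunks as they arrive from a streaming HTTP callback and hands back complete lines, treating both CRLF and LF as terminators. A bare "\r" is not handled, which is an assumption of this sketch:

```python
class LineBuffer:
    """Collect streamed text chunks and emit complete lines (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list:
        """Add a chunk and return every line completed so far."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the remainder waits
        # for the next chunk.
        *complete, self._pending = self._pending.split("\n")
        # Drop the "\r" of a CRLF pair so both conventions give the same line.
        return [line[:-1] if line.endswith("\r") else line for line in complete]


buf = LineBuffer()
first = buf.feed('["a"]\r\n["b')  # second line is still incomplete
second = buf.feed('"]\n\r\n')     # completes it, then a blank line follows
```

Because the terminator may be split across two chunks, the "\r" is only stripped once the matching "\n" has arrived.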
-- Josh