REST client always streaming?
Josh Adell  
From: Josh Adell <josh.ad...@gmail.com>
Date: Sun, 6 May 2012 08:55:45 -0700 (PDT)
Local: Sun, May 6 2012 11:55 am
Subject: REST client always streaming?
Question for other Neo4j REST client library authors and people
familiar with the REST server internals:

I'm really liking the performance increase of streaming Cypher. I'd
like to make it the default (and only method) for Cypher results in
Neo4jPHP.

Unfortunately, I made some design choices in Neo4jPHP that make it
difficult to tune my request headers on a per-request basis. I did
some initial tests against 1.5, 1.6, 1.7 and 1.8 Neo4j instances, and
none of them seem to mind passing the
"Accept: application/json;stream=true" header to all the endpoints; it
looks like they will just ignore the "stream=true" part.
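
A minimal Python sketch of what "pass the header to every endpoint" amounts to in a client. It only builds requests and does not send them. The base URL, the `/cypher` path and the helper name `build_request` are assumptions made for illustration; they are not taken from Neo4jPHP.

```python
import json
import urllib.request

# Assumed base URL of a local Neo4j 1.x REST server; adjust for your setup.
BASE_URL = "http://localhost:7474/db/data"

# The streaming hint is attached to every request, whatever the endpoint.
DEFAULT_HEADERS = {
    "Accept": "application/json;stream=true",
    "Content-Type": "application/json",
}


def build_request(path, payload=None):
    """Build (but do not send) a request carrying the streaming Accept header."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=DEFAULT_HEADERS)


# A Cypher call and a plain GET of the service root share the same headers.
cypher_request = build_request("/cypher", {"query": "START n=node(0) RETURN n"})
root_request = build_request("/")
```

Keeping the header in one shared dictionary is what makes the "always on" approach cheap for a library that cannot easily vary headers per request.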

So the questions are:

1) Despite being a bit semantically incorrect, should I just pass
"stream=true" in the Accept header to every endpoint, regardless of
server version?

2) Are there any compatibility, functionality or performance concerns
I'm not seeing?

3) For library authors: do you give your users a choice to use
streaming or not? Or is that an implementation detail that the library
hides from the user?

Obviously, the question is open to anybody, but I'm particularly
interested in how other library authors handled this.

-- Josh


 
Aseem Kishore
From: Aseem Kishore <aseem.kish...@gmail.com>
Date: Sun, 6 May 2012 14:18:56 -0400
Local: Sun, May 6 2012 2:18 pm
Subject: Re: [Neo4j] REST client always streaming?

I'm the maintainer of the Node.js client library.

I haven't had a chance to check out streaming Cypher yet, but it's great to
hear that it's nothing but positive. =)

I'd love answers to (1) and (2) also.

I can't answer (3) myself without understanding streaming Cypher a bit
more, so here are some additional questions from my side:

(4) Is it accurate to say that streaming Cypher just means that Neo4j
returns JSON as a stream instead of all at once?

(5) Is it also accurate to say that a client library that waits for the
entire HTTP response to finish before parsing the JSON will continue to
work just fine?

If (5) is indeed the case, I don't see any reason the user should have to
be concerned with streaming Cypher; my answer to (3) then would be no,
that's an implementation detail (a perf optimization).

And just to understand this further:

(6) Can client libraries theoretically take advantage of streaming Cypher
even more by also parsing JSON results (and potentially returning them to
clients) as they stream in rather than all at the end?

If (6) is also the case, then it seems client libraries should have this as
an enhancement option -- let the developer handle result rows as they come
in rather than at the end.

Thanks guys,

Aseem


 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Sun, 6 May 2012 23:14:33 +0100
Local: Sun, May 6 2012 6:14 pm
Subject: Re: [Neo4j] REST client always streaming?

Hi Josh

I believe the streaming method will eventually dominate and render the
previous method redundant. So, except for reasons of backward
compatibility, I can't see any reason why there would be a need for a
slower, less flexible method in the long-term. To that end, I don't believe
a choice should be given to library users. This allows us to maintain a
consistent interface over the longer term while avoiding the need for extra
complexity that the user probably wouldn't care about, possibly wouldn't
understand and would almost certainly hardly ever need. There also seems
little harm in passing "stream=true" to every request - it's certainly my
plan to do so.

I've not found any compatibility issues either; so far, everything has been
remarkably smooth. I am, however, looking at some performance stats, although
nothing so far has indicated that streaming is anything but a vast
improvement :-)

Nige

On 6 May 2012 16:55, Josh Adell <josh.ad...@gmail.com> wrote:


 
Josh Adell
From: Josh Adell <josh.ad...@gmail.com>
Date: Sun, 6 May 2012 15:35:32 -0700 (PDT)
Local: Sun, May 6 2012 6:35 pm
Subject: Re: REST client always streaming?
Hey Aseem,

As far as #5 goes, that's the way Neo4jPHP currently works in my
experiments (i.e. wait for the entire stream to finish, then parse the
results). I'm not even aware of a PHP library that does true streaming
HTTP, so Neo4jPHP will continue to do that for the foreseeable
future. The performance gain is entirely on the server side, and it's
a vast improvement (returning ~10000 rows in ~1 second with streaming
versus ~3.5 seconds without).

I would love to see #6 be a reality, but you really have to trust that
the server will send well-formed JSON if you want to start parsing it
before the full JSON document is received. For me, that would also
mean writing my own JSON parser, as PHP's built-in parser expects a
fully-formed document to begin with.
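
The two cases above can be sketched in Python (not PHP): a client that buffers the whole body works with a stock JSON parser, while the same parser rejects a partial document. The chunk boundaries and the helper names `read_buffered` and `parses_alone` are hypothetical.

```python
import json


def read_buffered(chunks):
    """Join every chunk of the response body, then parse it in one go."""
    body = b"".join(chunks)
    return json.loads(body.decode("utf-8"))


def parses_alone(chunk):
    """Report whether a single chunk is a complete JSON document by itself."""
    try:
        json.loads(chunk.decode("utf-8"))
        return True
    except ValueError:
        return False


# Hypothetical chunk boundaries; a real HTTP client decides where they fall.
simulated_chunks = [
    b'{"columns": ["name", "age"], "da',
    b'ta": [["Alice", 33], ["Bo',
    b'b", 44]]}',
]

result = read_buffered(simulated_chunks)
first_chunk_ok = parses_alone(simulated_chunks[0])
```

The buffered client never notices whether the server streamed the body, which is why the server-side gain comes for free.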

Anyway, great additional questions! Thanks.

-- Josh

On May 6, 2:18 pm, Aseem Kishore <aseem.kish...@gmail.com> wrote:


 
Josh Adell
From: Josh Adell <josh.ad...@gmail.com>
Date: Sun, 6 May 2012 15:39:29 -0700 (PDT)
Local: Sun, May 6 2012 6:39 pm
Subject: Re: REST client always streaming?
Nigel,
That's what I was leaning towards; I'm just curious how others handled
it. Does py2neo wait to receive the entire JSON document before
parsing, or is it parsing a partial JSON document as it streams in? If
the latter, did you write your own parser for that, or is there
already a Python library that parses partial JSON?

-- Josh

On May 6, 6:14 pm, Nigel Small <ni...@nigelsmall.net> wrote:


 
Aseem Kishore
From: Aseem Kishore <aseem.kish...@gmail.com>
Date: Sun, 6 May 2012 18:49:35 -0400
Local: Sun, May 6 2012 6:49 pm
Subject: Re: [Neo4j] Re: REST client always streaming?

Great, thanks Josh!

You might be able to find a PHP library that can parse JSON streams. I
haven't used any myself, but there certainly exist several out there across
many platforms.

Besides searching Google, this SO question has lots of input:
http://stackoverflow.com/questions/444380/is-there-a-streaming-api-fo...

Cheers,
Aseem


 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Mon, 7 May 2012 00:26:20 +0100
Local: Sun, May 6 2012 7:26 pm
Subject: Re: [Neo4j] Re: REST client always streaming?

Finding client libs that supported streaming was my biggest challenge. For
HTTP, I settled with tornado which allows a streaming callback, called each
time a new chunk is received. JSON was more of a problem though. I've had
to put together some code which decodes one line at a time, incrementally
building up the complete document. It's a bit messy and relies on the
content being pretty-printed but seems to do the job. The code is at:

https://github.com/nigelsmall/py2neo/blob/7195b2463460b8980e02dd41b65...

I'm considering rebuilding a streaming JSON parser outside of the main code
but haven't had the time so far. I would certainly prefer not to be limited
to decoding only after the whole thing is received; otherwise I'm missing a
potential benefit, performance-wise.

On top of this, the entire interface to Cypher execution has changed in
py2neo 1.2. There are now callbacks in place, the main one of which is
called each time a new row has been received from a query. This allows the
application to begin to use the response before it has completely arrived.
There's another callback for the metadata (currently only columns) which
unfortunately always seems to kick off *after* the rows have been received
since the column data follows the row data in the response. It would be
nice to have the columns arrive first so that tabulated output could be
produced in order (for example).
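
A rough sketch of the callback idea, assuming a response shaped like {"data": [...], "columns": [...]} with the rows first. This is not py2neo's code: the class name `RowStreamParser`, its methods and the regular expression are made up for illustration, and it leans on the standard library's `json.JSONDecoder.raw_decode` instead of a line-by-line decoder.

```python
import json
import re

# Finds the start of the row array. A string value containing the same text
# would fool it; a real streaming parser would tokenize properly.
DATA_START = re.compile(r'"data"\s*:\s*\[')


class RowStreamParser:
    """Calls on_row per completed row and on_metadata once, at the end."""

    def __init__(self, on_row, on_metadata):
        self.on_row = on_row
        self.on_metadata = on_metadata
        self.decoder = json.JSONDecoder()
        self.state = "head"
        self.head = self.tail = self.buffer = ""

    def feed(self, chunk):
        if self.state == "tail":
            self.tail += chunk  # everything after the rows is kept for close()
            return
        self.buffer += chunk
        if self.state == "head":
            match = DATA_START.search(self.buffer)
            if match is None:
                return
            self.head = self.buffer[:match.end()]
            self.buffer = self.buffer[match.end():]
            self.state = "rows"
        while True:
            self.buffer = self.buffer.lstrip(" \t\r\n,")
            if not self.buffer:
                return
            if self.buffer.startswith("]"):
                self.state, self.tail, self.buffer = "tail", self.buffer, ""
                return
            try:
                row, end = self.decoder.raw_decode(self.buffer)
            except ValueError:
                return  # the row is still incomplete; wait for more data
            self.on_row(row)
            self.buffer = self.buffer[end:]

    def close(self):
        # head + tail is the document with an empty "data" array.
        # json.loads raises ValueError here if the response was truncated.
        document = json.loads(self.head + self.tail)
        document.pop("data", None)
        self.on_metadata(document)


events = []
parser = RowStreamParser(
    on_row=lambda row: events.append(("row", row)),
    on_metadata=lambda meta: events.append(("metadata", meta)),
)
response = '{"data": [["Alice", 33], ["Bob", 44]], "columns": ["name", "age"]}'
for i in range(0, len(response), 7):
    parser.feed(response[i:i + 7])
parser.close()
```

With the rows first, the metadata callback can only fire once the row array has closed, which is the ordering problem described above.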

Nige

On 6 May 2012 23:49, Aseem Kishore <aseem.kish...@gmail.com> wrote:


 
Peter Neubauer
From: Peter Neubauer <neubauer.pe...@gmail.com>
Date: Mon, 7 May 2012 07:27:34 +0200
Local: Mon, May 7 2012 1:27 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Nigel,
The order thing sounds like a good improvement issue. Great discussion!
On May 7, 2012 1:26 AM, "Nigel Small" <ni...@nigelsmall.net> wrote:


 
Michael Hunger
From: Michael Hunger <michael.hun...@neotechnology.com>
Date: Mon, 7 May 2012 10:56:54 +0200
Local: Mon, May 7 2012 4:56 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Great discussion,

thanks for all the input.

There is only one breaking change when it comes to streaming (and passing stream=true): exceptions and other errors _might_ occur only after the fact, i.e. after the first data has already been streamed, so they won't be reflected in the response headers (including the status code).

This is especially true for cypher, the batch-rest-api and traversals, not so much for other calls.

For the batch-rest-API the commands that failed will abort the operation and contain the status code and error messages as part of its result payload.

Regarding a more streaming-friendly format:

I would like to change the streaming Cypher format into a stream of fully formed JSON objects:
first: a header (contains the columns and perhaps the query and parameters)
then: n rows (each with the data, or an error object that aborts the query)
last: a footer (total rows, time taken, other metadata)

It is what you'd get in the batch-rest API by leaving off the first and last "[" "]".
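
A sketch of how a client might consume such a stream of complete JSON values, assuming each top-level value is an object or array. The key names in the sample ("row", "rows", "time") are placeholders, since the proposal does not fix them, and `iter_json_values` is a hypothetical helper.

```python
import json


def iter_json_values(chunks):
    """Yield each complete top-level JSON value from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Skip whitespace, and commas too, as found in a batch response
            # that has lost its outer brackets.
            buffer = buffer.lstrip(" \t\r\n,")
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except ValueError:
                break  # the value is still incomplete; wait for the next chunk
            yield value
            buffer = buffer[end:]
    if buffer.strip():
        raise ValueError("stream ended in the middle of a JSON value")


# Header, two rows and a footer, cut into arbitrary 5-character chunks.
stream = (
    '{"columns": ["name", "age"]}'
    '{"row": ["Alice", 33]}'
    '{"row": ["Bob", 44]}'
    '{"rows": 2, "time": 1.23}'
)
chunks = [stream[i:i + 5] for i in range(0, len(stream), 5)]
values = list(iter_json_values(chunks))
```

Because every value is complete on its own, the client needs no partial-document parser; an error object could be handled the moment it is yielded.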

This should be much easier to consume in a streaming way, see also my test-client impl in the streaming-cypher experiment server-extension:
(https://github.com/neo4j-contrib/streaming-cypher/blob/master/src/mai...)
esp. the callback interface.

For the changes in the batch-rest-API (which will be merged in this week) the performance gain is:
4 sec for creating 30k nodes with streaming (and almost no memory usage)
14 sec for creating 30k nodes w/o streaming (and lots of memory used)

All these changes don't yet contain the compact format which would add another performance gain, but we're not sure yet how to request that compact format,
- either with a different URI or query parameter (different representation)
- an additional or extended header field
- .... ?
- the application/json;stream=true is probably also preliminary as it is not the correct way to indicate streaming-capable clients

Michael

Am 07.05.2012 um 07:27 schrieb Peter Neubauer:


 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Mon, 7 May 2012 16:44:56 +0100
Local: Mon, May 7 2012 11:44 am
Subject: Re: [Neo4j] Re: REST client always streaming?

I can see the issue with errors and agree that the only way to dynamically
produce an error part way through the output would be to ensure that a
series of objects were passed instead of parts of a bigger object as it is
today. A couple of questions though:

1. Does each row (of data) need to be a JSON object? Would a JSON array not
make more sense?
2. The header row isn't an issue but how do you delimit the footer row?
There clearly cannot be a row count up-front due to the nature of the query
results but how do we know that we have the footer and not just another row
of data?

On the other hand, we could use an object for the header and footer and an
array for each data row. That could give us something like:

{"columns": ["name", "age"]}
["Alice", 33]
["Bob", 44]
["Carol", 55]
["Dave", 66]
{"row_count": 4, "time": 1.23}

That would mean if a line starts with a "{" then it's a metadata row and if
it starts with a "[" then it's a normal data row.

As metadata, an error could also then be rendered as:

{"columns": ["name", "age"]}
["Alice", 33]
["Bob", 44]
{"error": {"code": 999, "message": "Bad stuff happened"}}
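
A sketch of a reader for this line-oriented format, dispatching on the first character of each line as described. The exception class `CypherStreamError` and the function `read_result` are illustrative names, not part of any existing library.

```python
import json


class CypherStreamError(Exception):
    """Raised when an error object arrives mid-stream; keeps the rows seen so far."""

    def __init__(self, code, message, rows):
        super().__init__("%s: %s" % (code, message))
        self.code = code
        self.rows = rows


def read_result(lines):
    """Return (metadata, rows) from one JSON value per line."""
    metadata, rows = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("["):
            rows.append(json.loads(line))  # a normal data row
            continue
        info = json.loads(line)  # a "{" line carries metadata
        if "error" in info:
            error = info["error"]
            raise CypherStreamError(error["code"], error["message"], rows)
        metadata.update(info)
    return metadata, rows


good = [
    '{"columns": ["name", "age"]}',
    '["Alice", 33]',
    '["Bob", 44]',
    '{"row_count": 2, "time": 1.23}',
]
metadata, rows = read_result(good)

bad = good[:3] + ['{"error": {"code": 999, "message": "Bad stuff happened"}}']
try:
    read_result(bad)
    caught = None
except CypherStreamError as exc:
    caught = exc
```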

Nige

On 7 May 2012 09:56, Michael Hunger <michael.hun...@neotechnology.com> wrote:


 
Michael Hunger
From: Michael Hunger <michael.hun...@neotechnology.com>
Date: Mon, 7 May 2012 18:22:54 +0200
Local: Mon, May 7 2012 12:22 pm
Subject: Re: [Neo4j] Re: REST client always streaming?

Right, I implied objects or arrays or other JSON constructs (like strings, numbers, booleans) when saying "object".

we could know that this is the footer by:
- it being an object instead of an array
- having a dedicated key in there that specifies the type: (e.g. type: footer , similarly type:header)
- or having an "EOF" string denoting the end of the stream after the footer

Michael

Am 07.05.2012 um 17:44 schrieb Nigel Small:


 
Aseem Kishore
From: Aseem Kishore <aseem.kish...@gmail.com>
Date: Tue, 8 May 2012 10:04:03 -0400
Local: Tues, May 8 2012 10:04 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Sorry, haven't caught up on the full thread, just the last part, but why
won't the current JSON format work just fine? It delimits "header" info
(e.g. columns) from the rows (called "data"), so you could just add further
keys for "footer" info (e.g. errors, time, etc.).

{
    columns: [ ... ],
    data: [ ... ],
    error: ...

}

Where the data is still a streaming array of arrays.

One important thing IMHO would be for the entire response to still be valid
JSON. What do you guys think -- do you agree with that goal?

If commas separating the rows is a concern, you can easily address it w/
Isaac Schlueter's comma-first style when streaming the JSON back. E.g.
here's what the rows would look like:

data:
// first row processed...
[ [...]
// then second row...
, [...]
// then third row...
, [...]
// no more rows left
]
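
A sketch of the server side of this idea in Python, assuming rows arrive from a lazy iterator: each fragment can be written as soon as its row is ready, and the concatenation is still one valid JSON document. The function name `stream_result` is hypothetical.

```python
import json


def stream_result(columns, rows):
    """Yield text fragments that concatenate into a single valid JSON document."""
    yield '{"columns": ' + json.dumps(columns) + ',\n "data":\n'
    opener = "[ "
    for row in rows:
        # Comma-first: the separator is written before each row after the
        # first, so nothing needs to be known about rows still to come.
        yield opener + json.dumps(row) + "\n"
        opener = ", "
    if opener == "[ ":
        yield "[\n"  # no rows at all: the array still has to be opened
    yield "]}\n"


columns = ["name", "age"]
rows = [["Alice", 33], ["Bob", 44]]
body = "".join(stream_result(columns, iter(rows)))
empty_body = "".join(stream_result(columns, iter([])))
```

A buffering client can hand `body` to any stock JSON parser, which is the compatibility argument made here.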

Cheers,
Aseem



 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Tue, 8 May 2012 15:26:24 +0100
Local: Tues, May 8 2012 10:26 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Why do we want to constrain ourselves to a pure JSON response? Because
that's the way we've done it until now? We haven't had streaming results
until now.

Is a document format appropriate for streaming results? I honestly don't
believe so: one is static by nature, the other is dynamic.

If we were to design the response format without any knowledge of the
current implementation, how would we go about it?

On 8 May 2012 15:04, Aseem Kishore <aseem.kish...@gmail.com> wrote:



 
Aseem Kishore
From: Aseem Kishore <aseem.kish...@gmail.com>
Date: Tue, 8 May 2012 10:36:25 -0400
Local: Tues, May 8 2012 10:36 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Nah, I was just saying it because that way it works with existing tools.
Ideally, the content-type doesn't even need changing -- as far as HTTP is
concerned, it doesn't matter whether the response is streamed or sent all
at once.

I also say this because the bottleneck wasn't client libraries parsing one
chunk of JSON, it was Neo4j building up all the results in memory before
serializing them to JSON. That's fixed; it doesn't harm Neo4j to send back
valid JSON still.

Not a big deal, just something I think could be worth maintaining since it
doesn't need to have a high cost.

Aseem



 
Josh Adell
From: Josh Adell <josh.ad...@gmail.com>
Date: Tue, 8 May 2012 07:52:52 -0700 (PDT)
Local: Tues, May 8 2012 10:52 am
Subject: Re: REST client always streaming?
I don't think a document format is appropriate for streaming results,
at least, the entire response should not be a document. For streaming,
I would want to see one result per line, with a literal \r\n between
each. Put the header with columns at the beginning and the footer at
the end with a double \r\n separating them from the data, the same way
an HTTP response separates header and body content:

{"columns": ["col1", "col2"]}\r\n
\r\n
["valA","valX"]\r\n
["valB","valY"]\r\n
["valC","valZ"]\r\n
\r\n
{"error":null,"row_count":3,"time":1.23}

An error mid-stream is the same format, but with an error indicated in
the footer, and the row count being the number of rows returned before
the error was encountered:

{"columns": ["col1", "col2"]}\r\n
\r\n
["valA","valX"]\r\n
["valB","valY"]\r\n
\r\n
{"error":{"code":123,"message":"Something bad happened"},"row_count":2,"time":1.23}

If we're streaming over HTTP, this format takes advantage of a user's
existing knowledge of how HTTP responses are formatted. It also
explicitly demarcates the header, data and footer sections of the
response; no checking if a row is an object or an array. It does not
rely on a user's language/framework of choice having a JSON parser
which can handle incomplete JSON documents, because each line is a
fully-formed JSON document. Client code is simpler because the success
case and the mid-stream error case are in the same format.
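
A sketch of a client for this layout, written in Python and assuming the three sections always appear in the order header, rows, footer. The generator name `iter_events` and the event labels are made up for the example.

```python
import json

SECTIONS = ("header", "row", "footer")


def iter_events(lines):
    """Yield (section, value) pairs; a blank line moves on to the next section."""
    section = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            section += 1
            continue
        if section >= len(SECTIONS):
            raise ValueError("unexpected content after the footer")
        # Every non-blank line is a complete JSON document on its own.
        yield SECTIONS[section], json.loads(line)


body = (
    '{"columns": ["col1", "col2"]}\r\n'
    '\r\n'
    '["valA","valX"]\r\n'
    '["valB","valY"]\r\n'
    '\r\n'
    '{"error":{"code":123,"message":"Something bad happened"},"row_count":2,"time":1.23}'
)
events = list(iter_events(body.splitlines(True)))
footer = events[-1][1]
```

The same loop handles the success case and the mid-stream error case; the caller only has to inspect `footer["error"]` once the stream ends.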

Just my thoughts.

-- Josh

On May 8, 10:26 am, Nigel Small <ni...@nigelsmall.net> wrote:



 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Tue, 8 May 2012 16:16:33 +0100
Local: Tues, May 8 2012 11:16 am
Subject: Re: [Neo4j] Re: REST client always streaming?

The blank lines are an excellent idea - consistent with HTTP and no
requirement to sniff the type of line being read. We would probably be best
assigning this a content-type which explicitly needed "Accept"ing ...
"application/vnd.neo.cypher-results" or something like that. No reason that
the existing JSON format couldn't remain the default.

At the risk of bikeshedding: "\r\n", "\r" or "\n"?

On 8 May 2012 15:52, Josh Adell <josh.ad...@gmail.com> wrote:


 
Josh Adell
From: Josh Adell <josh.ad...@gmail.com>
Date: Tue, 8 May 2012 08:25:53 -0700 (PDT)
Local: Tues, May 8 2012 11:25 am
Subject: Re: REST client always streaming?
Custom Accept and Content-Type headers are probably a good idea. As
for the newline: the HTTP 1.1 spec
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2) mandates
CRLF for every element except the entity body. My vote is for CRLF
within and between the response sections, for consistency, but either
CRLF or just LF is acceptable.

-- Josh

On May 8, 11:16 am, Nigel Small <ni...@nigelsmall.net> wrote:


 
Nigel Small
From: Nigel Small <ni...@nigelsmall.net>
Date: Tue, 8 May 2012 16:34:35 +0100
Local: Tues, May 8 2012 11:34 am
Subject: Re: [Neo4j] Re: REST client always streaming?

Who am I to argue with the W3C? :-)

+1 for CRLF

On 8 May 2012 16:25, Josh Adell <josh.ad...@gmail.com> wrote:


 