standard content-type for "streaming JSON" (newline delimited JSON objects)?


Ken

May 23, 2012, 4:47:36 PM
to nod...@googlegroups.com
There seem to be a growing number of tools & packages around that implement some form of JSON streaming where multiple standard JSON objects are delimited by extra newlines, e.g.

{ "id": 1, "foo": "bar" }
{ "id": 2, "foo": "baz" }
...

This format seems both pragmatic and useful, but is not JSON compliant (i.e. doesn't parse with a standard JSON parser), so it seems inappropriate to serve up as "application/json".   Request-JSONStream uses "application/jsonstream", Google searching shows at least one use of "application/x-json-stream", and there are a number of services that use "application/json" and expect clients to just deal with it (cf. https://github.com/senchalabs/connect/issues/538).

Since I'm about to make heavy use of this technique in a way that will be difficult to unwind later, I'd like, if at all possible, to get on board with whatever will become the standard (de facto or official). Is anyone aware of any efforts underway to standardize this, or of packages/services with enough momentum to drive a standard in the future?
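For concreteness, here's the failure mode and the obvious workaround (plain JavaScript, nothing package-specific):

```javascript
// Two newline-delimited JSON objects, as served by the APIs in question.
var body = '{ "id": 1, "foo": "bar" }\n{ "id": 2, "foo": "baz" }\n';

// A standard parser rejects the body as a whole...
var wholeBodyParses = true;
try { JSON.parse(body); } catch (e) { wholeBodyParses = false; }

// ...but each line on its own is perfectly valid JSON.
var objects = body.split('\n')
  .filter(function (line) { return line !== ''; })
  .map(function (line) { return JSON.parse(line); });
```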

Mark Hahn

May 23, 2012, 5:05:51 PM
to nod...@googlegroups.com
Stay with the standard.  Crockford has made it clear that the standard will never change.  It is one of JSON's strengths.  You can count on the fact that what you do now will work forever.


--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en

Joshua Holbrook

May 23, 2012, 5:10:31 PM
to nod...@googlegroups.com
Newline-delimited JSON is pretty common. I don't think there's a
standard content-type for it though.

> Stay with the standard.

Well, this *is* the standard, just combined with newline-delimited streaming.

--Josh
--
Joshua Holbrook
Engineer
Nodejitsu Inc.
jo...@nodejitsu.com

Mikeal Rogers

May 23, 2012, 5:13:48 PM
to nod...@googlegroups.com
I've actually seen a few people use custom content types for it, but no de facto standard has emerged for newline-delimited JSON as a content type.

Just use 'x-my-json-sucka' or whatever, just prefix it with x-

Ken

May 23, 2012, 5:17:53 PM
to nod...@googlegroups.com
The JSON standard doesn't allow for this usage, which is why I think a new standard is needed.  If you're arguing that I should use JSON instead of newline-delimited JSON (by wrapping all objects in an array and delimiting them with commas?) it seems that the work on the client side to do streaming parsing gets much harder (and ultimately may even be less standards compliant).

Mark Hahn

May 23, 2012, 6:00:15 PM
to nod...@googlegroups.com
> it seems that the work on the client side to do streaming parsing gets much harder

I don't understand?  Parsing commas is hard?  However you planned on parsing newlines could parse commas instead.

Dick Hardt

May 23, 2012, 6:10:52 PM
to nod...@googlegroups.com
Which comma do you parse is the problem -- and then you are writing a JSON parser.
I've used newlines in the past to separate JSON objects and think Ken's approach makes sense. A new content type would allow a generic parser to know the right thing to do.

To answer your original question: I have not seen anyone speccing this out before. Write something up and post it.

-- Dick


On Wed, May 23, 2012 at 3:00 PM, Mark Hahn <ma...@hahnca.com> wrote:
> it seems that the work on the client side to do streaming parsing gets much harder

I don't understand?  Parsing commas is hard?  However you planned on parsing newlines could parse commas instead.




--
-- Dick

Ken

May 23, 2012, 6:27:08 PM
to nod...@googlegroups.com
The key is to avoid using as a delimiter something that will show up as a matter of course in JSON objects.  Twitter uses \r\n pairs for their API (which I believe they serve up as application/json).  They may also include blank lines.  This format can be parsed with something as simple as the following:

function parse_JSON_stream(s) {
  return s.split("\r\n")
    .filter(function (l) { return l !== ''; })
    .map(function (l) { return JSON.parse(l); });
}

Using \n all by itself is safe if you encode with JSON.stringify and don't specify a third parameter.
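That last claim is easy to sanity-check (a quick illustration, not part of any spec): without an indent argument, JSON.stringify escapes every newline inside strings, so the serialized form is always a single line.

```javascript
var obj = { text: "line one\nline two", crlf: "a\r\nb" };

// Compact stringify escapes \n and \r inside strings, so the output
// contains no raw line breaks...
var compact = JSON.stringify(obj);
var compactIsOneLine = compact.indexOf('\n') === -1 && compact.indexOf('\r') === -1;

// ...while passing an indent (the third parameter) reintroduces raw newlines.
var pretty = JSON.stringify(obj, null, 2);
var prettyIsOneLine = pretty.indexOf('\n') === -1;
```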

Nuno Job

May 23, 2012, 7:26:07 PM
to nod...@googlegroups.com
I wrote a JSON parser and I don't get the issue here!

I guess it must be a crappy one :) push pop!

Nuno

Bruno Jouhier

May 24, 2012, 3:14:04 AM
to nod...@googlegroups.com
You could make it JSON compliant by emitting a "[" line at the beginning of the stream and a "false]" line at the end (and adding a comma at the end of each line).
The extra value at the end could be interpreted as a "more" indicator: if true, this is only a stream fragment and another fragment is expected later; if false, the stream is really over.

With this you could use a content-type like "application/json; layout=line-stream".

The only thing that would need to be standardized then would be the layout parameter and its values.
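A sketch of that framing on the wire (the helper name is made up for illustration):

```javascript
// Hypothetical encoder for the proposal above: a "[" line first, one
// object plus a trailing comma per line, then the "more" indicator
// fused with the closing "]".
function encodeFramedStream(objects, more) {
  var lines = ['['];
  objects.forEach(function (obj) {
    lines.push(JSON.stringify(obj) + ',');
  });
  lines.push(more ? 'true]' : 'false]');
  return lines.join('\n');
}

var framed = encodeFramedStream([{ id: 1 }, { id: 2 }], false);

// The whole thing is now standard JSON: an array whose last element
// is the "more" flag.
var decoded = JSON.parse(framed);
var more = decoded.pop();
```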

Daniel Rinehart

May 24, 2012, 9:19:48 AM
to nod...@googlegroups.com
A random idea, haven't thought it through completely, would be to send
the response as a multipart message where each part is a valid JSON
object. Should be able to stream the MIME parsing and then hand each
chunk off to a JSON stream parser.

-- Daniel R. <dan...@neophi.com> [http://danielr.neophi.com/]

Isaac Schlueter

May 24, 2012, 12:52:35 PM
to nod...@googlegroups.com
On Wed, May 23, 2012 at 3:00 PM, Mark Hahn <ma...@hahnca.com> wrote:
> However you planned on parsing newlines could parse commas instead.

It goes from trivial (because you don't have to inspect the JSON at
all) to not-trivial (because you do).

JSON can contain commas, but unless it's created using a pretty-indent
argument, it can never contain newlines. This means that your parser
only has to be aware of a single byte, and can dumbly skip over
everything else.

I think what we need is a new standard for \n delimited JSON streams.
It addresses a slightly different need (since you won't ever parse the
whole thing all at once, and it may not even ever end), and requires
the sender to not send pretty-formatted JSON, so that each object is
guaranteed to be a single line.

Actually, I think that's basically the spec:

1. Lines are delimited by \n (0x0A)
2. Each line must be a valid JSON string in UTF-8 encoding.

The only thing we're lacking is a mime-type.
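Those two rules are small enough that a naive whole-buffer parser is a few lines (my own sketch; a real streaming consumer would buffer partial lines as discussed elsewhere in this thread):

```javascript
// Rule 1: records are delimited by \n (0x0A).
// Rule 2: each non-empty line must itself be a valid JSON value.
function parseNewlineDelimitedJSON(text) {
  return text.split('\n')
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) { return JSON.parse(line); });
}
```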

Tim Caswell

May 24, 2012, 1:05:13 PM
to nod...@googlegroups.com
application/x-json-stream -> JSON messages (without extra whitespace), newline-separated.

Several streaming JSON parsers support this already out of the box.  They ignore the newlines and know when a JSON body ends from the parser state.  People who don't have a streaming parser can search for the newline instead and do their own de-framing.  Either way, it's very trivial to parse.

var parser = new StreamingParser({ multivalue: true });
req.pipe(parser);
parser.on("message", function (message) {
  // ...
});
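The "do their own de-framing" case can be sketched like this (my own sketch, not any particular package's API); since chunks can split an object anywhere, only complete newline-terminated lines are parsed:

```javascript
// Buffers partial lines across chunks and invokes onMessage for each
// complete newline-terminated JSON value.
function makeDeframer(onMessage) {
  var buffered = '';
  return function (chunk) {
    buffered += chunk;
    var newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      var line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line !== '') onMessage(JSON.parse(line));
    }
  };
}

// Feed it arbitrarily split chunks:
var messages = [];
var write = makeDeframer(function (msg) { messages.push(msg); });
write('{"id":');
write('1}\n{"id":2}\n');
```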

Nuno Job

May 24, 2012, 1:06:21 PM
to nod...@googlegroups.com
>> Several streaming json parsers support this already out of the box.
>> They ignore the newlines and know when a json body ends because of the parser state. 

Pretty much sums it up :)

Nuno

Alan Gutierrez

May 24, 2012, 1:24:27 PM
to nod...@googlegroups.com
I'm using a JSON string per line as a somewhat durable file format for a b-tree.

http://bigeasy.github.com/strata/

The notion here is that the parser does not ignore the newline. A verification
program can break the file into lines so it can detect corrupt lines and
continue. With a parser, a corrupt line would make it non-trivial to continue,
unless you used the newline as a marker, at which point you might as well use
it as the delimiter.

I've gone and added a hexadecimal checksum to the end of the line. The checksum
is a hexadecimal number or a hyphen for no checksum. The checksum algorithm is
any algorithm that emits a number.

Basically, checksummed frames of newline delimited JSON.
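A sketch of what such a checksummed frame might look like (the layout and the toy additive checksum here are illustrative inventions; Strata's actual on-disk format may differ):

```javascript
// A toy additive checksum; per the description above, any algorithm
// that emits a number would do.
function checksum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) sum = (sum + s.charCodeAt(i)) & 0xffffffff;
  return sum;
}

// Frame: JSON record, a space, then the checksum in hex ("-" for none).
function writeFrame(obj) {
  var json = JSON.stringify(obj);
  return json + ' ' + checksum(json).toString(16) + '\n';
}

// A verifier can split on newlines and skip corrupt lines rather than
// aborting the whole parse.
function readFrame(line) {
  var at = line.lastIndexOf(' ');
  var json = line.slice(0, at);
  var sum = line.slice(at + 1);
  if (sum !== '-' && parseInt(sum, 16) !== checksum(json)) return null; // corrupt
  return JSON.parse(json);
}
```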

--
Alan Gutierrez - @bigeasy

Ken

May 24, 2012, 4:42:31 PM
to nod...@googlegroups.com
Using arbitrary x- subtypes is allowed, but if this format ever leaves experimental status we'd want to drop the x-, and I don't think application/json-stream is ideal, in particular because the only other current use of the word "stream" in content types is application/octet-stream, which has no particular relationship to "streaming" as we're using it here.  Digging through the various RFCs for MIME types, it's not exactly clear what the best choice would be, but a literal reading of http://www.ietf.org/rfc/rfc2046.txt suggests that it should definitely be under application:

> registered subtypes of "text", "image", "audio", and "video" should
> not contain embedded information that is really of a different type.
> Such compound formats should be represented using the "multipart" or
> "application" types.

It might make sense to use multipart/json, but to comply with the RFC multipart types have to use line-based delimiters (e.g. "--"), and the fact that there are so few existing multipart types suggests IANA doesn't like to give them out.  Perhaps we could instead just borrow multipart's boundary parameter to enhance application/json:
  • application/json
    • exactly one JSON object
  • application/json; boundary=NL
    • one or more JSON objects delimited by newlines (no newlines allowed in object)
  • application/json; boundary=CRNL
    • one or more JSON objects delimited by carriage return/newline pairs (newlines allowed in object)
As far as I can tell, sending a single JSON object using any of these formats won't break any existing parsers (they all seem to handle an extra newline or carriage return/newline pair at the end without complaint), so the choice of boundary would only matter when multiple objects were sent.  One could conceivably (ab)use this approach to handle other schemes, such as Twitter's delimited=length option:
  • application/json; delimited=length
    • multiple JSON objects, each prefixed with its length in bytes (encoded as a string of base-10 digits)
but a single object sent in this form will not parse with a standard parser (you'd have to strip off the length first), so it feels a bit off to me.
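If the boundary parameter were adopted, a receiver might dispatch on it roughly like this (a hypothetical helper, just to show the idea; NL/CRNL are the proposed parameter values):

```javascript
// Split a body according to the proposed boundary parameter of the
// content type, defaulting to a single JSON object.
function parseByContentType(contentType, body) {
  var match = /boundary=(NL|CRNL)/.exec(contentType);
  if (!match) return [JSON.parse(body)];
  var sep = match[1] === 'CRNL' ? '\r\n' : '\n';
  return body.split(sep)
    .filter(function (line) { return line !== ''; })
    .map(function (line) { return JSON.parse(line); });
}
```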

mscdex

May 24, 2012, 5:03:42 PM
to nodejs
On May 24, 4:42 pm, Ken <ken.woodr...@gmail.com> wrote:
>    - application/json; boundary=NL
>       - one or more JSON objects delimited by newlines (no newlines allowed in object)
>    - application/json; boundary=CRNL
>       - one or more JSON objects delimited by carriage return/newline pairs (newlines allowed in object)

s/NL/LF/

Chris Dew

Jul 2, 2013, 8:45:47 AM
to nod...@googlegroups.com
We really need a standard for Line Delimited JSON.

Please have a look at: http://en.wikipedia.org/wiki/Line_Delimited_JSON and comment on the talk page.

Floby

Jul 3, 2013, 4:18:43 AM
to nod...@googlegroups.com
what's wrong with application/x-json-stream ?
We've been using application/x-www-form-urlencoded for years.

Timothy J Fontaine

Jul 4, 2013, 12:04:59 PM
to nod...@googlegroups.com
It's about framing. Consider a persistent connection where the total content length isn't known up front, or a streaming interface. If you send the data framed as an array, you won't be able to process the information as it arrives: you'll always be waiting for the end.

A format like application/x-json-stream lets you continue to receive information and process it as it becomes available, without necessarily needing another abstraction like a websocket, which may or may not be available for your use.


On Thu, Jul 4, 2013 at 8:54 AM, Simon Majou <si...@majou.org> wrote:
I don't get the interest of this thing.

If you want to send an array, just use an array.

If you want to delimit json messages on a stream without parsing the messages, I don't know if it is practical as JSON accepts any unicode.

\n is valid inside of strings (http://json.org/string.gif). 




Floby

Jul 5, 2013, 11:16:09 AM
to nod...@googlegroups.com
Gave a go at a parser for an actual array at https://github.com/Floby/node-dummy-streaming-array-parser using a regex-based tokenizer.

Still more complicated than input.pipe(split()).pipe(through(JSON.parse))



Chris Carpita

Oct 29, 2014, 10:43:28 AM
to nod...@googlegroups.com
+1 to this.  I'm a bit horrified to see that newlines are being incorporated into a JSON-like standard as a record separator when newlines are in fact supported within JSON bodies.

IMO a generically named standard like "application/x-json-stream" should be a strict subset of JSON with the following requirements:

1. The top-level entity (TLE) MUST be an array
2. A json-stream parser SHOULD emit "data" as each element of the TLE is fully parsed.
3. A json-stream parser SHOULD emit "end" when the last element of the TLE has been parsed.

I feel like this would address the concerns with framing and streaming applications without fundamentally breaking JSON.

That said, it would be a whole lot easier to integrate with line-separated JSON in shell scripts, and perhaps this calls for an explicit "application/x-line-separated-json" standard which is a superset of an NL-JSON standard wherein newlines are not permitted (a superset of a subset, as ridiculous as that sounds).
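A parser along those lines, emitting each top-level array element as soon as it completes, can be sketched with a small depth/string tracker (my own sketch; a production version would need real error handling):

```javascript
// Emits each element of a top-level JSON array as soon as it is complete,
// by tracking bracket depth and string state; no newlines required.
function makeArrayStreamParser(onData, onEnd) {
  var depth = 0, inString = false, escaped = false, element = '';
  return function (chunk) {
    for (var i = 0; i < chunk.length; i++) {
      var c = chunk[i];
      if (inString) {
        element += c;
        if (escaped) escaped = false;
        else if (c === '\\') escaped = true;
        else if (c === '"') inString = false;
        continue;
      }
      if (c === '"') { inString = true; element += c; continue; }
      if (c === '{' || c === '[') {
        if (depth > 0) element += c;  // the outermost "[" is framing, not data
        depth++;
      } else if (c === '}' || c === ']') {
        depth--;
        if (depth > 0) element += c;
        else {                         // closed the top-level array
          if (element.trim() !== '') onData(JSON.parse(element));
          if (onEnd) onEnd();
        }
      } else if (c === ',' && depth === 1) {
        onData(JSON.parse(element));   // a top-level element just ended
        element = '';
      } else if (depth >= 1) {
        element += c;
      }
    }
  };
}

// Chunk boundaries can fall anywhere, even mid-string:
var out = [];
var feed = makeArrayStreamParser(function (v) { out.push(v); });
feed('[{"a":1},{"b":"x,');
feed('y"},3]');
```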

Elmer Bulthuis

Aug 4, 2017, 2:13:19 AM
to nodejs
This link -> http://specs.okfnlabs.org/ndjson/ calls it `application/x-ndjson`. Kind of a weird name, but at least someone took the effort to describe it properly!

