Recommendations for high-performance Transit encoding and decoding


Daniel Compton

Dec 1, 2016, 6:47:54 AM
to transit-format
I am working on a server that reads and writes relatively large Transit data (50k homogeneous maps) to and from CSVs, supplied by a browser client. I'd like to make it as fast and efficient as possible. At first, I thought that writing each map as a separate top-level Transit value would be best, as it would let me deal with the data in a streaming fashion. However, looking at the Transit wire representation, key caching only occurs within a single write. This tips the balance in favour of writing all of the maps in one Transit write. However, that means that at read time, all of the maps will need to be read into memory at once.
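For a concrete picture of the caching behaviour, here is a minimal sketch (assuming the JVM transit-clj API; the wire output in the comments is roughly what Transit's key caching should produce):

    (require '[cognitect.transit :as transit])
    (import '(java.io ByteArrayOutputStream))

    ;; One write of the whole collection: each distinct key is emitted
    ;; once, then replaced by a cache code (^0, ^1, ...) on reuse.
    (let [out (ByteArrayOutputStream.)
          w   (transit/writer out :json)]
      (transit/write w [{:id 1 :name "x"} {:id 2 :name "y"}])
      (str out))
    ;; => [["^ ","~:id",1,"~:name","x"],["^ ","^0",2,"^1","y"]]

    ;; One write per map: the cache resets at every top-level write,
    ;; so each map repeats its keys in full.
    (let [out (ByteArrayOutputStream.)
          w   (transit/writer out :json)]
      (doseq [m [{:id 1 :name "x"} {:id 2 :name "y"}]]
        (transit/write w m))
      (str out))
    ;; => ["^ ","~:id",1,"~:name","x"]["^ ","~:id",2,"~:name","y"]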

In the grand scheme of things it is probably not a big deal either way, but I was interested in what the recommendations are for this kind of workload.

Tim Ewald

Dec 1, 2016, 9:00:28 AM
to transit...@googlegroups.com
You need to balance how much space the data takes on the wire and in memory, and the right balance depends a lot on what you are doing. If your server can handle reading all the maps at once, a single write moves the least data across the wire. But that assumes that (a) you don't have a lot of clients sending large data to the server at the same time and (b) the size of the data sent by even one client won't grow to an unmanageable size.

I would build support for chunking the data into the writer, with a configurable chunk size. On the read side, I would read each chunk as it comes in and process however many maps it contains. You can then tune the chunk size to trade off the amount of key caching on the wire against the amount of memory consumed on read. (You might also have the reader refuse to read content over a certain size, to protect the server from poor, or malicious, client configuration.)
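A minimal sketch of that chunking approach, assuming the JVM transit-clj API (write-chunked! and read-chunked are hypothetical names, and the EOF handling is deliberately naive):

    (require '[cognitect.transit :as transit])
    (import '(java.io InputStream OutputStream))

    (defn write-chunked!
      "Writes maps to out as a series of top-level Transit vectors of
       chunk-size maps each. Key caching applies within each chunk."
      [^OutputStream out maps chunk-size]
      (let [w (transit/writer out :json)]
        (doseq [chunk (partition-all chunk-size maps)]
          (transit/write w (vec chunk)))))

    (defn read-chunked
      "Returns a lazy seq of chunks read from in, so only one chunk is
       in memory at a time. EOF detection here is simplistic; a real
       implementation might length-prefix chunks and reject any chunk
       over a configured maximum size, as suggested above."
      [^InputStream in]
      (let [r (transit/reader in :json)]
        (take-while some?
          (repeatedly #(try (transit/read r)
                            (catch Exception _ nil))))))

With, say, 1,000 maps per chunk, each distinct key is repeated once per chunk rather than once per map, so nearly all of the caching benefit remains while read-side memory is bounded by the chunk size.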

Tim-




--
Datomic Team
