Text blob too big.

Josemi

Oct 31, 2022, 3:21:12 PM

to Cap'n Proto

Hello.

I need to work with structured data that has attributes of undefined length; some of them could be several GB.

I had been using Protocol Buffers for this until I saw that there is a hard limit of 2GB per message. Then I found this Stack Overflow answer, which says that Cap'n Proto "can support messages up to 2^64 bytes (2^32 segments of 4GB each)".

I rewrote the code, and now it raises the error capnp.lib.capnp.KjException: capnp/layout.c++:1694: failed: text blob too big when I try to set a 0.8 GB buffer to a Data field.

In the Cap'n Proto schema, the attribute looks like this:

file @1 :Data; # ptr[1],

And the Python code is something like:

with open('data_file', 'rb') as f:
    file = f.read()  # ~0.8 GB of bytes, then assigned to the Data field above

What's wrong with that?

Won't Cap'n Proto solve my problem?

Thanks.

Kenton Varda

Oct 31, 2022, 3:32:30 PM

to Josemi, Cap'n Proto

For a single Text or Data blob there is a hard limit of 512MB. You can, however, construct a message which contains multiple blobs, e.g. use `List(Text)`. Such a message can be up to 2^64 bytes.
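A minimal Python sketch of the chunking idea, assuming a hypothetical schema field `chunks @1 :List(Data);` in place of `file @1 :Data;`. `MAX_BLOB` encodes the roughly 512MB (2^29 byte) per-blob limit described above, and `CHUNK_SIZE` is an arbitrary choice; only the splitting and rejoining logic is shown, not the pycapnp calls that would fill the list.

```python
# Assumed per-blob limit: a Data field holds fewer than 2**29 bytes (~512 MiB).
MAX_BLOB = (1 << 29) - 1
# Arbitrary chunk size, chosen to stay well under the limit.
CHUNK_SIZE = 64 * 1024 * 1024


def split_blob(data, chunk_size=CHUNK_SIZE):
    """Split one large payload into pieces small enough for a List(Data) field."""
    if not 0 < chunk_size <= MAX_BLOB:
        raise ValueError("chunk_size must be between 1 and MAX_BLOB")
    # Note: slicing bytes copies; a memoryview would avoid the extra copy.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_blob(chunks):
    """Reassemble the original payload on the reading side."""
    return b"".join(chunks)
```

On the writing side, each element returned by `split_blob` would go into one slot of the hypothetical `chunks` list; a reader can process the pieces one by one instead of joining them.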

If I were redesigning the encoding from scratch I'd probably allow for bigger individual blobs but there's no way to introduce them now without breaking compatibility, unfortunately.

Protobuf theoretically supports 2GB messages but because the messages have to be parsed upfront in O(n) time, you won't have good results with messages of that size. Cap'n Proto, on the other hand, quite comfortably supports multi-GB messages since you can mmap() the message and randomly access it in O(1) time.
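To illustrate why random access works, here is a stdlib-only sketch (not the pycapnp API) that reads the stream framing out of an mmap'd file. As I understand the encoding spec, the framing consists of the segment count minus one, then each segment's size in 8-byte words, padded to a word boundary, followed by the segments. `build_frame` is a hypothetical helper that writes dummy segments, and decoding the pointers inside a segment is out of scope. A real program would use the library's own reader over the mapped buffer (e.g. `FlatArrayMessageReader` in C++).

```python
import mmap
import struct
import tempfile

WORD = 8  # Cap'n Proto measures segment sizes in 8-byte words


def segment_offsets(buf):
    """Return (byte_offset, byte_length) for every segment in a framed message.

    Only the small segment table at the front is read, so the cost depends on
    the number of segments, not on how many bytes the message holds.
    """
    (count_minus_one,) = struct.unpack_from("<I", buf, 0)
    count = count_minus_one + 1
    sizes = struct.unpack_from("<%dI" % count, buf, 4)
    pos = 4 + 4 * count
    pos += (-pos) % WORD  # the table is padded to a word boundary
    result = []
    for words in sizes:
        length = words * WORD
        if pos + length > len(buf):
            raise ValueError("truncated message")
        result.append((pos, length))
        pos += length
    return result


def build_frame(segments):
    """Hypothetical helper: frame raw, word-aligned segments (not real message content)."""
    table = struct.pack("<I", len(segments) - 1)
    table += struct.pack("<%dI" % len(segments), *(len(s) // WORD for s in segments))
    table += b"\0" * ((-len(table)) % WORD)
    return table + b"".join(segments)


with tempfile.TemporaryFile() as f:
    f.write(build_frame([b"A" * 8, b"B" * 16]))
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        off, length = segment_offsets(mm)[1]
        # The OS pages data in lazily, only as it is accessed.
        second = mm[off:off + length]

print(second)  # b'BBBBBBBBBBBBBBBB'
```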

-Kenton

Josemi

Oct 31, 2022, 3:44:41 PM

to Cap'n Proto

Hi Kenton, thanks for the quick response. I'm working on this solution now; the per-blob limit isn't much of a problem for me.

Thanks to Cap'n Proto, I think I'll have an elegant solution. Thanks.

Jens Alfke

Oct 31, 2022, 10:45:12 PM

to Josemi, Cap'n Proto

> On Oct 31, 2022, at 12:21 PM, Josemi <josem...@gmail.com> wrote:
>
> Hello.
>
> I need to work with structured data that has attributes of undefined length; some of them could be several GB.

Most structured storage is optimized for smaller data, and storing huge values inline pushes all the records far apart, which is bad for cache performance.

SQLite supports large blobs: by default up to 1,000,000,000 bytes per value, and up to 2^31 - 1 bytes if SQLITE_MAX_LENGTH is raised at compile time. (Even so, it's best to put the blobs in a separate table and join your records to it.)
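A minimal sketch of the separate-table layout using Python's built-in `sqlite3` module; the table and column names are hypothetical. Records stay small and point at their blob by id, so scanning records never touches the large values, and the join happens only when a blob is actually needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blobs (
        id   INTEGER PRIMARY KEY,
        data BLOB NOT NULL
    );
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        blob_id INTEGER REFERENCES blobs(id)
    );
""")


def add_record(conn, name, payload):
    """Store the payload in the blob table and a small row pointing at it."""
    cur = conn.execute("INSERT INTO blobs (data) VALUES (?)", (payload,))
    conn.execute("INSERT INTO records (name, blob_id) VALUES (?, ?)",
                 (name, cur.lastrowid))
    conn.commit()


add_record(conn, "report", b"\x00" * 1024)

# Scanning records only reads the small table.
names = [row[0] for row in conn.execute("SELECT name FROM records")]

# Join to the blob table only when the large value is actually needed.
size = conn.execute(
    "SELECT length(b.data) FROM records AS r JOIN blobs AS b ON b.id = r.blob_id"
    " WHERE r.name = ?", ("report",)).fetchone()[0]
```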

—Jens

Josemi

Nov 2, 2022, 1:22:04 PM

to Cap'n Proto

The problem is that the server receives all the data in a transmission buffer, piece by piece, without knowing the structured object. So I'm storing it in a file, because the server doesn't need to open it (it only needs to read small data). If I want to put all the attributes in SQL tables, I will have to take all the fragments and build the structured object in memory, and this, as I understand it, is a problem.
I don't know if my ideas are correct or if I'm missing something.

-- Josemi.

Josemi

Nov 3, 2022, 1:05:56 PM

to Cap'n Proto

Sorry, I wrote it in Spanish and only noticed now.

Translation:

The problem is that the server receives all the data in a transmission buffer, piece by piece without knowing the structured object. So I'm storing it in a file because the server doesn't need to open it (it only needs to read small data). If I want to put all the attributes in the SQL tables, I will have to take all the fragments and build the structured object in memory, and this, as I understand it, is a problem.
I don't know if my ideas are correct or if I'm missing something.
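One way to keep memory bounded on the receiving side, sketched in plain Python under the assumption that the transport hands over the payload as an iterable of byte chunks (`receive_chunks` in the comments is hypothetical): append each chunk to a file as it arrives, then read back only the small byte range the server actually needs by seeking.

```python
import os
import tempfile


def spool_to_file(chunks, path):
    """Write incoming chunks straight to disk; only one chunk is in memory at a time."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


def read_range(path, offset, length):
    """Read a small slice of the spooled file without loading the rest."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A fake stream standing in for the hypothetical receive_chunks().
fake_stream = (bytes([i]) * 4 for i in range(3))
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "upload.bin")
    size = spool_to_file(fake_stream, p)
    middle = read_range(p, 4, 4)
```

If the spooled file is a framed Cap'n Proto message, the same file can later be mmap'd and read in place rather than rebuilt in memory.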

Jonathan Shapiro

Mar 17, 2023, 7:44:50 PM

to Cap'n Proto

Amusingly, I was just looking at this for a protocol. Kenton's suggestion to use List(Data) works, but it carries an unintended buffering problem. A message can have multiple segments, and a correct Cap'n Proto implementation will extend its segment pool as needed to hold a large byte sequence, but if I read the encoding spec correctly, it cannot send any of the segments until all of them are available for framing (the framing header lists every segment's size up front). This means that your big blob of data sits in client memory while you are loading it, and stays there until the message is fully transmitted. Segment release could be optimized by releasing segments as transmission proceeds, but the Cap'n Proto specification doesn't require that, and it doesn't solve the "load the big blob into memory" problem.

One alternative is to introduce a builder pattern, something like this:

# File and InterestingResult are assumed to be defined elsewhere.
interface FileBuilder {
  write @0 (d :Data) -> (bytesWritten :Int32);  # returns number of bytes written
  close @1 () -> (file :File);
}

interface MyService {
  createFile @0 () -> (builder :FileBuilder);
  myInterestingCall @1 (... file :File, ...) -> (val :InterestingResult);
}

The main advantage is that the byte transmission is divided into a sequence of messages. On the service side, these can be stashed until the close() call is made, at which point the file object is created on the service and a File capability is returned to the client, which can then include it in other messages. Because of promise pipelining, this doesn't take as many round trips as you might think.
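A plain-Python simulation of the service side of this builder pattern; it is not Cap'n Proto RPC code. Each `write` call appends to a spooled temporary file rather than an in-memory buffer, and `close` returns a read-only stand-in for the `File` capability. The class names mirror the schema above; the 1 MiB spill threshold and everything else are assumptions.

```python
import tempfile


class File:
    """Stand-in for the File capability: a finished blob held by the service."""

    def __init__(self, spool):
        self._spool = spool

    def read(self, offset, length):
        self._spool.seek(offset)
        return self._spool.read(length)


class FileBuilder:
    """Service-side state behind one FileBuilder capability."""

    def __init__(self):
        # Stays in memory up to 1 MiB, then transparently spills to a temp file.
        self._spool = tempfile.SpooledTemporaryFile(max_size=1 << 20)
        self._closed = False

    def write(self, data):
        """Handle one write() call; returns the number of bytes written."""
        if self._closed:
            raise RuntimeError("write() after close()")
        self._spool.seek(0, 2)  # always append at the end
        return self._spool.write(data)

    def close(self):
        """Finish the upload and hand back a File for use in later calls."""
        self._closed = True
        return File(self._spool)


class MyService:
    def create_file(self):
        return FileBuilder()


builder = MyService().create_file()
n1 = builder.write(b"hello ")
n2 = builder.write(b"world")
uploaded = builder.close()
print(uploaded.read(6, 5))  # b'world'
```

Each chunk only needs to fit in one message, so the client never has to hold the whole file at once.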

The main puzzle here, in my mind, is that the server needs to know when the client has dropped its capabilities. I don't remember seeing anything like a capability release protocol that tells the serving entity when the state associated with an ephemeral object can be released.

Jonathan Shapiro

Mar 17, 2023, 8:59:18 PM

to Cap'n Proto

I obviously need to finish reading specs before I ask stupid questions. rpc.capnp documents the release message quite clearly.
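Roughly how the serving side can use that message, sketched in Python with hypothetical class names. As I understand rpc.capnp, `Release` carries an export `id` and a `referenceCount`, and the exporting side keeps a reference count per export, dropping the object's state once it reaches zero. The sketch models only that bookkeeping, not the wire protocol.

```python
class ExportTable:
    """Per-connection bookkeeping for capabilities this side has exported."""

    def __init__(self):
        self._entries = {}    # export id -> [object, reference count]
        self._by_object = {}  # id(object) -> export id
        self._next_id = 0

    def export(self, obj):
        """Called each time a capability to obj is sent to the peer."""
        export_id = self._by_object.get(id(obj))
        if export_id is not None:
            self._entries[export_id][1] += 1
            return export_id
        export_id = self._next_id
        self._next_id += 1
        self._entries[export_id] = [obj, 1]
        self._by_object[id(obj)] = export_id
        return export_id

    def release(self, export_id, reference_count):
        """Handle a Release message carrying (id, referenceCount)."""
        entry = self._entries[export_id]
        if reference_count > entry[1]:
            raise ValueError("peer released more references than it holds")
        entry[1] -= reference_count
        if entry[1] == 0:
            # Last reference gone: forget the export and let the object free its state.
            del self._entries[export_id]
            del self._by_object[id(entry[0])]
            on_release = getattr(entry[0], "on_release", None)
            if on_release is not None:
                on_release()

    def __contains__(self, export_id):
        return export_id in self._entries


class EphemeralBuilder:
    """Hypothetical service object whose state should be dropped once unreferenced."""

    def __init__(self):
        self.released = False

    def on_release(self):
        self.released = True


table = ExportTable()
b = EphemeralBuilder()
eid = table.export(b)
table.export(b)          # sent to the peer a second time: count is now 2
table.release(eid, 1)    # one reference left
still_held = eid in table
table.release(eid, 1)    # last reference released
```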
