Hi,
I've looked at protocol buffers, and I've noticed that there is no
support for arrays of values (doubles, integers). This is a
significant drawback; for example, JSON, HDF5, etc. all have this.
> This is exactly what I did before putting arrays into a string.
> When I implemented arrays via repeated fields, the program was
> even slower, and the file size was too large (compared to the Java
> serialization mechanism + zip).
If you put the values in a string and do your own array management on
top, as compared to using a repeated field with the packed option,
there should not be a significant difference, because it is
essentially the same.
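
To illustrate why the two should come out about the same size, here is a
minimal sketch in plain Java (no protobuf dependency) that hand-encodes a
packed repeated double field the way the protobuf wire format lays it out:
one tag, one byte length, then the raw 8-byte little-endian values. The
class and method names are made up for this example, and the "unpacked"
figure assumes a field number between 1 and 15 so that each tag fits in
one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the protobuf library.
final class PackedDoublesSketch {

    // Writes a non-negative int as a base-128 varint (protobuf wire format).
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Hand-encodes a packed repeated double field:
    // tag, payload length in bytes, then raw little-endian doubles.
    static byte[] encodePacked(int fieldNumber, double[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, values.length * 8);
        ByteBuffer buf = ByteBuffer.allocate(values.length * 8)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        out.write(buf.array(), 0, buf.capacity());
        return out.toByteArray();
    }

    // Size of the same data as a non-packed repeated field:
    // one tag byte plus 8 value bytes per element (field numbers 1..15).
    static int unpackedSize(int count) {
        return count * (1 + 8);
    }

    public static void main(String[] args) {
        double[] values = new double[1000];
        System.out.println("raw bytes:      " + values.length * 8);
        System.out.println("packed field:   " + encodePacked(1, values).length);
        System.out.println("unpacked field: " + unpackedSize(values.length));
    }
}
```

For 1000 doubles this gives 8000 raw bytes, 8003 bytes packed and 9000
bytes unpacked, so a packed field costs only a few bytes more than a
hand-rolled byte blob, while a non-packed repeated field pays one extra
tag byte per element.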
Protobufs don't come with compression, so if you compare the sizes,
you need to compare compressed Java serialization with compressed
proto serialization.
If you provide an example of what you want to do and of the
solutions you are currently comparing, people on this list might be
able to help.
-h
So for Java serialization, you have a class that contains an
ArrayList<NamedArray>, with NamedArray objects containing a
Vector<Double>, and then you serialize the whole ArrayList<NamedArray>
to disk?
> 3) File size is very large. I do not know how to fill
> compressed records on the fly using this package.
If you want to write independent records, you should write them
delimited to a file and not keep everything in memory.
Regarding compression: you write the stuff to a stream eventually, so
you can wrap that with a GZIPOutputStream - I guess that is what you
do with the Java serialization with compression as well.
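
A rough sketch of that idea, using only the JDK: each record is written
through a GZIPOutputStream as soon as it is produced, so nothing
accumulates in memory. The class name, the method names and the 4-byte
length prefix are assumptions made for this example; the byte arrays
stand in for whatever message.toByteArray() would give you.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the protobuf library.
final class GzipRecordStreamSketch {

    // Wraps any sink so that everything written to it is gzip-compressed.
    static DataOutputStream openWriter(OutputStream sink) throws IOException {
        return new DataOutputStream(new GZIPOutputStream(sink));
    }

    // Writes one record as a 4-byte length followed by the record bytes.
    static void writeRecord(DataOutputStream out, byte[] record) throws IOException {
        out.writeInt(record.length);
        out.write(record);
    }

    // Reads the stream back one record at a time and counts the records.
    static int countRecords(InputStream source) throws IOException {
        int count = 0;
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(source))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    break; // clean end: no further length prefix
                }
                byte[] record = new byte[length];
                in.readFully(record); // a real reader would parse the record here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("events", ".bin.gz");
        try (DataOutputStream out =
                 openWriter(new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int event = 0; event < 1000; event++) {
                byte[] record = new byte[800]; // stand-in for one serialized message
                writeRecord(out, record);
            }
        }
        int count = countRecords(new BufferedInputStream(new FileInputStream(file)));
        System.out.println(count + " records, " + file.length() + " bytes on disk");
    }
}
```

Only one record is held in memory at a time on either side, which is the
point of writing the records delimited instead of building one big
message.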
> Finally, there is not even a sensible approach to append new
> "Records" to the existing file (without "merge", which in fact has
> to parse the existing file first!)
Protocol buffers don't provide the transport or storage layer. They
provide the encoding. You have to provide the storage yourself. A
simple default implementation might be useful to start with, but many
people would still need to write their own way of storing things.
OTOH, it is only a handful of lines to write it yourself.
For things like this (and it has been discussed many times on this
list), you should write out delimiters telling the size of the next
record, followed by the record itself. I think something has even
been added to the API recently to make this simpler (don't know,
I use my own implementation ;) )
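
For what it's worth, a hand-rolled version can look roughly like the
sketch below. It uses only the JDK; the class name, the method names and
the fixed 4-byte length prefix are my own choices for this example. As
far as I know, the API addition mentioned above is writeDelimitedTo() /
parseDelimitedFrom() in the Java library, which use a varint length
prefix instead. Appending never reads or parses what is already in the
file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the protobuf library.
final class AppendRecordsSketch {

    // Appends one length-prefixed record to the end of the file.
    // Existing content is neither read nor parsed.
    static void append(Path file, byte[] record) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(file,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            out.writeInt(record.length);
            out.write(record);
        }
    }

    // Reads every record back, in the order in which it was appended.
    static List<byte[]> readAll(Path file) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no further length prefix: clean end of file
                }
                byte[] record = new byte[length];
                in.readFully(record);
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".bin");
        append(file, new byte[] {1, 2, 3}); // stand-in for message.toByteArray()
        append(file, new byte[] {4, 5});
        System.out.println(readAll(file).size() + " records in " + file);
    }
}
```

With real messages, each byte array read back would be handed to the
generated parseFrom(byte[]) method of the message type.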
-h
1) After event 500, even 200 MB of memory is not enough.
2) It's slower by a factor of ~5 compared to Java serialization
with compression.
3) File size is very large. I do not know how to fill
compressed records on the fly using this package.
Finally, there is not even a sensible approach to append new
"Records" to the existing file (without "merge", which in fact has
to parse the existing file first!)
So, I do not see any superiority of Protocol Buffers compared
to using other file formats; it's actually much worse when it comes
to such situations.