I'm working on a cross-platform project, where we're serializing data
and sending it over TCP, using boost::serialization and boost::asio. As
the data is quite big, and is mainly numeric, binary serialization
provides a nice advantage over XML-based serialization, both in terms of
performance and data size.
However, I've found that the serialized data itself is not cross-platform.
E.g. if I have a simple app that sends serialized data over a TCP
connection, it works fine as long as both ends of the connection are on
the same platform (say both are Linux x86_64, or both are Windows XP),
but different platforms don't interoperate: the data sent by one platform
is not accepted 'as is' by the other.
I wonder what provisions have to be made to achieve binary serialization
that would work in such a context?
Akos
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
Have you checked the portable binary archive (in the examples of the
serialization lib source directory)?
I had a similar issue with non-portable serialized files, and it solved it!
By the way, I'm wondering how many such undocumented treasures are
hiding in this amazing source code maze that the boost source tree is :-) !
Regards,
Mathieu
Ákos Maróy wrote:
Andrea
-----Original Message-----
From: boost-use...@lists.boost.org
[mailto:boost-use...@lists.boost.org] On behalf of Ákos Maróy
Sent: Friday, 8 August 2008 12:01
To: boost...@lists.boost.org
Subject: [Boost-users] cross-platfrom binary serialization?
I think he was expecting that binary_archive already has a fixed/portable binary
layout. Apparently not (that is news to me too).
As for the other mentioned portable archive, I'm curious about its integral
encoding efficiency. Google's Protocol Buffers seem to be very efficient in
that regard (both in bytes used for the representation and in encoding/decoding
speed).
http://code.google.com/apis/protocolbuffers/docs/encoding.html
--
Mihai RUSU Email: di...@roedu.net
"Linux is obsolete" -- AST
A possible solution could be to implement my own serializers for
integral types that use the very same bitwise pattern (endianness,
size, etc.)?
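
One way to read this suggestion, as a minimal sketch only: choose one width and one byte order for the wire, and convert with shifts so that the host's own integer layout never reaches the stream. The helper names `put_u64_be` and `get_u64_be` are hypothetical and are not part of Boost; the sketch assumes every integer is widened to 64 bits before sending, which sidesteps the fact that `long` is 4 bytes on 64-bit Windows and 8 bytes on Linux x86_64.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 64-bit unsigned value as exactly 8 bytes,
// most significant byte first, regardless of the host byte order.
void put_u64_be(std::vector<std::uint8_t>& out, std::uint64_t value) {
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Hypothetical helper: rebuild the value from 8 bytes starting at `pos`.
// The caller is assumed to have checked that 8 bytes are available.
std::uint64_t get_u64_be(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        value = (value << 8) | in[pos + i];
    }
    return value;
}
```

Because the code only uses arithmetic on values, never a cast of the object's memory, the same source should produce the same bytes on little-endian and big-endian hosts.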
> I think he was expecting that binary_archive is already fixed/portable binary
> layout. Apparently not (that is news for me too).
yes, basically that was my expectation: that I serialize something,
store it, de-serialize it using the _same_ implementation, but maybe
compiled for a different platform, and it would work fine.
> As for the other mentioned portable archive, I'm curious about integral
> encoding efficiency, Google's Protocol Buffers seem to be very efficient in
> that regard (both in bytes used for representation but also encoding/decoding
> operations speed)
> http://code.google.com/apis/protocolbuffers/docs/encoding.html
interesting. I'll take a look..
Akos
So if you encode your data into a byte/char stream then you will not have
these problems. Note that you must solve this even if your application only
saves files and you want them to be binary portable across different platforms.
The google protocol is very nice, but I wonder why the first example takes 3
bytes. Isn't it possible to encode it with 2 bytes only? Like with the
string "testing" where the header that includes the string length (value 7)
is encoded with 2 bytes.
Andrea
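
On the 3-byte question: in the encoding document linked above, the first example stores the value 150, and the varint scheme keeps only 7 payload bits per byte. 150 does not fit in 7 bits, so the value alone needs 2 bytes, and the field key adds a third. The length 7 of "testing" does fit in one byte, hence 2 header bytes there. A minimal sketch of the varint step, with the hypothetical name `encode_varint` (this is not the Protocol Buffers library API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: base-128 varint, least significant 7-bit group first.
// The high bit of each byte signals that more bytes follow.
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
    return out;
}
```

With this sketch, `encode_varint(150)` yields the two bytes 0x96 0x01, while `encode_varint(7)` yields the single byte 0x07.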
-----Original Message-----
a possible solution could be to implement my own serializers for
integral types, that use the very same bitwise pattern? (endianness,
size, etc.)
Akos
...
> I wonder what provisions have to be done to achieve binary serialization
> that would work in such a context?
lookup portable_binary_archive, IIRC in the
...libs/serialization/example directory. Make sure you use the version
from Trunk which has had some bugs fixed.
Jeff Flinn
> lookup portable_binary_archive, IIRC in the
> ...libs/serialization/example directory. Make sure you use the version
> from Trunk which has had some bugs fixed.
thanks, I'll take a look. can you give me a subversion URL? the
SourceForge site is only displaying error messages :(
Akos
> lookup portable_binary_archive, IIRC in the
> ...libs/serialization/example directory. Make sure you use the version
> from Trunk which has had some bugs fixed.
meanwhile I found the svn URL for boost, at
http://svn.boost.org/svn/boost/trunk
I'm looking at the sample, and I see the following comment:
// "Portable" input binary archive. It addresses integer size and endienness so
// that binary archives can be passed across systems. Note:floating point types
// not addressed here
so I guess floating point type support needs to be added to this
implementation? what would be your suggestion to implement floating
point support? use some portable binary representation for floating
point types?
Akos
-----Original Message-----
From: boost-use...@lists.boost.org
[mailto:boost-use...@lists.boost.org] On behalf of Ákos Maróy
Sent: Saturday, 9 August 2008 14:54
To: boost...@lists.boost.org
Subject: Re: [Boost-users] cross-platfrom binary serialization?
and how would I go about handling the endianness issue? is there a
compile-time define in boost already that signals endianness of floating
point types? or is the endianness of the floating point types the very
same as for integer types?
I was looking at the implementation of the Qt class QDataStream, which
serves a similar purpose. What they simply do is that they store the 4
bytes for floats in big endian order (swap if the system is little
endian). for doubles, they have a more elaborate mapping, handling four
cases:
- normal little endian format
- swapped little endian format
- normal big endian format
- swapped big endian format
their test is rather simple - they take a number that has a double
representation equivalent of the character array:
"0123ABCD0123ABCD\0\0\0\0\0\0\0"
in normal little endian format, and then use the pattern resulting on
the target system to determine the format of the system itself. but this
has to be done at compile time.
my question is: is there some similar compile / configure time test
available in boost already, whose result I could use to add float and
double support to the portable_binary_archive class, thus making it
complete? or do I have to implement my own tests for this purpose? (this
would add complexity to my project, as so far I myself don't have
configure-time tests at all, but depend on boost and similar libraries
to already provide me with target platform details).
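
The float handling described above (4 bytes, big-endian on the wire) can be sketched without any configure-time test, by copying the bit pattern into an integer and emitting the bytes with shifts. This is only a sketch under stated assumptions: `float` is a 32-bit IEEE 754 type (checked at compile time), and `float` and `std::uint32_t` share the same byte order in memory on the host, which the four double cases listed above suggest is not guaranteed everywhere. The names `store_float_be` and `load_float_be` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes float is a 32-bit IEEE 754 type");

// Hypothetical helper: write the bit pattern of `value` as 4 big-endian bytes.
void store_float_be(float value, unsigned char out[4]) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bits
    out[0] = static_cast<unsigned char>((bits >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((bits >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((bits >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(bits & 0xFF);
}

// Hypothetical helper: rebuild the float from 4 big-endian bytes.
float load_float_be(const unsigned char in[4]) {
    const std::uint32_t bits = (static_cast<std::uint32_t>(in[0]) << 24) |
                               (static_cast<std::uint32_t>(in[1]) << 16) |
                               (static_cast<std::uint32_t>(in[2]) << 8) |
                               static_cast<std::uint32_t>(in[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 1.0f has the bit pattern 0x3F800000, so `store_float_be` writes the bytes 3F 80 00 00 on any host that meets the assumptions.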
Even better, take a look at portable_binary_archive in
the file vault: http://preview.tinyurl.com/5j46aq. It
correctly handles floating point types as well as integer
types.
Roman Perepelitsa.
> Even better, take a look at portable_binary_archive in
> the file vault: http://preview.tinyurl.com/5j46aq. It
> correctly handles floating point types as well as integer
> types.
Thanks.
How does this compare to the binary archive that's in the boost
repository? there it has the following files:
portable_binary_archive.hpp
portable_binary_iarchive.cpp
portable_binary_iarchive.hpp
portable_binary_oarchive.cpp
portable_binary_oarchive.hpp
while yours is merely two header files:
portable_binary_iarchive.hpp
portable_binary_oarchive.hpp
are these complete replacements for the above?
Akos
It's an implementation of a portable binary archive, while
the thing you can find in the boost repository is just an
example of how one could try to implement a portable binary
archive, therefore it's incomplete.
> there it has the following files:
>
> portable_binary_archive.hpp
This one contains shared code for iarchive and oarchive.
> portable_binary_iarchive.cpp
> portable_binary_iarchive.hpp
It's an implementation of iarchive.
> portable_binary_oarchive.cpp
> portable_binary_oarchive.hpp
It's an implementation of oarchive.
> while yours is merely two header files:
>
> portable_binary_iarchive.hpp
> portable_binary_oarchive.hpp
Well, they are not mine :)
There is no need for cpp files because implementation
is inline. Also they don't use anything like
portable_binary_archive.hpp, but it's an implementation
detail anyway.
> are these complete replacements for the above?
Yes.
Roman Perepelitsa.
>> are these complete replacements for the above?
>
> Yes.
Thanks for the info..
Akos
> Even better, take a look at portable_binary_archive in
> the file vault: http://preview.tinyurl.com/5j46aq. It
> correctly handles floating point types as well as integer
> types.
I'm looking at the portable_binary_archive contents you pointed me to,
and it seems to be a bit problematic. I see it was written by people
mostly using MS Visual Studio (I guess from the #pragma once line), and
it seems the code throws a lot of warnings under gcc: a lot of signed /
unsigned comparison warnings for example, or checking if an unsigned
value is negative. (It also checks the BOOST_VERSION macro without
including boost/version.hpp.)
I wonder how stable this code is, and if it is really used among
multiple systems. Are you actually using this code?
Akos
I see. I suppose you are right, original author used this code only
with MSVC.
> I wonder how stable this code is, and if it is really used among
> multiple systems. Are you actually using this code?
I don't. Cross-platform binary serialization is requested/discussed
quite frequently on the boost-users mailing list and as far as I know,
there are only 2 implementations available: one in the file vault
and another one in serialization/examples, the former being superior.
That's why I pointed you to the version from the file vault.
For more info you might want to search the boost-users archive
or contact the author of the code (Christian
Pfligersdorffer <christian.pfligersdorffer at eos.info>).
Roman Perepelitsa.
boost-use...@lists.boost.org on :
> Ákos Maróy <akos <at> maroy.hu> writes:
>> I'm looking at the portable_binary_archive contents you pointed me
>> to, and it seems to be a bit problematic. I see it was written by
>> people mostly using MS Visual Studio (I guess from the #pragma once
>> line), and it seems the code throws a lot of warnings under gcc. a
>> lot of signed / unsigned comparison warnings for example, or
>> checking if an unsigned value is negative. (it also checks on the
>> BOOST_VERSION macro without including boost/version.hpp).
>
> I see. I suppose you are right, original author used this code only
> with MSVC.
And gcc-4, which also supports #pragma once. The use of this construct,
and the warnings, simply result from my striving to minimize code
length. What I express in code is the algorithmic idea in the shortest
possible way I can think of. Let the compiler complain about it; I don't
care, because I _know_ what I'm doing and the warnings are pointless.
They stem from using one metafunction for all integral types, be they
signed or unsigned. Of course you could separate those, but only by
introducing verbosity. Originally I even let the integral function treat
the bool case, resulting in even more warnings. With the zero bool tweak
in mind I decided to write an extra overload for bool. However, if you
have suggestions that fix the warnings and come close to a minimal
solution, I'll be happy to look into them.
I include enough boost headers which implicitly include version.hpp and
the like, so I need not bother making this explicit.
>> I wonder how stable this code is, and if it is really used among
>> multiple systems. Are you actually using this code?
>
> I don't. Cross-platform binary serialization is
> requested/discussed quite frequently on boost-users mailing
> list and as far as I know, there are only 2 implementations
> available: one in the file vault and another one in
> serialization/examples, the former being superior.
> That's why I pointed you to the version from file vault.
We do. Using the archives we transferred terabytes between x86 and ppc (32-bit only). Using boost-1.33.1. Other combinations have not been tested or I do not know of the results.
PS: The portable binary archive that comes with the library examples should be complete, maybe just the comment was not removed. Robert Ramey often pointed out that this was sponsored by someone and will be in 1.36. I did not look at it yet, though.
Best regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
> And gcc-4, which also supports the #pragma once. The use of this
But the #pragma keyword is not for this. Even the MSDN page for #pragma
says:
"Each implementation of C and C++ supports some features unique to its
host machine or operating system ... The #pragma directives offer a way
for each compiler to offer machine- and operating system-specific features"
see http://msdn.microsoft.com/en-us/library/d9x1s805(VS.80).aspx
having to guard a header file from inclusion is in no way a machine or
OS dependent 'feature', thus it's not something you'd want to solve with
#pragma.
also see chapter 24 from C++ Coding Standards by Herb Sutter and Andrei
Alexandrescu, titled "Always write internal #include guards. Never write
external #include guards.", http://www.gotw.ca/publications/c++cs.htm
basically using #pragma once is bad style.
> I include enough boost headers which implicitly include version.hpp
> and the like so I need not bother making this explicit.
see chapter 23 from C++ Coding Standards by Herb Sutter and Andrei
Alexandrescu, titled "Make header files self-sufficient.". the fact that
you include version.hpp frequently doesn't mean it shouldn't be explicit
in this header file. if a header uses a feature, it should include the
header for that feature.
> We do. Using the archives we transferred terabytes between x86 and
> ppc (32-bit only). Using boost-1.33.1. Other combinations have not
> been tested or I do not know of the results.
glad to hear.
> PS: The portable binary archive that comes with the library examples
> should be complete, maybe just the comment was not removed. Robert
> Ramey often pointed out that this was sponsored by someone and will
> be in 1.36. I did not look at it yet, though.
good to know :)
if you're interested I can send you a version of your implementation that I
changed along the following lines:
- added #ifdef guards / removed #pragma once
- made the files self-sufficient
- removed signed / unsigned comparison / conversion warnings from the
save() functions of both iarchive and oarchive
- solved a portability issue with right-shifting signed values in the
save() function of portable_binary_oarchive (right-shifting negative
signed values is implementation-defined)
I'd be glad to send the changed code over, if you're interested.
Akos
Under boost_1_36_0\libs\serialization\example there are some hpp and cpp files
for portable binary format (portable_binary_archive.hpp,
portable_binary_iarchive.cpp ...) and I am wondering how performant the code
is.
Bijan
This was actually tested on various platforms of different 32/64-bit and endian
combinations. And the test consisted of running ALL serialization
tests as is done with the "official" archive implementations. To me,
the main problem is that it's missing support for floating point types.
Including such support in a definitive, portable manner would
be a significant effort which so far no one has deigned to undertake.
Robert Ramey
Please forgive my ignorance, but what is the issue with 'floating point types'?
QDataStream simply dumps the internal representation of float or double
numbers and swaps the bytes for big/little endian coding if necessary...
Thanks.
Bo
>> This was actually tested on various platforms of different 32/64
>> endian
>> combinations. And the test consisted of running ALL serialization
>> tests as is done with the "official" archive implementations. To me,
>> the main problem is that its missing support for floating point
>> types.
>> Including such support in a definitive, portable manner would
>> be a significant effort which so far no one has deigned to undertake.
>
> Please forgive my ignorance, but what is 'floating point types'?
> QDataStream simply dump the internal presentation of float or double
> numbers and swap them for big small endian coding if necessary...
The internal representation is not the same on all platforms.
Matthias
I like your style of critique: always putting a reference next to your opinion so it carries more weight. However, I do not agree with everything you write.
Ákos Maróy on Sunday, August 24, 2008 10:42 AM:
> Christian,
>
>> And gcc-4, which also supports the #pragma once. The use of this
>
> But the #pragma keyword is not for this. Even the MSDN page
> for #pragma
> says:
>
> "Each implementation of C and C++ supports some features
> unique to its host machine or operating system ... The
> #pragma directives offer a way for each compiler to offer
> machine- and operating system-specific features"
>
> see http://msdn.microsoft.com/en-us/library/d9x1s805(VS.80).aspx
>
> having to guard a header file from inclusion is in no way a
> machine or OS dependent 'feature', thus it's not something
> you'd want to solve with #pragma.
I think on this point you're mistaken, since the pragma is simply a
communication channel directly to your compiler. GCC offers not only
machine or architecture pragmas but also e.g. diagnostic pragmas, the
#pragma message, or means to change the optimization level per
compilation unit.
> also see chapter 24 from C++ Coding Standards by Herb Sutter
> and Andrei Alexandrescu, titled "Always write internal
> #include guards. Never write external #include guards.",
> http://www.gotw.ca/publications/c++cs.htm
I agree! But that is not in question.
> basically using #pragma once is bad style.
I strongly disagree :) It's shorter, thus more elegant, does not induce myriads of names like the preprocessor guards and is even faster. GCC also saw that and de-deprecated it in versions 4.x.
>> I include enough boost headers which implicitly include version.hpp
>> and the like so I need not bother making this explicit.
>
> see chapter 23 from C++ Coding Standards by Herb Sutter and
> Andrei Alexandrescu, titled "Make header files
> self-sufficient.". the fact that you include version.hpp
> frequently doesn't mean it shouldn't be explicit in this
> header file. if a header uses a feature, it should include
> the header for that feature.
You got a point here. It probably doesn't hurt and makes the whereabouts of the BOOST_VERSION macro evident in my case.
> if you're interested I can send you a version of your
> implementation I changed, along the following:
>
> - added #ifdef guards / removed #pragma once
> - made the files self-sufficient
> - removed signed / unsigned comparision / conversion warnings from the
> save() functions of both iarchive and oarchive
> - solved portability issue with right-shifting signed values in the
> save() function of portable_binary_oarchive (right-shifting
> signed values is implementation dependent)
>
> I'd be glad to send it the changed code over, if interested.
Sure! I'm curious how you changed my code, send it right over please! :) Especially the right-shifting issue caught my interest! Do you have examples where right shifting does not repeat the sign bit? Let me see how you solved it.
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
> Bijan wrote:
>>>> PS: The portable binary archive that comes with the library
>>>> examples should be complete, maybe just the comment was not
>>>> removed. Robert Ramey often pointed out that this was sponsored by
>>>> someone and will be in 1.36. I did not look at it yet, though.
>>>
>>> good to know :)
>>
>> Under boost_1_36_0\libs\serialization\example there are some hpp and
>> cpp files for portable binary format (portable_binary_archive.hpp,
>> portable_binary_iarchive.cpp ...) and I am wondering how performant
>> the code is.
>
> This was actually tested on various platforms of different
> 32/64 endian combinations. And the test consisted of running
> ALL serialization tests as is done with the "official"
> archive implementations. To me, the main proble is that its
> missing support for floating point types.
> Including such support in a definitive, portable manner would
> be a significant effort which so far no one has deigned to undertake.
>
> Robert Ramey
Hi Robert!
I am confused. Are you saying your "sponsored extension to the portable
binary archive example" does not include floating point support? Please
clarify what is in 1.36 and what is planned, as this is not at all
clear to me - nor to others, I'd say, judging from various people's
contributions.
I'm sorry if I add to the portable binary archive confusion by using the
same names as your examples. I am not entirely against renaming my
classes if it helps. What do you think?
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
On 24 Aug 2008, at 21:39, Bo Peng wrote:
>> This was actually tested on various platforms of different 32/64
>> endian combinations. And the test consisted of running ALL
>> serialization tests as is done with the "official" archive
>> implementations. To me, the main problem is that its missing support
>> for floating point types.
>> Including such support in a definitive, portable manner would be a
>> significant effort which so far no one has deigned to undertake.
>
> Please forgive my ignorance, but what is 'floating point types'?
> QDataStream simply dump the internal presentation of float or double
> numbers and swap them for big small endian coding if necessary...
That's what I do too: using fp_utilities I dump the bit pattern and
restore it later on. This works for "almost all" environments. However,
the word "almost" is a problem for an ultraportable library such as
boost. Andrea Denzler already mentioned the difficulties: IEEE 754 is not
universal, NaN representations and endianness differ, and forget about
long double when talking about portability. Conclusion: the "definitive,
portable manner" Robert talks about will be very hard to achieve. On the
other hand: supporting IEEE 754 types float and double is (almost) easy.
Let's be pragmatic about that!
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
Thank you very much for your explanation. My understanding now is that
your version of portable binary serialization supports, like Qt, only
IEEE754 floating types, and the official version under boost/examples
does not.
Does your implementation provide a mechanism to test for IEEE 754 floating
point support? If I am going to use your library, I need to at least
give a warning in such cases, something like "This system does not
support the IEEE 754 floating point representation so the created archive may
not be read correctly on other systems". I would consider my program
portable enough if this can be done.
Thank you very much.
Bo
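
Independently of any particular archive implementation, standard C++ already exposes a compile-time flag for this: `std::numeric_limits<T>::is_iec559`. A minimal sketch of the warning described above follows; the function names are hypothetical, and the check says nothing about byte order, which has to be handled separately.

```cpp
#include <limits>
#include <string>

// Hypothetical check: do float and double look like IEEE 754 binary32/binary64?
bool floats_look_ieee754() {
    return std::numeric_limits<float>::is_iec559 && sizeof(float) == 4 &&
           std::numeric_limits<double>::is_iec559 && sizeof(double) == 8;
}

// Hypothetical helper: empty string if all is well, otherwise a warning text
// the application can show before writing an archive.
std::string archive_warning() {
    if (floats_look_ieee754()) {
        return std::string();
    }
    return "This system does not support the IEEE 754 floating point "
           "representation, so the created archive may not be read "
           "correctly on other systems";
}
```

An application could call `archive_warning()` once at startup and print the result if it is not empty.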
"The above example session will build static and shared non-debug multi-threaded variations of the libraries. To build all variations use --build-type=complete"
The question is - what do I really need as just a Boost user?
Will non-debug variations be suitable for me while I debug my program under msvc (step-by-step execution)? (I have no need to debug deeply into Boost libs)
When will I need single-threaded variations? Or will I never need them, even if my program contains just one thread?
Thanks!
This should make it apparent why I've never wanted to make a
"portable binary archive" but left it as a demo or example. There
is no way I could do this without making choices that some
people would view as not being what they envision a
"portable binary archive" to be. This would leave me with
a life time task of defending these choices forever on this
and other lists.
I believe that making a "portable binary archive" is possible.
In my view such an archive should be truly universally
portable as the text archives are. This is the standard I
would expect to meet if I were to undertake it. However,
meeting such a standard would require:
a) quite a bit of effort to address the variety of compilers,
word sizes, endianness, etc.
b) quite a bit of testing on all these environments
c) quite a bit of research into floating point formats,
NaNs, and other stuff.
d) quite a bit of detailed documentation to explain
the compromises, decisions and rationale so the same
battles don't have to be constantly re-fought.
If anyone wants to do this to make an "official archive" for
the serialization library it would be a great thing. Such a person
should be prepared to:
a) Do all of the above
b) Submit his archive for a formal or mini review so that
other interested parties can comment on the submission
and agree that it represents a consensus set of choices in
those areas regarding trade-offs.
c) Add the required documentation to the serialization documentation
d) Monitor the test results and take responsibility for keeping
the code running in the face of changes in platforms.
e) Monitor the user/devel lists to address issues raised by users.
There is precedent for this. Matthias Troyer has done all this
(and a bit more) for binary archives and it has worked out well.
So it's up to you - I don't know who "you" is here. Depends
on who wants to step up.
Robert Ramey
> Hi!
> I need to compile Boost now and don't know what libraries I really need.
>
> "The above example session will build static and shared non-debug
> multi-threaded variations of the libraries. To build all variations
> use --build-type=complete"
>
> The question is - what do I really need as just Boost user? Will
> non-debug variations be suitable for me while I will debug my program
> under msvc (step-by-step execution)? (I have no need to debug deeply
> into Boost libs) When will I need single-threaded variations? Or I
> will never meet it even if my program will contain just one thread?
Did you read http://boost.org/more/getting_started ?
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
I'm sure many will think this is a totally stupid suggestion, but I
can't resist making a fool of myself. ;-)
Since floats are the problem with portable binary archives, why not
punt on this issue and render floating point types (only) in ascii.
For many uses floating point is not the critical path and binary
archives solve many problems other than floating point: endian-ness,
native integer size differences, etc. And those issues can be quite
difficult to deal with otherwise.
This can be viewed as a special case of a standard technique: for
highly non-standard data using an independent format that translates
easily into each proprietary format. In this case the independent
format is simply ascii. In fact, since we are only rendering the
characters "[-+e.0-9]" we could use a modified BCD or other compressed
format to provide the compression that is typically what people assume
in binary formats.
Thoughts?
--
Robert
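
The "modified BCD" idea above can be sketched as follows: the 14 glyphs fit in 4 bits each, so two glyphs share one byte. This is only an illustration of the packing, not code from any archive; the names `pack_glyphs` and `unpack_glyphs`, the glyph order, and the pad code 0xF are assumptions, and the input is assumed to contain only the 14 allowed characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The 14 allowed glyphs; the index in this string is the 4-bit code.
const std::string kGlyphs = "0123456789+-e.";
// Code used to fill the low nibble when the glyph count is odd.
const std::uint8_t kPad = 0xF;

// Hypothetical helper: pack two glyphs per byte (input assumed valid).
std::vector<std::uint8_t> pack_glyphs(const std::string& text) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < text.size(); i += 2) {
        const std::uint8_t hi = static_cast<std::uint8_t>(kGlyphs.find(text[i]));
        const std::uint8_t lo = (i + 1 < text.size())
            ? static_cast<std::uint8_t>(kGlyphs.find(text[i + 1]))
            : kPad;
        out.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return out;
}

// Hypothetical helper: reverse of pack_glyphs.
std::string unpack_glyphs(const std::vector<std::uint8_t>& bytes) {
    std::string text;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const std::uint8_t hi = static_cast<std::uint8_t>(bytes[i] >> 4);
        const std::uint8_t lo = static_cast<std::uint8_t>(bytes[i] & 0xF);
        text += kGlyphs[hi];
        if (lo != kPad) {
            text += kGlyphs[lo];
        }
    }
    return text;
}
```

With this packing, a 14-glyph rendering such as "+0.3141592e+01" occupies 7 bytes, which is still more than the 4 bytes of a native float.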
25.08.08, 20:28, "David Abrahams" <da...@boostpro.com>:
> on Mon Aug 25 2008, Roma <shmromacs-AT-yandex.ru> wrote:
> > Hi!
> > I need to compile Boost now and don't know what libraries I really need.
> >
> > "The above example session will build static and shared non-debug
> > multi-threaded variations of the libraries. To build all variations
> > use --build-type=complete"
> >
> > The question is - what do I really need as just Boost user? Will
> > non-debug variations be suitable for me while I will debug my program
> > under msvc (step-by-step execution)? (I have no need to debug deeply
> > into Boost libs) When will I need single-threaded variations? Or I
> > will never meet it even if my program will contain just one thread?
> Did you read http://boost.org/more/getting_started ?
> Robert Ramey writes:
>> This should make it apparent why I've never wanted to make a "portable
>> binary archive" but left it as a demo or example.
>
> I'm sure many will think this is a totally stupid suggestion, but I
> can't resist making a fool of myself. ;-)
> Since floats are the problem with portable binary archives, why not
> punt on this issue and render floating point types (only) in ascii.
>
> For many uses floating point is not the critical path
I don't agree. In scientific applications one uses huge amounts (>>TB)
of data in float format (particularly doubles) and one needs to share
data files on some computing clusters, knowing nothing about the
architecture of the systems used by end-users.
The questions of exactness (no loss of the initial precision) and I/O
speed are very important. It is also crucial to minimize storage space
with additional typical capabilities of compression (gzip and bz2 filters
are ok for that).
> and binary
> archives solve many problems other than floating point: endian-ness,
> native integer size differences, etc.
yes this part is a must.
> And those issues can be quite
> difficult to deal with otherwise.
>
> This can be viewed as a special case of a standard technique: for
> highly non-standard data using an independent format that translates
> easily into each proprietary format.
> In this case the independent
> format is simply ascii. In fact, since we are only rendering the
> characters "[-+e.0-9]" we could use a modified BCD or other compressed
> format to provide the compression that is typically what people assume
> in binary formats.
ok, this is only a set of 14 glyphs, so it could be hosted via short ints
(with 2 bits unused). Consider a typical float (relative precision ~1e-7).
If one needs to store pi as +0.3141592e+01 (ASCII), it is 14 characters
(only 11 if one saves the leading '+' and the exponent '+0' chars for a >0
mantissa and exponent),
that could be serialized using 14/11 shorts, so this is 28/22 bytes.
This has to be compared with 4 bytes for floats! This induces a typical
increase of storage by a factor ~6 at the additional CPU cost of the
underlying internal format conversion (a la sprintf). For me this is not
acceptable.
Similar approach holds for doubles.
Moreover, as soon as you use an ASCII format to store a float/double, you must
make a decision about the rounding of the last significant digit.
In most applications this is not
a real problem, for people don't care about ultimate numeric precision...
but in some circumstances, the reproducibility/portability of I/O at
full numeric precision is absolutely necessary.
Imagine X being 3.141592654... stored as 3.14160 in some archive
for computing sqrt(pi-X) on some other system.
I'm not sure this behaviour is acceptable to scientists (at least not to
me ;-) ).
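
As a side note on the rounding concern, a minimal sketch rather than a recommendation: printing a double with `std::numeric_limits<double>::max_digits10` significant digits (17 for IEEE 754 binary64) is enough to recover the same value on reading, provided both the printing and the parsing routines round correctly. That proviso is an assumption about the C library, and the helper names are hypothetical. NaN and infinity are not handled here, and the storage cost objection above still stands, since each value takes over 20 characters.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: render a double with enough digits to round-trip.
// "%.*e" prints one digit before the point, so precision is max_digits10 - 1.
std::string double_to_text(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*e",
                  std::numeric_limits<double>::max_digits10 - 1, value);
    return std::string(buf);
}

// Hypothetical helper: parse the text back (assumes the default "C" locale).
double text_to_double(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

A quick check is that `text_to_double(double_to_text(x)) == x` holds for the finite values of interest on each target platform before relying on it.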
So for me 'IEEE' format is the best approach one could use.
Boost.archive is so simple and easy to use (with a little care)
that I see no reason
why we should not get an efficient portable binary archive with floats.
Of course for some very very specific application that needs
long doubles or other not so portable stuff, one could use
the HDF5 library but this is not as simple as boost, unfortunately.
regards
frc
--
Francois Mauger
Laboratoire de Physique Corpusculaire de Caen et Universite de Caen
ENSICAEN - 6, Boulevard du Marechal Juin, 14050 CAEN Cedex, FRANCE
e-mail: mau...@lpccaen.in2p3.fr
tel.: (0/+33) 2 31 45 25 12
fax: (0/+33) 2 31 45 25 49
If a 'truly portable binary archive' is so difficult to achieve, we do
not have to call Christian's implementation 'portable'. I mean, why
can't we make the current 'binary archive' almost portable by:
1. Making types such as integers portable,
2. Using the IEEE 754 format to make float and double almost portable,
3. Providing a mechanism to test if a system is IEEE 754 compatible, and
marking whether an archive was created on a compatible system.
Most users would still treat this 'binary archive' as non-portable,
as it is the case now. For other users who care about portability,
they are given a function to test if an archive can be opened safely
on another system. I would consider this as a great improvement over
the existing 'binary archive' implementation.
Cheers,
Bo
> I like your style of critics. Always put a reference to your opinion
> so it gets more weight. However I do not agree with everything you
> write.
We don't have to :) But it's good to see the point of other people as
well :)
> Sure! I'm curious how you changed my code, send it right over please!
> :) Especially the right-shifting issue caught my interest! Do you
> have examples where right shifting does not repeat the sign bit? Let
> me see how you solved it.
Please see the sources attached.
As for the right-shifting, I only tried it on Intel-based CPUs, so of
course here it always works the same. But the C++ standard is quite clear:
right-shifting a negative signed value is implementation-defined with regard to the
fill bit on the left, so one shouldn't count on it :)
Akos
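One way to avoid relying on the fill bit is sketched below. It assumes two's-complement integers (which is what the archive code targets in practice) and only ever shifts non-negative values, which is well defined. The function name shift_right_arithmetic is hypothetical. Note that C++20 later defined >> on signed values as an arithmetic shift, so this only matters for older language versions.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic (sign-propagating) right shift that does not depend on how the
// compiler implements >> for negative values.
std::int32_t shift_right_arithmetic(std::int32_t value, unsigned count) {
    assert(count < 32);
    if (value >= 0)
        return value >> count;      // well defined for non-negative values
    // ~value is non-negative here; shift it, then complement back.
    // The result is floor(value / 2^count).
    return ~(~value >> count);
}
```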
yes, this is a wide range of applications, but for example where I'm
using it, most of our data is in floats :) but you're right - it's best
to get a result that's not perfect but works, first :)
Akos
> Robert Ramey writes:
>> This should make it apparent why I've never wanted to make a
>> "portable binary archive" but left it as a demo or example.
>
> I'm sure many will think this is a totally stupid suggestion, but I
> can't resist making a fool of myself. ;-)
>
> Since floats are the problem with portable binary archives, why not
> punt on this issue and render floating point types (only) in ascii.
>
> For many uses floating point is not the critical path and binary
> archives solve many problems other than floating point: endian-ness,
> native integer size differences, etc. And those issues can be quite
> difficult to deal with otherwise.
>
> This can be viewed as a special case of a standard technique: for
> highly non-standard data using an independent format that translates
> easily into each proprietary format. In this case the independent
> format is simply ascii. In fact, since we are only rendering the
> characters "[-+e.0-9]" we could use a modified BCD or other compressed
> format to provide the compression that is typically what people assume
> in binary formats.
>
> Thoughts?
I have an alternate suggestion: what about continued fractions? Turn
the floating point value into a list of integers. This works no
matter what f.p. systems the source and destination use. You just
need a portable integer serialization format.
1. Serialize whether or not the float is in a NaN state as a
Boolean. If it's true, then you're done. Receiving systems that
don't support such states could either return a zero or throw. If
it's false, keep going.
2. Serialize the sign as a Boolean. This counts even for zero
values, if the f.p. uses the "negative zero" concept. Receiving
systems that don't should ignore the read sign for zero values.
Continue, but use the absolute value instead (even for infinity).
3. Serialize the exponent as an Integer. The exponent is the shift
needed to bring the base value between one and two (including 1,
excluding 2). So values of two and above get a positive shift,
values under one use negative shifts, and those that happen to be in
our implementation range use a zero shift. If the base value is
initially zero or infinite, use zero as the shift amount.
4. Serialize the base value as a list of continued fraction
components. The length is variable, so make sure to serialize it
too! For a zero base value, the list consists of a single element of
value zero. For infinity, use an empty list. For all other values,
start with the 1 as the whole part then proceed with the rest of the
components. (Since the f.p. state represents a binary fraction, I
suggest not using subtraction/truncation and reciprocating in
floating-point, but manipulating the virtual numerator and
denominator as integers with division/modulus. If the f.p. radix
isn't 2, you may want to do the exponent and continued fraction in
the native radix, then convert afterwards.) Saving a serialization
should start with the whole 1 and go down to smaller contributions.
Loading back a serialization should read in the entire list first,
then expand starting from the smallest/last contribution up to the
whole 1.
Example: -2.75 -> !is_nan, is_negative, 1.375 << 1; 1.375 is stored
as [1.01100], which is 44/32, which is 11/8, which has a c.f. of [1;
2, 1, 2]; so you'll serialize {False, True, 1, {4; 1, 2, 1, 2}}.
Example: NaN -> is_nan -> {True}; you mustn't look for any other
component.
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
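Step 4 of the proposal can be sketched with integer division and modulus only, as suggested. This is an illustration, not code from the thread: the names continued_fraction and expand are hypothetical, and the fraction is assumed to be already available as a positive numerator/denominator pair (11/8 in the -2.75 example).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Continued-fraction terms of numerator/denominator (denominator non-zero),
// computed with integer division and modulus only.
std::vector<std::uint64_t> continued_fraction(std::uint64_t numerator,
                                              std::uint64_t denominator) {
    assert(denominator != 0);
    std::vector<std::uint64_t> terms;
    while (denominator != 0) {
        terms.push_back(numerator / denominator);
        const std::uint64_t remainder = numerator % denominator;
        numerator = denominator;
        denominator = remainder;
    }
    return terms;
}

// Rebuild the fraction from its terms, starting at the last (smallest)
// contribution and working up to the whole part.
void expand(const std::vector<std::uint64_t>& terms,
            std::uint64_t& numerator, std::uint64_t& denominator) {
    numerator = 1;
    denominator = 0;
    for (auto it = terms.rbegin(); it != terms.rend(); ++it) {
        // a + 1/(n/d) == (a*n + d) / n
        const std::uint64_t next = *it * numerator + denominator;
        denominator = numerator;
        numerator = next;
    }
}
```

For 11/8 this yields the terms 1, 2, 1, 2, and expanding them gives back 11/8.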
> I have an alternate suggestion: what about continued fractions? Turn
> the floating point value into a list of integers. This works no
> matter what f.p. systems the source and destination use. You just
> need a portable integer serialization format.
which we already have.
>
> ...
>
I don't have enough time to review the suggestion in the
detail it probably deserves. But a cursory look shows
a lot of imagination.
Robert Ramey
Then why cannot we use IEEE745 format for float serialization on all systems?
1. If a system is IEEE745 compatible, serialize float numbers
directly. This will work 99.9% of the time, and will be very
efficient.
2. If a system is not IEEE745 compatible, then to serialize a float number
we write the number as 32 or 64 contiguous bits in IEEE745 format, and to
de-serialize a float number we read the number in IEEE745 format and
write it in the native float format. An intermediate string representation
can be used for the translations.
I mean, these continuous IEEE745 bits are equivalent to your "portable
integer serialization format", but will be much more efficient on
IEEE745 compatible systems.
Cheers,
Bo
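Case 1 of this proposal could look roughly like the sketch below. It assumes a platform where float is a 32-bit IEEE 754 type (checked at compile time) and where floats and integers share the same byte order; the function names float_to_bits and bits_to_float are hypothetical. Once the value is a fixed-width integer, whatever portable integer handling the archive already has can take over.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes a 32-bit IEEE 754 float");

// Copy the bit pattern of a float into a fixed-width integer.
std::uint32_t float_to_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Inverse operation: reinterpret a 32-bit pattern as a float.
float bits_to_float(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because no arithmetic is done on the value, the round trip introduces no rounding.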
> Daryle Walker wrote:
>
>> I have an alternate suggestion: what about continued fractions? Turn
>> the floating point value into a list of integers. This works no
>> matter what f.p. systems the source and destination use. You just
>> need a portable integer serialization format.
>
> which we already have.
>
>>
>> ...
>>
>
> I'm don't have enough time to review the suggestion in the
> detail it probably deserves. But a cursory look shows
> a lot of imagination.
It will just be horribly inefficient. A text representation will be
faster and more compact.
Matthias
>> I have an alternate suggestion: what about continued fractions? Turn
>> the floating point value into a list of integers. This works no
>> matter what f.p. systems the source and destination use. You just
>> need a portable integer serialization format.
>
> Then why cannot we use IEEE745 format for float serialization on all
> systems?
>
> 1. If a systems is IEEE745 compatible, serialize float
> numbers directly. This will work 99.9% of the times, and will be very
> efficient.
> 2. If a system is not IEEE745 compatible, to serialize a
> float number, we write the number as 32 or 64 continuous
> bits, in IEEE745 format, to de-serialize a float number, we
> read the number in IEEE745 formats and write in native float
> format. An intermediate string representation can be used for the
> translations.
>
> I mean, these continuous IEEE745 bits are equivalent to your
> "portable integer serialization format", but will be much more
> efficient on IEEE745 compatible systems.
A splendid idea! Btw it's IEEE754... ;) But yeah, let's take it for the
portable binary archive's standard for floating point values. Machines
using a different notion may contribute a conversion algorithm or simply
throw an exception. However, the intermediate string approach would be a
last resort in my view.
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
>> I have an alternate suggestion: what about continued fractions?
>> Turn the
>> floating point value into a list of integers. This works no
>> matter what
>> f.p. systems the source and destination use. You just need a
>> portable
>> integer serialization format.
>
> Then why cannot we use IEEE745 format for float serialization on
> all systems?
Because I don't think that the IEEE-754 internal format is as stable
as you think it is. (Looking at Wikipedia's entries on IEEE-754 and
754r, there is a standard conceptual format, but the problem is how
various implementations carry out the internal bit-wise format. Room
for interpretation will doom your plan.)
> 1. If a systems is IEEE745 compatible, serialize float numbers
> directly. This will work 99.9% of the times, and will be very
> efficient.
> 2. If a system is not IEEE745 compatible, to serialize a float number,
> we write the number as 32 or 64 continuous bits, in IEEE745 format, to
> de-serialize a float number, we read the number in IEEE745 formats and
> write in native float format. An intermediate string representation
> can be used for the translations.
The string or "direct" conversions may introduce rounding errors.
> I mean, these continuous IEEE745 bits are equivalent to your "portable
> integer serialization format", but will be much more efficient on
> IEEE745 compatible systems.
Is it worth potentially screwing 0.01% of your customers when you may
not have to?
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
> On 27 Aug 2008, at 15:29, Robert Ramey wrote:
>
>> Daryle Walker wrote:
>>
>>> I have an alternate suggestion: what about continued fractions?
>>> Turn
>>> the floating point value into a list of integers. This works no
>>> matter what f.p. systems the source and destination use. You just
>>> need a portable integer serialization format.
>>
>> which we already have.
[SNIP]
>> I'm don't have enough time to review the suggestion in the
>> detail it probably deserves. But a cursory look shows
>> a lot of imagination.
>
> It will just be horribly inefficient. A text representation will be
> faster and more compact.
But we're trying to avoid conversion rounding errors, which could
happen even under text conversion. If you don't like continued
fractions, then serialize the f.p. radix, the virtual numerator of
the binary (or whatever) fraction (include any implicit leading 1 or
whatever), and the power of the radix used for the virtual
denominator. They're all integers, so feel free to use text if you
want.
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
Akos,
I see, you use make_signed and make_unsigned - didn't know those small
gems :) Thanks for pointing that out. The workarounds in the oarchive
are a bit clumsy but hey, if they root out all of the warnings with
little effort.
    temp = negative ? (typename boost::make_unsigned<T>::type)
                          -((typename boost::make_signed<T>::type)t)
                    : t;
Now *that's* a nut :) Quite tricky! But I understand now that both casts
are necessary. (After trying to remove the second one and rethinking ;)
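A variant of the same idea is sketched below. It is not the code from the attached archive: it uses the std:: type traits from C++11 rather than the boost:: ones, the function name magnitude is hypothetical, and it converts to the unsigned type first and negates there, so the most negative value does not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Magnitude of a signed integer, returned as its unsigned counterpart.
// Converting first and negating in unsigned arithmetic keeps the most
// negative value well defined, because unsigned negation wraps modulo 2^N.
template <typename T>
typename std::make_unsigned<T>::type magnitude(T value) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "T must be a signed integer type");
    typedef typename std::make_unsigned<T>::type U;
    const U bits = static_cast<U>(value);
    return value < 0 ? static_cast<U>(U(0) - bits) : bits;
}
```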
Cannot we choose a stable, boost-specific format and consider all
others incompatible? This is actually required if we are going to
convert non-IEEE754 float numbers to a standard IEEE754 format by
ourselves. I guess we can define a set of characterization float
numbers and their standard binary representation in boost. Only those
systems that represent these numbers in the standard way can be
considered as IEEE754 compatible (subject to a big/little-endian swap)
and can be archived directly.
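Such a set of characterization numbers might be checked as sketched here. The table is only an assumption about which values one would pick; the bit patterns are the IEEE 754 binary64 encodings of those values. The names double_bits and looks_like_ieee754_double are hypothetical. Comparing integer values rather than raw bytes sidesteps the endian swap, as long as doubles and integers use the same byte order on the platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit pattern of a double as a 64-bit integer value.
std::uint64_t double_bits(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Compare a few reference values against their IEEE 754 binary64 encodings.
bool looks_like_ieee754_double() {
    struct Reference { double value; std::uint64_t bits; };
    const Reference table[] = {
        { 1.0,  0x3FF0000000000000ULL },
        { -2.0, 0xC000000000000000ULL },
        { 0.5,  0x3FE0000000000000ULL },
        { 0.1,  0x3FB999999999999AULL },
        { std::numeric_limits<double>::infinity(), 0x7FF0000000000000ULL },
    };
    for (const Reference& r : table)
        if (double_bits(r.value) != r.bits)
            return false;
    return true;
}
```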
> The string or "direct" conversions may introduce rounding errors.
I guess all string-based conversions have rounding errors. I am using
a text archive for its portability and I think the situation will not
be worse if I switch to an imperfect portable binary archive.
Also, there might be platform-specific lossless conversion methods...
> Is it worth potentially screwing 0.01% of your customers when you may not
> have to?
I guess more than 0.01% boost/serialization users are suffering from
not having a portable binary archive, I am one of them.
Bo
| Daryle Walker <dar...@hotmail.com>
Sent by: boost-use...@lists.boost.org 28/08/2008 10:13
I agree.
Almost all platforms use the IEEE754 format for float and double.
So these can, and probably should, be used for portable binary archives.
If anyone needs to support some platform that uses some other format,
then he will have to add code that converts to/from the IEEE754 formats.
Several different formats are used for long double.
I think it is reasonable not to support long double,
at least not initially.
I see three options for dealing with the endianness issue:
1. make all archives big-endian
2. make all archives little-endian
3. use the native format when saving, and put an endianness flag in the archive.
1 is inefficient when moving data between little-endian platforms
2 is inefficient when moving data between big-endian platforms
So 3 should be most efficient. Is there an easy way of storing an endianness-flag in an archive?
--Johan Råde
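Option 3 can be sketched as follows. The one-byte flag layout (1 for little-endian, 0 for big-endian) and all function names are assumptions made for this illustration, not the format of any existing archive. The reader swaps bytes only when the writer's byte order differs from its own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Detect the byte order of the running machine at run time.
bool native_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the bytes of a 32-bit value.
std::uint32_t byte_swap(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Save: one flag byte (1 = little-endian, 0 = big-endian), then native bytes.
std::vector<unsigned char> save_native(std::uint32_t value) {
    std::vector<unsigned char> out(1 + sizeof value);
    out[0] = native_is_little_endian() ? 1 : 0;
    std::memcpy(&out[1], &value, sizeof value);
    return out;
}

// Load: swap only when the writer's byte order differs from ours.
std::uint32_t load_native(const std::vector<unsigned char>& in) {
    assert(in.size() == 1 + sizeof(std::uint32_t));
    std::uint32_t value;
    std::memcpy(&value, &in[1], sizeof value);
    const bool writer_little = in[0] == 1;
    return writer_little == native_is_little_endian() ? value
                                                      : byte_swap(value);
}
```

In a real archive the flag would be written once in the header rather than per value.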
> 1 is inefficient when moving data between little-endian platforms
> 2 is inefficient when moving data between big-endian platforms
> So 3 should be most efficient. Is there an easy way of storing an
> endianness-flag in an archive?
The version currently in the package does that now for integer types.
Note, you could store floating points as a pair of integers using
the facilities already in portable binary archive. This would have
the side benefit of making the archives significantly smaller.
Robert Ramey
>
> Robert Ramey
>
Great.
Then you can handle float and double by just saving and loading the bytes,
and deal with endianness the same way as for integers.
I don't think there is any other scheme
that will make the archive significantly smaller without losing precision.
I believe that in most applications a float/double contains almost 32/64 bits of entropy.
--Johan
I don't think performance should be the overriding concern, especially
since byte-shuffling is very fast. The problem with option 3 is that it
introduces a potential source of bugs that only manifests when moving
between platforms with different endianness. I'd prefer option 1,
precisely because it requires shuffling on the most common platforms so
any bugs in the shuffling code are sure to be caught early.
--
Rainer Deyke - rai...@eldwood.com
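Option 1 can be written so that the same code runs on every platform, which is what makes bugs show up early. In this sketch (function names hypothetical) the bytes are produced with shifts on the value, so the host's byte order never enters the picture and no endianness detection is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in big-endian order. Shifts operate on the value,
// not on its memory layout, so this behaves the same on any host.
void put_u32_be(std::vector<unsigned char>& out, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
}

// Read a big-endian 32-bit value starting at the given offset.
std::uint32_t get_u32_be(const std::vector<unsigned char>& in,
                         std::size_t offset) {
    assert(offset + 4 <= in.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value = (value << 8) | in[offset + i];
    return value;
}
```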
On Friday 29 August 2008 01:42 am, Rainer Deyke wrote:
> Johan Råde wrote:
> > I see three options for dealing with the endianness issue:
> > 1. make all archives big-endian
> > 2. make all archives little-endian
> > 3. use the native format when saving, and put an endianness flag in the
> > archive.
> >
> > 1 is inefficient when moving data between little-endian platforms
> > 2 is inefficient when moving data between big-endian platforms
> > So 3 should be most efficient. Is there an easy way of storing an
> > endianness-flag in an archive?
>
> I don't think performance should be the overriding concern, especially
> since byte-shuffling is very fast. The problem with option 3 is that it
> introduces a potential source of bugs that only manifests when moving
> between platforms with different endianness. I'd prefer option 1,
> precisely because it requires shuffling on the most common platforms so
> any bugs in the shuffling code are sure to be caught early.
Why are you guys not just using XDR for your portable binary format?
But it isn't fast. If the necessity of bitshuffling makes it impossible to
serialize, say a vector<double> via the optimized array handling, you
could easily be talking about a factor of 10 in speed. Showstopper for us,
at least. I suppose you could copy the entire buffer, swap all the bytes
at once, then serialize the swapped buffer, but this also has a significant
cost, too much for scientific applications where another option exists.
> The problem with option 3 is that it introduces a potential source of
> bugs that only manifests when moving
> between platforms with different endianness. I'd prefer option 1,
> precisely because it requires shuffling on the most common platforms so
> any bugs in the shuffling code are sure to be caught early.
Actually this is very easy to test for, even if you don't have machines of the
other endianness available. (the md5sum of the generated archive must match
for all platforms, and these sums can be checked in to svn)
-t
Have you looked at HDF5? It supports metadata, parallel IO, arbitrary
data structures, etc. and is designed for HPC applications. It also
has native binary format support and will automatically provide cross-
platform binary compatibility.
It will likely require a bit more code instrumentation than boost
serialization, but may be worth it if performance is key...
We do use HDF5... With boost.python and pytables plus boost::serialization
backed c++ datastructures one can quite flexibly provide
converter/extractor/reducer utilities that get you from a boost::serialization
portable binary format to a more 'analysis friendly' hdf5 format. Works great.
-t
a) Basically, floats can be represented as two integers.
One for the exponent and one for a normalized fraction
of one.
b) The C standard library provides functions which
generate these two integers from any double (frexp)
and retrieve the original double from the pair
of integers (ldexp). I would guess that these functions
are pretty efficient as they only return a subset of
some existing bits.
c) the portable binary archive currently in the libraries
handles integers in a portable manner. It has been
tested on various platforms and already addresses
issues such as what to do when one attempts to load
an integer > 2^32 to a 32 bit machine. It also
strips leading bits which don't add anything and makes
the archives smaller.
It would seem using the standard functions - supported
by any standard C library - and using the functionality
already in portable_binary_archive, one could add
floating point functionality relatively easily - and it
would be no less portable than the C library is.
Robert Ramey
It is when compared to the overhead of IO (disk or socket, possibly even
memory).
> If the necessity of bitshuffling makes it impossible to
> serialize, say a vector<double> via the optimized array handling, you
> could easily be talking about a factor of 10 in speed.
I think here you are talking about the overhead of a single write
operation versus multiple write operations on the underlying stream,
correct?
It's true that the standard stream operations can be slow, but that is a
separate problem from the actual byte shuffling and should be solved
separately. Maybe this problem could be avoided by using a
std::vector<char> instead of a stream object for the actual
serialization and then dumping it all at once.
(It is not reasonable to just dump in-memory objects to a stream in any
portable format, binary or text.)
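The suggestion of shuffling into a buffer and dumping it at once might look like this. It is a sketch under the assumption of a 64-bit IEEE 754 double, with big-endian chosen as the archive order purely for illustration; pack_doubles_be and dump are hypothetical names. The stream sees a single write call no matter how many values are packed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <ostream>
#include <vector>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes a 64-bit IEEE 754 double");

// Byte-shuffle every value into one contiguous big-endian buffer.
std::vector<char> pack_doubles_be(const std::vector<double>& values) {
    std::vector<char> buffer;
    buffer.reserve(values.size() * sizeof(std::uint64_t));
    for (double v : values) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);
        for (int shift = 56; shift >= 0; shift -= 8)
            buffer.push_back(static_cast<char>(
                static_cast<unsigned char>((bits >> shift) & 0xFFu)));
    }
    return buffer;
}

// Hand the whole buffer to the stream in a single write call.
void dump(std::ostream& os, const std::vector<char>& buffer) {
    os.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
}
```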
>> The problem with option 3 is that it introduces a potential source of
>> bugs that only manifests when moving
>> between platforms with different endianness. I'd prefer option 1,
>> precisely because it requires shuffling on the most common platforms
>> so any bugs in the shuffling code are sure to be caught early.
>
> Actually this is very easy to test for, even if you don't have machines
> of the
> other endianness available. (the md5sum of the generated archive must
> match
> for all platforms, and these sums can be checked in to svn)
I thought option 3 was to write little-endian archives on little-endian
machines and big-endian archives on big-endian machines? If so, the
generated archives would /not/ be the same. Hence the potential source
of bugs.
--
Rainer Deyke - rai...@eldwood.com
> Just to keep the pot boiling - here's my two cents.
>
> a) Basically, floats can be represent as two integers.
> One for the exponent and one for a normalized fraction
> of one.
>
> b) The C standard library provides functions which
> generate these two integers from any double (frexp)
> and retrieves the origninal double from the pair
> of integers(ldexp). I would guess that these functions
> are pretty efficient as they only return a subset of
> some existing bits.
>
> c) the portable binary archive currently in the libraries
> handles integers in a portable manner. It has been
> tested on various platforms and already addresses
> issues such as what to do when on attempts to load
> an integer > 2^32 to a 32 bit machine. It also
> strips leading bits which don't add anything and makes
> the archives smaller.
>
> It would seem using the standard functions - supported
> by any standard C library - and using the functionality
> already in portable_binary_archive, one could add
> floating point functionality relatively easily - and it
> would be no less portable than the C library is.
I wonder if it really works so well when the word size of the machines
differs, or even when the word size is 32 bits on both ends. It's
likely they're both using IEEE754, so if long double has more than 32
bits of mantissa, your method will be needlessly lossy. I think long
double commonly has 96 or 128 bits total, so you'd lose significant
precision. The HPC community has had to solve this problem numerous
times. These are people that care about the accuracy of their floating
point numbers. Why one would begin anywhere other than with the formats
the HPC people have already developed is beyond me.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
> I wonder if it really works so well when the word size of the machines
> differs, or even when the word size is 32 bits on both ends. It's
> likely they're both using IEE754, so if long double has more than 32
> bits of mantissa, your method will be needlessly lossy. I think long
> double commonly has 96 or 128 bits total, so you'd lose significant
> precision. The HPC community has had to solve this problem numerous
> times. These are people that care about the accuracy of their
> floating point numbers. Why one would begin anywhere other than with
> the formats the HPC people have already developed is beyond me.
The current implementation implements a variable length format
where only the significant bits are stored. If it turns out that a number
stored in the archive cannot be represented on the machine
reading the archive, an exception is thrown. This would occur
where a 64 bit machine stored a value > 2^32 and a 32 bit
machine tried to load it.
This method has one great advantage. It automatically
converts between integer (or long or whatever) when
the size of integer varies between machines. It also eliminates
redundant data (leading 0's) and never loses precision.
If it can't do something the user wants to do - it punts.
It's up to the library user to decide how to handle these
special situations.
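The behaviour described here can be illustrated with a small sketch. It is not the actual layout used by the portable binary archive: the count-byte-plus-significant-bytes format, the restriction to unsigned values, and the names save_unsigned and load_unsigned are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Save: a count byte, then only the significant bytes, least significant
// first. Leading zero bytes are never written.
void save_unsigned(std::vector<unsigned char>& out, std::uint64_t value) {
    std::vector<unsigned char> bytes;
    while (value != 0) {
        bytes.push_back(static_cast<unsigned char>(value & 0xFFu));
        value >>= 8;
    }
    out.push_back(static_cast<unsigned char>(bytes.size()));
    out.insert(out.end(), bytes.begin(), bytes.end());
}

// Load into an unsigned type T; throw if the stored value needs more bytes
// than T provides on the reading machine.
template <typename T>
T load_unsigned(const std::vector<unsigned char>& in, std::size_t& pos) {
    if (pos >= in.size())
        throw std::runtime_error("truncated archive");
    const std::size_t count = in[pos++];
    if (count > sizeof(T))
        throw std::range_error("stored value does not fit the target type");
    if (count > in.size() - pos)
        throw std::runtime_error("truncated archive");
    T value = 0;
    for (std::size_t i = 0; i < count; ++i)
        value = static_cast<T>(value | (static_cast<T>(in[pos + i]) << (8 * i)));
    pos += count;
    return value;
}
```

The value 300 takes three bytes in this layout regardless of whether it was held in a 32-bit or a 64-bit variable, and loading 2^32 into a 32-bit type throws instead of truncating.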
I believe leveraging this by converting floats to a pair
of integers and serializing them would simplify the
job and result in a truly portable (as opposed to 99%)
archive.
BTW - I was wrong about the two library functions
mentioned above. They do return the exponent of the
normalized value - but they return the mantissa as a float
rather than an integer - damn!
Robert Ramey
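The floating-point mantissa returned by frexp can still be turned into an integer without loss by scaling it with ldexp. The sketch below assumes a binary double whose mantissa fits in 63 bits (53 bits for IEEE 754), handles finite values only, and does not preserve the sign of negative zero; FloatParts, split and join are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// A finite double expressed as two integers: value == mantissa * 2^exponent.
struct FloatParts {
    std::int64_t mantissa;
    int exponent;
};

FloatParts split(double value) {
    static_assert(std::numeric_limits<double>::radix == 2 &&
                  std::numeric_limits<double>::digits <= 63,
                  "this sketch assumes a binary double with <= 63 mantissa bits");
    assert(std::isfinite(value));
    const int digits = std::numeric_limits<double>::digits;  // 53 for IEEE 754
    int exp = 0;
    const double fraction = std::frexp(value, &exp);  // |fraction| in [0.5, 1), or 0
    // Scaling by 2^digits turns the fraction into an exact integer.
    const double scaled = std::ldexp(fraction, digits);
    return FloatParts{static_cast<std::int64_t>(scaled), exp - digits};
}

double join(const FloatParts& parts) {
    return std::ldexp(static_cast<double>(parts.mantissa), parts.exponent);
}
```

Both integers can then go through whatever portable integer serialization is available; NaN and infinity would need separate flags.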
> This should make it apparent why I've never wanted to make a
> "portable binary archive" but left it as a demo or example.
> There is now way I could do this without making choice what
> some people would view as not being what they envision a
> "portable binary archive" to be. This would leave me with a
> life time task of defending these choices forever on this and
> other lists.
I think I understand now... the issue is a boiling pot :) I'm counting
50 replies to the topic already and there's no consensus. I do not think
I want to be the one who tries to satisfy everybody. That's a daunting
task...
This is the boon and bane of the boost libraries.
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
I will use your version of portable binary archive as long as there is
a way to test its compatibility with the underlying system... Just
something like "whoops, this system is not IEEE754 compatible so
archives created on this system will not be portable".
Bo
On Tuesday 02 September 2008 10:57 am, Pfligersdorffer, Christian wrote:
> Robert on Monday, August 25, 2008 7:04 PM:
> > This should make it apparent why I've never wanted to make a
> > "portable binary archive" but left it as a demo or example.
> > There is now way I could do this without making choice what
> > some people would view as not being what they envision a
> > "portable binary archive" to be. This would leave me with a
> > life time task of defending these choices forever on this and
> > other lists.
>
> I think I understand now... the issue is a boiling pot :) I'm counting
> 50 replies to the topic already and there's no concensus. I do not think
> I want to be the one who tries to satisfy everybody. That's a daunting
> task...
>
> This is boon and bane of the boost libraries.
I don't see how adding archive support for an existing standard portable
binary format (like XDR) would be controversial. It's not like its existence
would preclude the addition of another end-all beat-all portable binary
format, if someone really decides they are determined to invent one.
I think that the main serialization problems are due to the
fact that we use data types in our code that are not compatible across platforms. If int
is 32 bits on one system and 64 bits on another, then of course we run
into issues. The existing workarounds will work, but they may fail
on other/new systems. What if int is 128 bits or 16 bits? Starting from
incompatible data types is the wrong approach.
Using compatible data types internally is the first issue to solve. Instead of
defining something like
int foo;
It is better if we can use something like:
int<min_value,max_value> foo;
On different platforms it will lead to different native C++ types, but I
don't care about that. I want to be sure to have my range of values for foo.
In that case serialization is relatively easy to solve. I might add some
minor options, like performance versus small data size, for a specific
serialization. Sometimes I need fast code, other times I need a small data
size. But this is just an extra option.
The same should be used for floating point values, something like
double<precision_digits,exponent_digits> foo;
I define how many digits I need and want. When using foo I know the minimum
precision available; maybe the C++ floating point type used has a greater
precision, but I don't care about that.
Again, when serializing I know how many digits are necessary, and probably
the IEEE standard of the two integers is the best to use. But this is a choice
for the library writer. I want a fully portable datatype/archive where a
specific number of digits is guaranteed in any operation (+, -, power,
serialization, etc).
Of course a compile-time error is raised if the specific
platform/library can't handle such requirements, for example int<0,2^500> foo.
IMHO... :)
Andrea
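The int<min_value,max_value> idea can be approximated with a small template. This is a sketch with the hypothetical name ranged_int; it selects among the fixed-width types from <cstdint>, and a range that does not fit in 64 bits cannot even be written as a template argument, so it fails at compile time as described.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Pick the smallest fixed-width signed type that can hold [Min, Max].
template <std::int64_t Min, std::int64_t Max>
struct ranged_int {
    static_assert(Min <= Max, "empty range");
    typedef typename std::conditional<
        (Min >= INT8_MIN && Max <= INT8_MAX), std::int8_t,
        typename std::conditional<
            (Min >= INT16_MIN && Max <= INT16_MAX), std::int16_t,
            typename std::conditional<
                (Min >= INT32_MIN && Max <= INT32_MAX), std::int32_t,
                std::int64_t>::type>::type>::type type;
};
```

A declaration such as ranged_int<0, 100>::type foo; then has the same width on every platform that provides these types, which removes the word-size question from serialization.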
>> I do not think
>> I want to be the one who tries to satisfy everybody. That's a
>> daunting task...
>
> I will use your version of portable binary archive as long as
> there is a way to test its compatibility with the underlying
> system... Just something like "whoops, this system is not
> IEEE754 compatible so archives created on this system will not be
> portable".
It's not straightforward to do such a test but I will have a look at it
for the next release. Johan Rade does a classification of floating point
formats in his fp_utilities. I'll see if I can use that.
Regards,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
In practice almost all platforms have float and double implementations
that are IEEE 754 compliant enough to make it OK to save the bytes and load them again.
The only exceptions I know of are:
1. some compilers have a setting where denormals, infinity and NaN are not used
2. on VMS there is still support for the VAX floating point format
Condition 1 can be detected by numeric_limits<T>::has_denorm etc.
Condition 2 can be detected as follows
#if defined(__vms) && defined(__DECCXX) && !__IEEE_FLOAT
Also note that the bit patterns that represent quiet NaN on one platform
can represent signaling NaN on other platforms.
But very few C++ developers seem to care about the difference between
quiet and signaling NaN.
And forget about portable binary serialization of long double
(unless you want to do a lot of work).
--Johan
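The checks for condition 1 can be gathered into one predicate. This is a sketch with a hypothetical function name, using only std::numeric_limits members; it says nothing about byte order, and has_denorm has been deprecated since C++23, so newer code may need a different test for denormals.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Properties that should hold before raw float/double bytes are treated
// as IEEE 754 data that another platform can load unchanged.
template <typename T>
bool raw_bytes_look_portable() {
    typedef std::numeric_limits<T> limits;
    return limits::is_iec559
        && limits::has_infinity
        && limits::has_quiet_NaN
        && limits::has_denorm == std::denorm_present
        && limits::radix == 2
        && CHAR_BIT == 8;
}
```

An archive could call raw_bytes_look_portable<float>() and raw_bytes_look_portable<double>() once at construction and refuse to write, or mark the archive, when either returns false.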
I drew the consequences from the quite lengthy discussion and released a
new version of my portable binary archives on the boost vault:
http://www.boostpro.com/vault/index.php?directory=serialization
For integer serialization I removed the warnings, and
floating point serialization I complemented with a couple of checks on whether
the preconditions are met on the platform at hand. Thanks to everybody for
your tips and for sharing your wisdom!
Remember that the archives were tested only against boost 1.33 (and 1.34
a little); so far I have not had time to look into the newer
versions.
Let me know if it works for you,
--
Christian Pfligersdorffer
Software Engineering
http://www.eos.info
Johan Rade on Saturday, September 06, 2008 12:07 PM:
François Mauger wrote:
>> In fact, since we are only rendering the
>> characters "[-+e.0-9]" we could use a modified BCD or other compressed
>> format to provide the compression that is typically what people assume
>> in binary formats.
>
> ok this is only a set of 14 glyphs so it could be hosted via short ints
> (with 2 bits unused)
? I think each of 14 glyphs could be represented in 4 bits, with 2 bit
patterns left over.
> consider a typical float (relative precision ~1e-7).
> If one need to store pi as +0.3141592e+01 (ASCII) it is 14 characters
> (only 11 is one saves leading'+' and exponent '+0' chars for >0 mantissa
> and exponent)
> that could be serialized using 14/11 shorts, so this is 28/22 bytes.
> This has to be compared with 4 bytes for floats!
It seems to me that 14 characters in the constrained glyph set could be
represented with 14 4-bit "nybbles," or 7 bytes.
It's still worse than 4 bytes, but by a factor < 2 rather than ~6.
Please forgive me if I've misunderstood you.
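The nybble arithmetic can be checked with a short sketch. The glyph-to-code table and the use of code 15 as padding are assumptions made for this example, and pack_nybbles/unpack_nybbles are hypothetical names; the point is only that 14 glyphs fit in 4 bits each, so the 14-character rendering of pi packs into 7 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The 14 glyphs; a glyph's index in this string is its 4-bit code (0..13).
const std::string kGlyphs = "0123456789+-e.";

// Pack two glyphs per byte. Code 15 pads an odd-length string.
std::vector<unsigned char> pack_nybbles(const std::string& text) {
    std::vector<unsigned char> out((text.size() + 1) / 2, 0xFF);
    for (std::size_t i = 0; i < text.size(); ++i) {
        const std::size_t code = kGlyphs.find(text[i]);
        assert(code != std::string::npos);  // only the 14 glyphs are allowed
        unsigned char& byte = out[i / 2];
        if (i % 2 == 0)
            byte = static_cast<unsigned char>((code << 4) | 0x0F);
        else
            byte = static_cast<unsigned char>((byte & 0xF0) | code);
    }
    return out;
}

// Unpack, skipping the padding code.
std::string unpack_nybbles(const std::vector<unsigned char>& bytes) {
    std::string text;
    for (unsigned char byte : bytes) {
        const unsigned hi = byte >> 4, lo = byte & 0x0F;
        if (hi < kGlyphs.size()) text += kGlyphs[hi];
        if (lo < kGlyphs.size()) text += kGlyphs[lo];
    }
    return text;
}
```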
Ooops, rereading my message, I simply cannot figure out
the reason why I wrote such a stupid thing!
You are absolutely right and your calculation of this factor ~<2
rather than 6 is ok.
So this hypothetical system is still bad, but not as bad as I claimed
before. Thank you for the fix.
And sorry for the trouble ("Au temps pour moi!").
regards
frc
--
Francois Mauger
Laboratoire de Physique Corpusculaire de Caen et Universite de Caen
ENSICAEN - 6, Boulevard du Marechal Juin, 14050 CAEN Cedex, FRANCE
e-mail: mau...@lpccaen.in2p3.fr
tel.: (0/+33) 2 31 45 25 12
fax: (0/+33) 2 31 45 25 49