Not so long ago I was looking for a solution for Base64 encoding/decoding.
On this page -
http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/dataflow.html
- I found an example of such a solution, using the 'base64_from_binary' and
'transform_width' iterators. But it is just a "blank", not a ready-to-use
solution.
So I created a solution. I know that many developers are looking for a
simple, ready-to-use, header-only and 'real C++' solution for Base64, so I
want to offer you one.
Some examples:
int main() {
    std::string my_text = "My text for encoding!";
    std::string encoded_text = boost::base64().encode( my_text );
    std::string my_text_again = boost::base64().decode< std::string >(
        encoded_text );
}
For binary data:
int main() {
    typedef std::vector< unsigned char > binary_string;
    binary_string my_bin = boost::assign::list_of( 0xFF )( 0xAA )( 0x01 );
    std::string encoded_bin = boost::base64().encode( my_bin );
    binary_string my_bin_again = boost::base64().decode< binary_string >(
        encoded_bin );
}
For streams:
int main() {
    // binary mode, so that the image bytes are not translated on any platform
    boost::filesystem::ifstream my_jpeg( "/some/path/to/my/image",
                                         std::ios::binary );
    // fstream checking omitted...
    std::string encoded_jpeg = boost::base64().encode( my_jpeg );
    boost::filesystem::ofstream my_jpeg_again(
        "/some/path/to/decoded/image", std::ios::binary );
    // fstream checking omitted...
    boost::base64().decode( encoded_jpeg, my_jpeg_again );
}
This is a first variant. IMHO such a solution would be useful for Boost users.
What do you think about it?
- Denis
Looks fine! A base16 (hex) variant would be nice too.
Olaf
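For reference, the Base16 (hex) variant Olaf mentions could look roughly like the sketch below. It is only an illustration: `base16_encode` and `to_hex` are hypothetical names, not part of Boost or of Denis's proposal, and the uppercase digit set is an arbitrary choice.

```cpp
#include <iterator>
#include <string>

// Hypothetical helper (not a Boost API): writes two uppercase hex digits
// for every input byte and returns the advanced output iterator.
template <class InIt, class OutIt>
OutIt base16_encode(InIt first, InIt last, OutIt out) {
    static const char digits[] = "0123456789ABCDEF";
    for (; first != last; ++first) {
        const unsigned char byte = static_cast<unsigned char>(*first);
        *out++ = digits[byte >> 4];    // high nibble
        *out++ = digits[byte & 0x0F];  // low nibble
    }
    return out;
}

// Hypothetical convenience wrapper for std::string input.
inline std::string to_hex(const std::string& input) {
    std::string result;
    result.reserve(input.size() * 2);
    base16_encode(input.begin(), input.end(), std::back_inserter(result));
    return result;
}
```

Because the core function only sees iterators, the same code would work for a `std::vector< unsigned char >` as well.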
I'm not sure how much need there is for Base64....
That interface is restrictive. Why not take iterators and ranges for input, and output iterators for output? That would give greater flexibility in the choice of source and destination: your code would be captured as algorithms.
In the JPEG example, that could mean writing the encoded image straight to disk, instead of to an intermediate string, which would reduce memory pressure, for example.
_____
Rob Stewart robert....@sig.com
Software Engineer using std::disclaimer;
Dev Tools & Components
Susquehanna International Group, LLP http://www.sig.com
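A minimal sketch of the interface Rob describes, assuming a hand-written encoder rather than the serialization library's iterators: input comes from any pair of input iterators, output goes to any output iterator, so a file can be encoded straight into another stream without an intermediate string. The names `base64_encode`, `encode_stream` and `encode_via_streams` are hypothetical and exist only for this illustration.

```cpp
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical algorithm-style encoder: reads bytes from [first, last) and
// writes padded Base64 text to out.
template <class InIt, class OutIt>
OutIt base64_encode(InIt first, InIt last, OutIt out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    while (first != last) {
        unsigned char block[3] = {0, 0, 0};
        int count = 0;
        for (; count < 3 && first != last; ++count, ++first)
            block[count] = static_cast<unsigned char>(*first);
        *out++ = alphabet[block[0] >> 2];
        *out++ = alphabet[((block[0] & 0x03) << 4) | (block[1] >> 4)];
        *out++ = count > 1 ? alphabet[((block[1] & 0x0F) << 2) | (block[2] >> 6)] : '=';
        *out++ = count > 2 ? alphabet[block[2] & 0x3F] : '=';
    }
    return out;
}

// Stream-to-stream use: nothing is buffered beyond one 3-byte group.
inline void encode_stream(std::istream& in, std::ostream& out) {
    base64_encode(std::istreambuf_iterator<char>(in),
                  std::istreambuf_iterator<char>(),
                  std::ostreambuf_iterator<char>(out));
}

// Small demonstration helper that pushes a string through the stream path.
inline std::string encode_via_streams(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    encode_stream(in, out);
    return out.str();
}
```

With file streams opened in binary mode in place of the string streams, `encode_stream` would write the encoded image directly to disk.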
Thanks, Olaf!
Yes, there are three such encodings: Base16, Base32 and Base64. But
Base64, IMHO, is the one most often required.
- Denis
Yes, Rob, iterators are more flexible, but this flexibility is not always
necessary.
For example, we can write:
std::vector< int > v{ 1, 2, 3 };
auto it = std::find( v.begin(), v.end(), 2 );
But Boost.Range provides a "less flexible" solution:
std::vector< int > v{ 1, 2, 3 };
auto it = boost::range::find( v, 2 );
IMO, this solution is much easier and safer, but less flexible.
- Denis
Since I suggested iterators and ranges as the interface, and your examples implied neither, I fail to understand how your answer actually bears on my suggestion. Does your interface support ranges without your having shown it? Is there output iterator support?
_____
Rob Stewart robert....@sig.com
I assume it's pretty easy to provide both APIs.
Olaf
Sorry, Rob, perhaps I misunderstood your question.
In fact, the main question right now is whether there is fundamental
interest in a ready-to-use, Boost-based base64 solution, not the
implementation details. Of course, the details will be discussed.
- Denis
Hmmm - I'm not sure what this means. It is used in the serialization library,
so it's not just a "blank".
Perhaps you're referring to the way it's implemented by building
up from more primitive concepts, using concepts now implemented
as part of the range library. It predates the range library but it
seems to me it's very similar. I find the way it's implemented
very appealing. For example, if one wanted to change the line
length, this would be easy to do without adding anything. The
base64 implementation in the serialization library isn't really
a library, but more of an instance of a huge family of text
processing facilities. So I think a much more interesting and useful
project would be:
"an optimally fast, character range processing library which can be used
to construct a wide variety of character processing algorithms by
composition of more primitive concepts in a regular way." Examples
include base64, ... and generation of custom code_convert facets
for io_streams.
Robert Ramey
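The point about changing the line length by composition can be illustrated with a small standalone sketch. It does not use the serialization library's `insert_linebreaks`; `line_wrapping_iterator` and `wrap_lines` are hypothetical names, and the sketch only assumes that an output-iterator adaptor is a fair stand-in for the idea of stacking a line-breaking step on top of whatever produces the characters.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical output-iterator adaptor: forwards each character to `out`
// and starts a new line once `width` characters have been written.
// `width` is expected to be greater than zero.
template <class OutIt>
class line_wrapping_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    line_wrapping_iterator(OutIt out, std::size_t width)
        : out_(out), width_(width), column_(0) {}

    line_wrapping_iterator& operator=(char c) {
        if (column_ == width_) {  // line is full: break before the next character
            *out_++ = '\n';
            column_ = 0;
        }
        *out_++ = c;
        ++column_;
        return *this;
    }

    // The usual output-iterator no-ops.
    line_wrapping_iterator& operator*() { return *this; }
    line_wrapping_iterator& operator++() { return *this; }
    line_wrapping_iterator& operator++(int) { return *this; }

private:
    OutIt out_;
    std::size_t width_;
    std::size_t column_;
};

// Hypothetical helper: wraps already-encoded text at `width` columns.
inline std::string wrap_lines(const std::string& text, std::size_t width) {
    std::string result;
    line_wrapping_iterator<std::back_insert_iterator<std::string> >
        out(std::back_inserter(result), width);
    for (std::string::const_iterator it = text.begin(); it != text.end(); ++it)
        *out++ = *it;
    return result;
}
```

Any encoder that writes to an output iterator could be pointed at such an adaptor, so the line length becomes a property of the composition rather than of the encoder.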
int main() {
    std::string my_text = "My text for encoding!";
    std::string encoded_text;
    boost::base64().encode( my_text, std::back_inserter( encoded_text ) );
    // Or like this:
    // boost::base64().encode( my_text, encoded_text );
    // or:
    // boost::base64().encode( my_text.begin()
    //                       , my_text.end()
    //                       , encoded_text );
    std::string my_text_again;
    boost::base64().decode( encoded_text, std::back_inserter(
        my_text_again ) );
    // Or like this:
    // boost::base64().decode( encoded_text.begin()
    //                       , encoded_text.end()
    //                       , std::back_inserter( my_text_again ) );
}
So the developer can choose the interface he likes. All technical details, of
course, will be discussed...
- Denis
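The decode overloads shown above could sit on top of an iterator-based function along the following lines. This is a sketch under stated assumptions, not Denis's implementation: `base64_value`, `base64_decode` and `decode_to_string` are hypothetical names, an ASCII-compatible character set is assumed, and the choice to stop at the first '=' and to throw on any other unknown character is made only for the example.

```cpp
#include <iterator>
#include <stdexcept>
#include <string>

// Maps a Base64 character to its 6-bit value, or -1 if it is not part of
// the alphabet (assumes an ASCII-compatible character set).
inline int base64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Hypothetical decoder: reads Base64 text from [first, last) and writes the
// decoded bytes to out.
template <class InIt, class OutIt>
OutIt base64_decode(InIt first, InIt last, OutIt out) {
    unsigned int buffer = 0;  // collects 6-bit groups
    int bits = 0;             // number of bits in buffer not yet written
    for (; first != last; ++first) {
        const char c = *first;
        if (c == '=') break;  // padding marks the end of the data
        const int value = base64_value(c);
        if (value < 0) throw std::invalid_argument("invalid Base64 character");
        buffer = (buffer << 6) | static_cast<unsigned int>(value);
        bits += 6;
        if (bits >= 8) {      // a whole byte is available
            bits -= 8;
            *out++ = static_cast<char>((buffer >> bits) & 0xFF);
        }
    }
    return out;
}

// Hypothetical convenience wrapper in the spirit of the string-based API.
inline std::string decode_to_string(const std::string& encoded) {
    std::string result;
    base64_decode(encoded.begin(), encoded.end(), std::back_inserter(result));
    return result;
}
```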
> http://code.google.com/p/stringencoders/
Pretty simple API:
> char buf[100];
> char result[10];
> char endian[] = {(char)0, (char)0, (char)1};
> int d = modp_b64_encode(buf, endian, 3);
And very fast:
> http://code.google.com/p/stringencoders/#Why
With some neat, very platform specific tricks:
> http://code.google.com/p/stringencoders/#How_It_Works
Don't get me wrong, I'd definitely like to see Boost have something because that extreme of performance isn't always necessary, but FYI for those interested in the discussion. -sc
--
Sean Chittenden
se...@chittenden.org
I understand you, Robert. You're talking about a flexible (and probably big)
library, and I'm talking about a utility. But this utility solves one
concrete, common task, and does it (IMHO) well. This can be compared with
boost::mem_fn or boost::lexical_cast - do one concrete task and do it well.
My solution is a single .hpp file, and it can be part of... hmmm...
Boost.Utility, for example, or even part of Boost.Serialization.
- Denis
I also have one solution for base64.
The code is short and trivial, it can be used with arbitrary input and
output iterators or ranges, and can also be used to generate a range
that computes base64 lazily as it is being iterated.
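To make the idea of lazy computation concrete, here is a rough standalone sketch of an iterator that encodes one 3-byte group at a time, only when it is advanced. It is not the code referred to in this message; `lazy_base64_iterator` and `encode_lazily` are hypothetical names, and the underlying iterator is assumed to be at least a forward iterator so that two positions can be compared.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical input iterator that yields Base64 characters on demand.
template <class InIt>
class lazy_base64_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char* pointer;
    typedef char reference;

    // Passing (last, last) yields the end-of-sequence iterator.
    lazy_base64_iterator(InIt first, InIt last)
        : cur_(first), end_(last), quad_(), pos_(0), len_(0) {
        refill();
    }

    char operator*() const { return quad_[pos_]; }

    lazy_base64_iterator& operator++() {
        if (++pos_ == len_) refill();  // group consumed: encode the next one
        return *this;
    }

    lazy_base64_iterator operator++(int) {
        lazy_base64_iterator previous(*this);
        ++*this;
        return previous;
    }

    friend bool operator==(const lazy_base64_iterator& a,
                           const lazy_base64_iterator& b) {
        return a.cur_ == b.cur_ && a.pos_ == b.pos_ && a.len_ == b.len_;
    }
    friend bool operator!=(const lazy_base64_iterator& a,
                           const lazy_base64_iterator& b) {
        return !(a == b);
    }

private:
    // Reads up to three input bytes and encodes them into four characters.
    void refill() {
        static const char alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        pos_ = 0;
        len_ = 0;
        if (cur_ == end_) return;  // exhausted: now equal to the end iterator
        unsigned char block[3] = {0, 0, 0};
        int count = 0;
        for (; count < 3 && cur_ != end_; ++count, ++cur_)
            block[count] = static_cast<unsigned char>(*cur_);
        quad_[0] = alphabet[block[0] >> 2];
        quad_[1] = alphabet[((block[0] & 0x03) << 4) | (block[1] >> 4)];
        quad_[2] = count > 1 ? alphabet[((block[1] & 0x0F) << 2) | (block[2] >> 6)] : '=';
        quad_[3] = count > 2 ? alphabet[block[2] & 0x3F] : '=';
        len_ = 4;
    }

    InIt cur_;
    InIt end_;
    char quad_[4];
    int pos_;
    int len_;
};

// Hypothetical helper that walks the lazy range and collects the result.
inline std::string encode_lazily(const std::string& input) {
    typedef lazy_base64_iterator<std::string::const_iterator> iterator;
    std::string result;
    iterator end(input.end(), input.end());
    for (iterator it(input.begin(), input.end()); it != end; ++it)
        result += *it;
    return result;
}
```

Nothing is encoded until the loop asks for the next character, which is what allows such a range to be passed on to another algorithm without building the whole encoded string first.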
>> Does your interface support ranges without your having shown it? Is
>> there output iterator support?
>>
>> Rob Stewart robert....@sig.com
>
> Sorry, Rob, perhaps I misunderstood your question.
>
> In fact, now is the main question about fundamental interest to
> Boost-based base64 ready-to-use solution, but not about implementation
> details. Of course, details will be discussed.
I think you'd find it interesting to see how the serialization library does
it.
../boost/archive/iterators/base64_from_binary
../boost/archive/iterators/binary_from_base64
These provide a pair of iterators which can be used with any
algorithm which takes a pair of iterators - i.e. most of the standard ones.
It's much more than an implementation detail - it's a whole different
way of looking at the problem. It works with lots of other stuff.
Of course there's nothing that prevents one from creating a wrapper
which provides a more convenient interface for many situations. So I
would encourage you to think bigger:
a) take a look at the dataflow iterators in the boost serialization library
b) take a look at range iterators. (this uses a similar technique but is
much more general).
c) craft a more general text processing toolkit which includes, as
an example, a convenient wrapper for base64 conversion. I would also
like to see it include examples for implementing codecvt
facets.
Robert Ramey
Exactly where is flexibility lost?
> I think you'd find it interesting to see how the serialization library does
> it.
> ../boost/archive/iterators/base64_from_binary
> ../boost/archive/iterators/binary_from_base64
Yes, Robert. :-)
I took these iterators (and 'transform_width') and wrapped them in one simple,
ready-to-use solution. Simple and ready-to-use even for novices.
I need such a solution for this common task. And I definitely know many
developers looking for such a solution, one they can use immediately,
without long study of the documentation. Just a few lines of code:
int main() {
    std::string text = "TEXT";
    std::string encoded = boost::base64().encode( text );
    std::string text_again = boost::base64().decode< std::string >( encoded );
}
Do you think such a (or a similar) solution is not needed for Boost? Well, probably
it is not needed, but I think it's useful. If not - I apologize for
troubling you.
- Denis
Hmmm - so you are proposing that something like the following header
be added to boost?
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/remove_whitespace.hpp>
#include <boost/archive/iterators/transform_width.hpp>
namespace boost{
namespace base64{
// convert to base64 (no '=' padding is appended by these iterators)
template<typename CharType>
std::string encode(const std::basic_string<CharType> & s){
    const CharType *raw_data = s.data();
    std::size_t size = s.size();
    std::string text_base64;
    // split the input into 6-bit groups, map them to base64 characters,
    // and insert a line break every 72 characters
    typedef
        boost::archive::iterators::insert_linebreaks<
            boost::archive::iterators::base64_from_binary<
                boost::archive::iterators::transform_width<
                    const CharType *
                    ,6
                    ,sizeof(CharType) * 8
                >
            >
            ,72
        > translate_out;
    std::copy(
        translate_out(raw_data),
        translate_out(raw_data + size),
        std::back_inserter(text_base64)
    );
    return text_base64;
}
std::string decode(std::string & s){
    ...
}
} // namespace base64
} // namespace boost
Which I don't really see as a bad idea in and of itself. But I wonder about
other aspects?
a) where in the directory/namespace would this be? boost, boost/utility, or?
b) would it include all the boost machinery besides code? documentation,
formal review, tests etc?
c) All this for one small special purpose function?
d) How is a user going to find this pair of functions in the boost
libraries?
e) As soon as someone puts this in, immediately someone else will ask - great,
but it needs a parameter so I can change the line width. Then someone
asks for some other tweak. This is the result of making a "simple solution".
It's easy enough to make a simple solution - it's very hard to keep it simple.
This is the reasoning behind my suggestion that for something like this one
should be prepared to think bigger. In my view the value of something like
this (it DOES have value) isn't justified by the cost of getting it into
boost. My advice - think bigger.
I realize that the overhead of all this isn't your fault, but it still
exists. I don't see boost as having the infrastructure to handle an army of
small (special purpose) functions such as this.
Robert Ramey
Yes, Robert, something like this.
> a) where in the directory/namespace would this be? boost, boost/utility,
> or?
For example, in the boost/base64 directory and in the boost namespace. I think it
CAN be in the boost namespace.
So we have a boost/base64 directory with docs, examples and the base64.hpp file.
In the main boost directory there will be a file with the same name, base64.hpp, containing:
#include <boost/base64/base64.hpp>
> c) All this for one small special purpose function?
Yes, Robert, for one small, concrete and frequent task.
> d) How is a user going to find this pair of functions in the boost
> libraries.
The same way he finds boost::mem_fn, for example... :-)
> e) As soon as someone puts this in, immediately someone else will ask - great
> but it needs a parameter so I can change the line width. Then someone
> asks for some other tweak. This is the result of making a "simple
> solution".
> It's easy enough to make a simple solution - it's very hard to keep it
> simple.
I understand you. My solution is simple and will stay simple. Without "line
width", special formatting, etc.
Base64 encoding/decoding, nothing else.
- Denis
> I think you'd find it interesting to see how the serialization library does
> it.
>
> ../boost/archive/iterators/base64_from_binary
> ../boost/archive/iterators/binary_from_base64
The code of these is much more complicated than it needs to be; there is
too much coupling between the algorithm and the iterator adaptor.
Well, it's implemented in terms of an iterator adaptor. I see that as a
feature.
This gives maximal efficiency and total compile time flexibility and
leverages working/tested/reviewed code - which to me is
what boost is about.
I can't see how another implementation which doesn't use
the already made stuff can be faster or more efficient or
add less code to the code base.
Robert Ramey
Yes, it's a feature, but unfortunately the base64 logic is intertwined
with the iterator logic, and concerns are not as well separated as they
could be.
From a glimpse at the code, I am also not really convinced that it's
got maximal efficiency either; it could probably be faster by computing
the optimal bit twiddling logic at compile-time and reducing the
redundant branches.
> I can't see how another implementation which doesn't use
> the already made stuff can be faster or more efficient or
> add less code to the code base.
I have an implementation of a generic system that can generate, from a
model of a very simple and explicit Converter concept, an iterator
adaptor, an eager algorithm, a codecvt facet or a boost.iostreams filter.
The idea is basically that I put inside a concept the notion of a
conversion step, and then I built an iterator adaptor that can work for
any model of that concept.
A possible implementation of a base64 encoder Converter, along with a
codecvt demo (hardly the best backend) is available at
<http://svn.boost.org/svn/boost/sandbox/SOC/2009/unicode/libs/unicode/example/base64.cpp>
It's not generic (in the sense that it doesn't allow N to M bit
conversion like transform_width, it just does base64 -- after all it's
only meant to be an example) but that's something that would also be
fairly easy to do.
Now I am not suggesting that Boost should definitely use this, I'm just
presenting alternatives.
[snipped example wrapping detailed, generic code to provide "obvious"
if less flexible interface]
> Which I don't really see as a bad idea in and of itself. But I wonder about
> other aspects?
>
> a) where in the directory/namespace would this be? boost,
> boost/utility, or?
Maybe add a "Boost.Cookbook", showing mere mortal C++ programmers how
to best use the deeper aspects of Boost?
It could even be advertised as "non-normative", in a sense, so that it
wouldn't hold up a release.
> b) would it include all the boost machinery besides code?
> documentation, formal review, tests etc?
If we allocated a cookbook / samples directory, we could use "silos"
to apply the formal process to individual files/functions:
doc/base64.txt
src/base64.hpp
test/base64_test.cpp
This would be completely independent of someone taking, e.g., one of
the TCP servers out of the ASIO docs and making a sample out of it:
doc/tcp_server.txt
src/tcp_server.hpp
test/tcp_server_test.hpp
I just caught up on a week or two of posts to boost-devel, and I
recall there was at least one other instance of someone suggesting a
tiny snippet of code for inclusion into boost, and others replying
[essentially] that there's too much overhead to bother adding 6 lines
somewhere.
> c) All this for one small special purpose function?
There are many other useful small snippets that are currently strewn
across the documentation for various libraries. Maybe centralizing
them in one place (again, as "cookbook" or "here's how the boost
experts /use/ boost, as opposed to when they're writing boost") would
be helpful.
I'm particularly reminded of a quote from Scott Meyers's _Effective STL_:
"One programmer's vision of expressive purity is another programmer's
demonic missive from Hell."
> d) How is a user going to find this pair of functions in the boost
> libraries.
A cookbook with an index, especially with keywords, would be nice.
(Speaking of which, is there a single index that covers all the boost
libs in a single document?)
> e) As soon as someone puts this in, immediately someone else will ask -
> great but it needs a parameter so I can change the line
> width. Then someone asks for some other tweak. This is the result
> of making a "simple solution". It's easy enough to make a simple
> solution - it's very hard to keep it simple.
Ah, but the cookbook approach would allow for a better response: we've
shown you how to do the simple, straightforward thing using these
powerful constructs; if you want to change something, take this code
and adapt it to your needs, or use the more powerful tools directly.
> This is the reasoning behind my suggestion that for something like
> this one should be prepared to think bigger. In my view value of
> something like this (it DOES have value) isn't justified by the cost
> of getting it into boost. My advice - think bigger.
Does a cookbook seem a reasonable way to think bigger?
> I realize that the overhead of all this isn't your fault, but it
> still exists. I don't see boost as having the infrastructure to handle
> an army of small (special purpose) functions such as this.
It might be that creating such a nook would allow for quite a few
contributions, precisely because there's such a high hurdle to
overcome for a normal library. I'd guess that everyone on this list
has a handful of "helper functions" that they'd be happy to submit if
it only took an hour of their time, but can't afford to make fully
generic and fleshed out and then wait 6 months for review.
Just an idea.
Best regards,
Tony
It's not such a bad idea - but it's an entirely different idea than what
boost is. I think we're in agreement here.
Off topic - but I'll make a couple of comments on the C++ Cookbook idea.
Actually this exists. I simply troll the net for "C++ base64 source code"
and I get a bunch of hits. I didn't look into them but I would hope/expect
that there would be something similar to that which has been proposed.
I would look at them, and perhaps copy one of them out and into my
code. This is fine, works fine, and solves my problem.
But this doesn't belong in boost. Boost strives to catalogue the
definitive, complete and general solution to widely encountered problem
domains. This means that the libraries are of necessity more complete,
better documented, better tested, rigorously reviewed and vetted. It's
an entirely different thing than the C++ cookbook. Boost is feeling
the strain trying to keep up with its current mission which no one else
is doing. It can't expand into new territory which is already covered.
Just because an idea may be a good idea, it doesn't necessarily follow
that it should be in boost.
Robert Ramey
IMO it's not fine. It's a recipe for big scale code duplication.
If the same function appears in tons of C++ apps it's IMO a sign that
some lib is lacking.
Olaf
Hi Denis!
I think having base64 stuff in boost utilities would be very useful.
But I think the interfaces that you are providing are very high-level.
I implemented base64_encode some time ago and it had
the following interface:
template <class InIt, class OutIt>
OutIt base64_encode(InIt f, InIt l, OutIt out);
I suppose your 'encode' interfaces could be implemented via the function above.
Also note that there is more than std::string and std::istream/ostream
in the world. E.g., one might want to encode a QString.
Another comment is about performance. For most applications this could be crucial.
Could you please provide the sources so that
we can compare our home-bred solution to yours?
Do you have any optimizations for random access ranges/iterators/containers?
Or do you assume that the input parameter (string/range/container/whatever)
can always be accessed randomly?
Regards
You are right, Alexander, this interface is very high-level. But, IMHO,
such an interface is good for many cases.
Of course, I'll provide an iterator-based interface too.
> Another comment is performance. For most applications this could be
> crucial.
Yes. Unfortunately, the performance of my solution is not ideal. Some "pure
C solutions" (for example, the one in the OpenSSL crypto lib) are much faster
(up to 10 times). For example, a 1000000-char std::string is encoded in 0.226 s
with my solution, but in 0.020 s with some C solutions.
But:
1. I am a C++ developer, not an assembler developer. :-)
2. For some applications speed is not a crucial factor. For example, I am now
writing a console application for signing documents with RSA keys. What's the
big difference whether the license document is encoded/decoded in 0.009 s
or in 0.001 s?
3. Performance may be optimized in the future.
My main goal is to provide a ready-to-use and simple-to-use utility based on
existing Boost iterators that can be used after one minute of study. My
solution is not ideal, but it's simple and, IMHO, useful even in its present
form.
I personally would be glad to see such a utility in Boost.
- Denis
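Timings like the ones quoted above are easy to get wrong with a single run, so a comparison between implementations might use a small harness such as the following. This is only a sketch: `best_seconds` is a hypothetical helper, it requires C++11 for `<chrono>`, and taking the best of several runs is one common convention, not something prescribed in this thread.

```cpp
#include <chrono>

// Hypothetical helper: calls `work` `runs` times and returns the shortest
// elapsed wall-clock time in seconds, or -1.0 if `runs` is not positive.
template <class Work>
double best_seconds(Work work, int runs) {
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        const std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        work();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (best < 0.0 || elapsed.count() < best)
            best = elapsed.count();
    }
    return best;
}
```

The callable would wrap one encode of the same 1000000-character input for each implementation being compared, with the result stored somewhere the optimizer cannot discard.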
> But:
> 1. I am C++ developer, not assembler developer. :-)
> 2. For some applications speed is not crucial factor. For example,
> now I write console application for documents signing by RSA keys.
> What's the big difference whether the license document is encoded/decoded
> in 0.009 s or in 0.001 s?
> 3. Performance may be optimized in future.
This seems to suggest that the implementation currently in
boost is somehow suboptimal. It's all inline code, and the
"parameters" (line length, etc) are all known at compile
time and it's a fairly straightforward algorithm. I would
hope that modern compilers could compile this down
to an almost optimal implementation. Of course I have
no idea to what extent this is true and I would really
like to know. So I would be curious as to the difference
in time and runtime size between the version generated by
templates and one written by hand. In fact, here's
a great idea for a realistic and useful GSOC project:
"Comparison of different C++ compilers with
optimally written code"
Robert Ramey
Hi Mathias,
I want to write a Converter for *base64_encoder* that works with
Boost.IOStreams.
Do you have a guide?
Thanks and regards,
Fernando.
> I want to write a Converter for *base64_encoder* that works with
> Boost.IOStreams.
> Do you have a guide ?
Hmmm. ... that's exactly what the base_64 code in the serialization
library does.
I would think that it wouldn't be too hard to use for this purpose.
If one had nothing else to do this code could be used to
generate code_convert facets at compile time in a similar
manner to the way it generates string conversion code
at compile time. I believe that the final result would be pretty
useful. It would be useful for all standard i/o streams.
I believe that it would be quite efficient - as I believe
the current library is. Making this into a library of
code convert facets would be a tricky job - as
any library which works with standard code and
real compilers and libraries is.
Robert Ramey
> Fernando Pelliccioni wrote:
> > On Fri, Jun 10, 2011 at 9:51 PM, Mathias Gaunard <
Hi Robert,
I want to use it with IOStreams Filters.
I understood that Mathias has an implementation that works with it.
Something that concerns me about base64 code in serialization is the
following error:
http://lists.boost.org/boost-users/2008/08/39798.php
Is it solved?
Regards,
Fernando.
I'm not sure it's an error. The nature of base64 is that the input has to be a
multiple of 3 bytes; if it isn't, you have to pad it on output. The serialization
library does this as it has to. I don't remember if it does it as part
of the base64 conversion or outside of it.
Is there an iterator adapter for padding?
base64_from_binary does not pad the bytes.
To know how the serialization library handles
this, I'd have to review the code - but then you
could do this as well as I. I don't remember it ever being
an issue, so either it got handled automatically or
it was easy to address.
Robert Ramey
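For reference, the padding arithmetic discussed here is small enough to add outside the iterator chain. The sketch below is a standalone illustration, not the serialization library's own handling: `base64_padding_for` and `pad_base64` are hypothetical names, and it assumes the encoder produced correct but unpadded output.

```cpp
#include <cstddef>
#include <string>

// Number of '=' characters needed for `input_bytes` bytes of input:
// 0 when the length is a multiple of 3, otherwise 2 or 1.
inline std::size_t base64_padding_for(std::size_t input_bytes) {
    return (3 - input_bytes % 3) % 3;
}

// Hypothetical fix-up step: appends '=' until the encoded text length is a
// multiple of four characters.
inline std::string pad_base64(std::string unpadded) {
    while (unpadded.size() % 4 != 0)
        unpadded += '=';
    return unpadded;
}
```

A wrapper around an unpadded encoder could call `pad_base64` on the result before returning it, provided no line breaks were inserted into the text first.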