I'm wondering how hard and how useful it would be to port WvStreams to
use the STL. I mean removing all these containers and using the STL
instead. That means less code to maintain, and given that any C++
program links against libstdc++, we already depend on it.
Victims would be WvLinkList, WvVector, WvString, WvHash, and I'm
probably missing others.
The major issue I see is the iterator "style", as Wv doesn't use
STL-like iterators but Java-like (or FoundationKit-like) ones.
Just a crazy idea I had, coming from two sources: the need for a
socket library for AbiCollab (collaborative writing within AbiWord;
Marc chose boost::asio instead) and the eternal universal
configuration system, where UniConf is always listed (and that came up
recently on the "portland" list).
Hub
I'm not opposed to the idea in principle (less code to maintain is
always a good thing), but I'm not sure if there's enough motivation to
do so in the absence of a major new project which might use WvStreams.
I'm not too familiar with boost::asio: do you know anything more about
Marc's rationale for using it?
--
William Lachance
wrl...@gmail.com
It works and he already was using boost; and he didn't want to use
socket().
The fact that WvStreams is not very common and brings its own luggage
does not make it a favorite...
Hub
Basically you're talking about having libwvstreams stop depending on
libwvutils. I'm generally okay with this idea, as the ideas behind
libwvutils and libwvstreams were very different: libwvutils was
intended to make C++ *syntax* more sensible, while libwvstreams
defines useful *semantics* around data transport.
The original reasons for avoiding the STL were:
1. Back in the day, libstdc++ was slow, unreliable, and hard to debug.
This has improved hugely due to gcc improvements since the gcc 2.7.2
that we started in.
2. libstdc++ was too big for the Weaver flash disk and is not
*actually* needed to compile C++ programs. However, weaver almost
certainly now includes it anyway, and flash disks are no longer small
or expensive.
3. STL syntax is too template-happy and kind of dumb. This point is
moot since libwvutils syntax is also kind of dumb (and STL's dumbness
is more popular) and we now include template-happy things like
WvCallback which aren't even as good as the Boost versions.
4. STL is not ABI-stable between versions or compiler versions. This
is kind of irrelevant since we've never put in the work to make
WvStreams particularly ABI-stable either. (If we finished the
inversion and used XPLC a lot, this would improve; heavily using STL
would be a step backwards though.)
5. WvString is dramatically faster for string-copy-heavy operations
such as the ones used in UniConf.
Problem #5 is the tough one, because timings we did relatively
recently (a couple of years ago, and I haven't heard of STL changing
since then) showed *huge* speed gains when using WvString instead of
STL in string-heavy operations. This is about 5x *more* true when
running under Valgrind or Electric Fence.
Why is that? Well, for example, calling a function func(const
std::string &s) with a constant string (eg. func("hello world"))
requires a memory allocation and deallocation with std::string; with
WvString it doesn't. It's hard to express the difference this makes
using mere words, and I've probably lost the statistics. Maybe
someone (pphaneuf?) still has them in his email archives?
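For anyone who wants to reproduce the effect rather than take it on
faith, here is a minimal sketch of how one could count the allocations.
It is not the original benchmark (those numbers are lost). CountingAlloc,
CountedString and the two helper functions are hypothetical names made up
for this illustration, and CountedString is only a stand-in for
std::string with an instrumented allocator. The literal is deliberately
long so that a small-string buffer, where an implementation has one,
cannot hide the allocation.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical instrumentation: counts every allocation made through it.
static std::size_t g_allocs = 0;

template <typename T>
struct CountingAlloc {
    using value_type = T;

    CountingAlloc() = default;
    template <typename U>
    CountingAlloc(const CountingAlloc<U> &) {}

    T *allocate(std::size_t n) {
        ++g_allocs;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T *p, std::size_t n) {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAlloc<T> &, const CountingAlloc<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T> &, const CountingAlloc<U> &) { return false; }

// Stand-in for std::string that reports its allocations.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAlloc<char>>;

// The function from the example above: takes the string by const reference.
std::size_t func(const CountedString &s) { return s.size(); }

// Calling with a literal has to build a temporary string first.
std::size_t allocs_for_literal_call() {
    std::size_t before = g_allocs;
    func("a literal that is too long for any small-string buffer to hold");
    return g_allocs - before;
}

// Calling with an existing string only binds the reference.
std::size_t allocs_for_existing_string(const CountedString &s) {
    std::size_t before = g_allocs;
    func(s);
    return g_allocs - before;
}
```

Run in a loop, the first helper pays for an allocation, a copy and a
deallocation on every call, while the second pays for none of them.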
The libwvutils containers, however, are probably all significantly
less efficient than the equivalent STL ones, so switching to the STL
should be an all-around improvement.
> The major issue I see is the iterator "style", as Wv doesn't use
> STL-like iterators but Java-like (or FoundationKit-like) ones.
I might suggest switching to STL containers, but then implementing
some wv-like iterators on top of those. I personally greatly prefer
the wv-style iterator syntax.
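To give an idea of what I mean, here is a rough sketch of a wv-like
iterator layered on top of any standard container. WvStyleIter and
sum_all are hypothetical names invented for this mail, not existing
WvStreams classes; only the rewind()/next() loop shape is meant to mirror
the wv style. Const and reverse variants would follow the same pattern.

```cpp
#include <list>
#include <vector>

// Hypothetical adaptor: wv-style rewind()/next() over a standard container.
template <typename Container>
class WvStyleIter {
public:
    explicit WvStyleIter(Container &container)
        : c(container), pos(container.begin()), started(false) {}

    // Go back to "before the first element".
    void rewind() { started = false; }

    // Advance; returns false once the end of the container is reached.
    bool next() {
        if (!started) {
            pos = c.begin();
            started = true;
        } else if (pos != c.end()) {
            ++pos;
        }
        return pos != c.end();
    }

    // Access the current element (only valid after next() returned true).
    typename Container::reference operator*() { return *pos; }
    typename Container::pointer ptr() { return &*pos; }

private:
    Container &c;
    typename Container::iterator pos;
    bool started;
};

// Example use: the loop reads like a wv-style iteration.
template <typename Container>
int sum_all(Container &c) {
    int total = 0;
    WvStyleIter<Container> i(c);
    for (i.rewind(); i.next(); )
        total += *i;
    return total;
}
```

The underlying container keeps its begin()/end(), so the standard
algorithms still work on it directly.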
Also, WvIStreamList might need to keep the old-style container, just
because the WvStreams debugger and crash logger and stuff (all very
handy features!) use the WvList "id" field. Or maybe there's a better
way to do that with STL.
Have fun,
Avery
> 3. STL syntax is too template-happy and kind of dumb. This point is
> moot since libwvutils syntax is also kind of dumb (and STL's dumbness
> is more popular) and we now include template-happy things like
> WvCallback which aren't even as good as the Boost versions.
Boost's "function" and "bind" rock my world. "function" manages to
beat a raw function pointer with a good enough compiler (the spread
across various compilers was ±10% of the overhead of function
pointers, quite good I'd say).
The "bind" that can re-order parameters is pretty damned nifty and
MUCH simpler to use than WvBoundCallback. This returns a function
object that passes its second parameter, 42, and its first parameter to
foo (whatever foo is, a function locally visible or a function object
itself):
bind(foo, _2, 42, _1)
WvBoundCallback could only bind the first one, and had a massively
clunky syntax, you might remember. I'm quite amazed.
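To make the reordering concrete, here is a small sketch. It uses
std::bind from <functional>, the standardised descendant of boost::bind,
only so that it builds without Boost. foo and call_reordered are made-up
names for the example.

```cpp
#include <functional>

// Hypothetical three-argument function; the result encodes argument order.
int foo(int a, int b, int c) { return a * 100 + b * 10 + c; }

// Builds the bound object from the text and calls it with two arguments.
int call_reordered(int first, int second) {
    using namespace std::placeholders;
    // _2 is the caller's second argument, _1 the caller's first one.
    auto bound = std::bind(foo, _2, 42, _1);
    return bound(first, second);   // same as foo(second, 42, first)
}
```

So call_reordered(1, 2) ends up calling foo(2, 42, 1).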
> 4. STL is not ABI-stable between versions or compiler versions. This
> is kind of irrelevant since we've never put in the work to make
> WvStreams particularly ABI-stable either. (If we finished the
> inversion and used XPLC a lot, this would improve; heavily using STL
> would be a step backwards though.)
This is less and less true, as the C++ ABI of gcc stabilises (it still
changes, but very rarely, and in increasingly weird corner cases).
They're good about updating the soname appropriately, and it doesn't
happen too much (it's libstdc++.so.6 on my system, not too offensive).
It seems to have changed from pre-3.0 to 3.0, and again from pre-4.0
to 4.0 of gcc.
I'm not saying it's not true, just that it's not as bad as it might
seem. For example, Ubuntu dapper carries libstdc++.so.5 because of
Acrobat Reader, apparently, and Adobe dared ship binary-only stuff
with a dependency on that. I've never had Acrobat bomb on me because
of a C++ ABI issue that I know of (I use it a fair deal, although less
now that I have a Mac)...
Also, with XPLC, we'd be even more able to cope with code using
differing versions of the libstdc++. A module wouldn't load if the one
it wanted wasn't there, though, that's true. A module could use the
STL internally, but still avoid it in its visible interfaces (I'd
recommend avoiding WvList and others, for exactly the very same
reasons, actually).
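A rough sketch of what that could look like. INameList, NameList and
create_name_list are hypothetical names, not anything in XPLC or
WvStreams. The point is only that the visible interface uses built-in
types, while the STL stays inside the implementation.

```cpp
#include <string>
#include <vector>

// Visible interface: only built-in types cross the module boundary.
class INameList {
public:
    virtual ~INameList() {}
    virtual void add(const char *name) = 0;
    virtual unsigned count() const = 0;
    // Returns a null pointer when index is out of range.
    virtual const char *get(unsigned index) const = 0;
};

namespace {

// Hidden implementation: free to use whatever containers it likes.
class NameList : public INameList {
public:
    void add(const char *name) { names.push_back(name); }
    unsigned count() const { return static_cast<unsigned>(names.size()); }
    const char *get(unsigned index) const {
        return index < names.size() ? names[index].c_str() : 0;
    }

private:
    std::vector<std::string> names;
};

} // namespace

// The only entry point a user of the module needs to see.
INameList *create_name_list() { return new NameList; }
```

A caller built against a different libstdc++ never sees a std::vector or
std::string in a signature, which is what keeps the interface from
depending on the STL's layout.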
We could make the core WvStreams use its own containers internally (a
bit like XPLC does itself) to be super-extra-safe it will *always*
load correctly, while letting the users use whatever they want. And we
don't really use all that much internally, maybe just the list? It's
currently part of the interface, but after the inversion, it wouldn't.
> Why is that? Well, for example, calling a function func(const
> std::string &s) with a constant string (eg. func("hello world"))
> requires a memory allocation and deallocation with std::string; with
> WvString it doesn't. It's hard to express the difference this makes
> using mere words, and I've probably lost the statistics. Maybe
> someone (pphaneuf?) still has them in his email archives?
I don't think I have this around, no... It might be in my former IMAP
account at NITI, but anyway... :-)
Note that the cost is only when the transition is done from the
constant string into the "std::string world", after that it's all
copy-on-write goodness. Another cost on top of the allocation is the
copy of the characters, as well. If I recall, some std::string
implementations keep some memory aside for "small strings", so they end
up just copying it without an allocation, but I don't remember if the
one in gcc does.
The gist was "don't use too many string literals" and "don't call this
thing four million times for each byte", things like that. ;-)
Also, I was thinking about just that the other day, it might be
possible to make an "std::stringparm", given enough crack. Some days,
I think about it, then I start hallucinating and I wake up in the
middle of some fields, it's a pain in the ass. But since I have a
laptop now, it might be possible to code it when I wake up in the
field, who knows? :-P
> The libwvutils containers, however, are probably all significantly
> less efficient than the equivalent STL ones, so switching to the STL
> should be an all-around improvement.
Benchmarks for iteration and appending to std::vector compared to
WvList are in the "WHOA" department. Like, a lot. More than a little by
a bunch. It could possibly more than make up for the std::string
initialization from literal overhead in a real program, I think.
> > The major issue I see is the iterator "style", as Wv doesn't use
> > STL-like iterators but Java-like (or FoundationKit-like) ones.
>
> I might suggest switching to STL containers, but then implementing
> some wv-like iterators on top of those. I personally greatly prefer
> the wv-style iterator syntax.
That's easy. In fact, you'd only need to implement four iterators for
the whole thing: WvIter, WvIterRev and their const equivalents. Since
the whole bloody thing is so generic, I think they could work and
"Wvify" any standard container.
> Also, WvIStreamList might need to keep the old-style container, just
> because the WvStreams debugger and crash logger and stuff (all very
> handy features!) use the WvList "id" field. Or maybe there's a better
> way to do that with STL.
That's really just to walk the global list, which with the inversion
should become a private internal structure, at which point it could
very well be a list<pair<char*, WvStream*> > or something appropriate.
Oh, but it is needed. Linking a C++ program with gcc does not work
unless you add -lstdc++. With g++ it is added implicitly.
Tested with gcc 2.95, 3.3 and 4.x.
Here is the sample program:
class Foo
{
};
int main (int argc, char **argv)
{
    Foo *foo = new Foo();
    delete foo;
    return 0;
}
$ g++ main.cpp -o test2
$ ldd ./test2
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x4003b000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0x40125000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x4014d000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x40159000)
/lib/ld-linux.so.2 (0x40000000)
$
Maybe you meant *in the past*.
I know very well what the problems with STL and C++ support were *in
the past*. AbiWord is still carrying the same legacy and I'm trying to
get rid of it.
> 4. STL is not ABI-stable between versions or compiler versions.
It is within a major revision. OK, there have been some fsck-ups, but
3.x was stable and 4.x too. Anyway, that's not really relevant, and the
problem is already there anyway, see above: we link with libstdc++, and
new is actually dependent on std::allocator AFAIK.
> 5. WvString is dramatically faster for string-copy-heavy operations
> such as the ones used in UniConf.
You mean passing a const string all the way down (WvFastString)?
Because std::string is implicitly shared and copy-on-write, so a simple
copy of a std::string instance does not copy the whole payload. That
can be a problem indeed, for the first instantiation. We can still
continue to use WvString. Even better, make WvString compatible with
std::string :-)
> STL in string-heavy operations. This is about 5x *more* true when
> running under Valgrind or Electric Fence.
Timing under Valgrind is not relevant. Timing under EF is biased
because EF puts a lot of overhead on memory allocation.
> > The major issue I see is the iterator "style", as Wv doesn't use
> > STL-like iterators but Java-like (or FoundationKit-like) ones.
>
> I might suggest switching to STL containers, but then implementing
> some wv-like iterators on top of those. I personally greatly prefer
> the wv-style iterator syntax.
But you lose the STL algorithm compatibility. That is part of my
issue with Qt, which does similar things. But it can possibly be done,
at least to avoid changing every single use of operators.
> Also, WvIStreamList might need to keep the old-style container, just
> because the WvStreams debugger and crash logger and stuff (all very
> handy features!) use the WvList "id" field. Or maybe there's a better
> way to do that with STL.
Good point.
Hub
Replying to myself. -lsupc++ works. It is a static version of the
minimal C++ runtime for the aforementioned case.
Hub
> I'm wondering how hard and how useful it would be to port WvStreams to
> use the STL. I mean removing all these containers and using the STL
> instead. That means less code to maintain, and given that any C++
> program links against libstdc++, we already depend on it.
That "depends". See further down...
> Just a crazy idea I had, coming from two sources: the need for a
> socket library for AbiCollab (collaborative writing within AbiWord;
> Marc chose boost::asio instead) and the eternal universal
> configuration system, where UniConf is always listed (and that came up
> recently on the "portland" list).
If I am to use WvStreams, yes, I think doing this would be awesome.
But I'm wondering how much one would want to use WvStreams nowadays.
If people use boost::asio and it's not insultingly bad, one might just
prefer to use something like that for new programs (especially
considering Boost is to libstdc++ as glibc is to libc, it's pretty
much just there whenever you start doing C++ stuff).
But maybe it's worth doing, if for no other reason than "it'd be fun". :-)
AFAIK, you still can't dlopen() or link with one .so using one version
of libstdc++, and another using another version, can you? The symbols
conflict because of either bugs or "features" in ld.so.
I know this is at least very, very hard to work around. Mono's C# 2.0
compiler can't always link to libraries compiled with the C# 1.x
compiler, because the two both talk to libc and conflict with each
other's libc state management in some way.
> We could make the core WvStreams use its own containers internally (a
> bit like XPLC does itself) to be super-extra-safe it will *always*
> load correctly, while letting the users use whatever they want.
I don't mind having a compiled version of libwvstreams depend on a
particular libstdc++ binary package. That's normal and not worth
avoiding. The problem is more about linking with apps that depend on
a *different* libstdc++ and explode if you end up linked with both.
That's certainly happened to me before.
> And we
> don't really use all that much internally, maybe just the list? It's
> currently part of the interface, but after the inversion, it wouldn't.
WvStreams doesn't use many containers, but UniConf uses several other
things, and we should consider it part of wvstreams, at least in terms
of making decisions like this.
> Note that the cost is only when the transition is done from the
> constant string into the "std::string world", after that it's all
> copy-on-write goodness.
std::string a = "hello";
std::string b = a;
The second line results in a memory allocation. There is no
copy-on-write. That said, passing a parameter as "const std::string
&" obviously avoids the copy as long as a std::string already exists.
> The gist was "don't use too many string literals"
That advice is, frankly, nonsensical. You can't write a modern
program without lots and lots and lots of strings, and if you can't
use literals, strings are ridiculously hard to work with.
> Benchmarks for iteration and appending to std::vector compared to
> WvList are in the "WHOA" department. Like, a lot. More than a little by
> a bunch. It could possibly more than make up for the std::string
> initialization from literal overhead in a real program, I think.
I don't actually care much about wvutils overhead in a normal
situation, just in valgrind. That's because normal situations are
plenty fast enough, and valgrind isn't.
That's not an argument against STL containers - which are generally
all-around just plain faster - but wvutils containers had nothing to
do with uniconf and wvstreams' valgrind slowness anyway (because you
iterate *way* more than you allocate). String memory allocations do.
> That's really just to walk the global list, which with the inversion
> should become a private internal structure, at which point it could
> very well be a list<pair<char*, WvStream*> > or something appropriate.
That's fine. To do the change as simply as possible, you could just
derive WvIStreamList from one of those then.
Note that I'm not actually volunteering to do any of this :)
Have fun,
Avery
Oh, please. My mom told me that if I'm not going to say something
nice about someone, then I shouldn't say anything at all. So instead
I'm just going to link to the "echo/async server example" in their
tutorial and let it speak for itself:
Have fun,
Avery
> Oh, please. My mom told me that if I'm not going to say something
> nice about someone, then I shouldn't say anything at all. So instead
> I'm just going to link to the "echo/async server example" in their
> tutorial and let it speak for itself:
It's weird, because I don't want to make a final judgement until I know
it better, but with what meager boost::asio experience I have, I could
have written a better and cleaner example?!?
Their server is just plain weird: it allocates a connection in the
constructor, binds it to the callback, and in the callback, allocates
another one and changes its callback again to bind with the new one?!?
I mean, WTF is that? Just bloody allocate it when you get a
connection?!?
A bit of a similar thing in the client, passing state around in the
bound arguments?!? How about, say, keeping a buffer as a session
member? You know?
I mean, all right, I was just saying how cool boost::bind is, but this
guy is fanboy #1 in the universe, no fucking kidding.
Note that translating that example to WvStreams *literally* would
work, and would be an even worse disaster, thanks to WvBoundCallback.
But that's not speaking badly about boost::asio, just about the guy
who wrote this horror.
A *real* question about boost::asio would be, for example, how it
handles buffering on output (short writes are annoying). Also, if it
has buffering like that, whether it does proper flow control and such
(read_requires_writable kind of deal).
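As an illustration of the short-write problem only (this is neither how
WvStreams nor how boost::asio implements it), a hypothetical
BufferedWriter could keep whatever the underlying write did not accept
and retry later. The raw_write callback stands in for something like
write() on a non-blocking socket and is an assumption of the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical output buffer that absorbs short writes.
class BufferedWriter {
public:
    // raw_write returns how many bytes it accepted, possibly fewer than asked.
    using RawWrite = std::function<std::size_t(const char *, std::size_t)>;

    explicit BufferedWriter(RawWrite raw_write)
        : raw_write_(std::move(raw_write)) {}

    // Queue the data, then make one attempt to push it out.
    void write(const std::string &data) {
        pending_ += data;
        flush();
    }

    // One write attempt; returns true once nothing is left to send.
    bool flush() {
        if (!pending_.empty()) {
            std::size_t accepted = raw_write_(pending_.data(), pending_.size());
            pending_.erase(0, accepted);
        }
        return pending_.empty();
    }

    // Bytes still waiting; a caller could stop reading while this grows.
    std::size_t backlog() const { return pending_.size(); }

private:
    RawWrite raw_write_;
    std::string pending_;
};
```

The backlog() value is the hook for flow control: a stream that copies
from a fast source to a slow sink would only read more while the backlog
is small, which is roughly the read_requires_writable idea.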
*Those* were the really nice features of WvStreams, IMHO. That, and
the base of existing stream classes (encoders, SSL, etc).
Unlike many networking libraries, WvStreams evolved a *lot* based on
real-life coding of lots of different kinds of clients and servers.
asio looks like it was designed for theoretical computer scienciness.
I can't say for sure how asio handles the above, but I'd like to know.
> *Those* were the really nice features of WvStreams, IMHO. That, and
> the base of existing stream classes (encoders, SSL, etc).
Also, we have more alliteration (which ought to be linked from the
WvStreams page on alumnit, I suppose.)
Have fun,
Avery
> AFAIK, you still can't dlopen() or link with one .so using one version
> of libstdc++, and another using another version, can you? The symbols
> conflict because of either bugs or "features" in ld.so.
It's possible, but risky. On an ELF system, it depends on what the
program itself is linked with and the flags used with dlopen(). Stupid
ELF. But once the program is ok, then the modules can go wild. On
Darwin/Mac OS X, it all works correctly (thanks to two-level symbol
tables).
It's not much worse than OpenSSL, I'd say. There's a few sonames
floating around, but hardly ever more than two on a system at a given
time, and usually the older one is there for some wacky old program
you don't feel like upgrading.
> I don't mind having a compiled version of libwvstreams depend on a
> particular libstdc++ binary package. That's normal and not worth
> avoiding. The problem is more about linking with apps that depend on
> a *different* libstdc++ and explode if you end up linked with both.
> That's certainly happened to me before.
True, as a library, we have to be more careful.
> WvStreams doesn't use many containers, but UniConf uses several other
> things, and we should consider it part of wvstreams, at least in terms
> of making decisions like this.
Ah, yes. But again, UniConf has its own rather constrained uses of
containers, which might be a different set than libwvstreams, but we
could just have it do it internally.
> > Note that the cost is only when the transition is done from the
> > constant string into the "std::string world", after that it's all
> > copy-on-write goodness.
>
> std::string a = "hello";
> std::string b = a;
>
> The second line results in a memory allocation. There is no
> copy-on-write. That said, passing a parameter as "const std::string
> &" obviously avoids the copy as long as a std::string already exists.
Not in the libstdc++ I have here. Only the first does an
allocation+copy. After that, it's all good. There's a refcount, and
the documentation for a number of methods specify "Unshares the
string." (doing the equivalent of unique()). But it can't do the
non-mutable string pointing at some user memory trick, so the first
line will definitely allocate and copy.
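Since we apparently get different answers, here is a small sketch anyone
can run against their own library. shares_buffer and the two helpers are
names made up for this mail. The first helper is implementation-dependent
by design: a refcounted std::string like the one described above reports
sharing, while an implementation that copies eagerly does not (as far as
I know, C++11 and later effectively rule out copy-on-write, so newer
libraries fall in the second group). The second helper must give the same
answer everywhere.

```cpp
#include <string>

// True if both strings currently point at the same character buffer.
bool shares_buffer(const std::string &a, const std::string &b) {
    return a.data() == b.data();
}

// Implementation-dependent: true with a refcounted (copy-on-write)
// std::string, false with one that copies the payload immediately.
bool copy_shares_buffer() {
    const std::string a("a string long enough to live on the heap in any case");
    const std::string b = a;
    return shares_buffer(a, b);
}

// After writing through the copy, the original must be untouched and the
// two strings must not share storage, whatever the implementation.
bool written_copy_affects_original() {
    std::string a("a string long enough to live on the heap in any case");
    std::string b = a;
    b[0] = 'A';   // a copy-on-write string unshares here
    return shares_buffer(a, b) || a[0] == 'A';
}
```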
> > The gist was "don't use too many string literals"
>
> That advice is, frankly, nonsensical. You can't write a modern
> program without lots and lots and lots of strings, and if you can't
> use literals, strings are ridiculously hard to work with.
Well, there's not a whole lot of literals involved in implementing the
HTTP protocol, once you're past the headers, it's pretty much
literal-free. Likewise when you get stuff from the user, and so on.
Literals *in inner loops* aren't all that common, and when you have
one *and* your profiler told you it sucked, you can just put aside a
const std::string and watch it fly.
The part about "in an inner loop" is quite important.
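A sketch of what "putting aside a const std::string" means in practice.
name_length and the two loop functions are hypothetical, and whether the
difference matters is for the profiler to say.

```cpp
#include <cstddef>
#include <string>

// Hypothetical callee taking a std::string by const reference.
std::size_t name_length(const std::string &name) { return name.size(); }

// Builds a temporary std::string from the literal on every iteration.
std::size_t total_with_literal(int iterations) {
    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i)
        total += name_length("Content-Length");
    return total;
}

// Builds the std::string once and reuses it on every iteration.
std::size_t total_with_hoisted_string(int iterations) {
    static const std::string header("Content-Length");
    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i)
        total += name_length(header);
    return total;
}
```

Both functions compute the same thing; only the second one keeps the
literal-to-string conversion out of the loop.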
> > Benchmarks for iteration and appending to std::vector compared to
> > WvList are in the "WHOA" department. Like, a lot. More than a little by
> > a bunch. It could possibly more than make up for the std::string
> > initialization from literal overhead in a real program, I think.
>
> I don't actually care much about wvutils overhead in a normal
> situation, just in valgrind. That's because normal situations are
> plenty fast enough, and valgrind isn't.
>
> That's not an argument against STL containers - which are generally
> all-around just plain faster - but wvutils containers had nothing to
> do with uniconf and wvstreams' valgrind slowness anyway (because you
> iterate *way* more than you allocate). String memory allocations do.
In any case, std::strings are all nicely refcounted and share their
buffers and everything, no worries there, except for the
initialization from literal, which always copies (much like WvString
does, but unlike WvFastString).
> > That's really just to walk the global list, which with the inversion
> > should become a private internal structure, at which point it could
> > very well be a list<pair<char*, WvStream*> > or something appropriate.
>
> That's fine. To do the change as simply as possible, you could just
> derive WvIStreamList from one of those then.
Since it'd be all hidden, we could also *not* derive from it and save
me some grief. Remember, to cook an egg, you do not need to BE a
stove. ;-)
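Roughly what I have in mind, as a sketch only. Stream is a stand-in for
WvStream, and StreamRegistry is a hypothetical name; the real thing would
be whatever private structure the inversion leaves us with.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for WvStream.
struct Stream {
    bool alive;
    Stream() : alive(true) {}
};

// Owns a list of (id, stream) pairs as a member instead of deriving from it.
class StreamRegistry {
public:
    void add(const char *id, Stream *s) {
        entries.push_back(std::make_pair(id, s));
    }

    std::size_t count() const { return entries.size(); }

    // What a debugger or crash logger would walk: the ids, in order.
    std::string describe() const {
        std::string out;
        for (Entries::const_iterator i = entries.begin(); i != entries.end(); ++i) {
            if (!out.empty())
                out += ",";
            out += i->first;
        }
        return out;
    }

private:
    typedef std::list<std::pair<const char *, Stream *> > Entries;
    Entries entries;   // has-a, not is-a
};
```

Nothing outside the class can tell which container is used, so it can be
swapped later without touching any caller.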
> > A *real* question about boost::asio would be, for example, how it
> > handles buffering on output (short writes are annoying). Also, if it
> > has buffering like that, whether it does proper flow control and such
> > (read_requires_writable kind of deal).
>
> Unlike many networking libraries, WvStreams evolved a *lot* based on
> real-life coding of lots of different kinds of clients and servers.
> asio looks like it was designed for theoretical computer scienciness.
> I can't say for sure how asio handles the above, but I'd like to know.
Indeed. I remember when we added read_requires_writable (piping a
WvFile into the parallel port in WvPrint, heh!). Now, I haven't looked
into asio in enough detail, only at surface. The API seems nice, but
that's the question: how does it do on the important details?
Hence not just going and saying screw this or anything. But asio, as
part of Boost, and that much closer to becoming part of the standard
library one day, is kind of special compared to "many networking
libraries".
> > *Those* were the really nice features of WvStreams, IMHO. That, and
> > the base of existing stream classes (encoders, SSL, etc).
>
> Also, we have more alliteration (which ought to be linked from the
> WvStreams page on alumnit, I suppose.)
There *is* that. :-)
You can get around this fairly easily with the -nostdlib flag (and maybe
one other I'm forgetting) and a stub to define a few simple symbols that
are usually part of libstdc++.
$ g++ main.cpp -o test2 -nostdlib
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8
/tmp/ccEVHnVL.o(.text+0x15): In function `main':
: undefined reference to `operator new(unsigned long)'
/tmp/ccEVHnVL.o(.text+0x22): In function `main':
: undefined reference to `operator delete(void*)'
/tmp/ccEVHnVL.o(.eh_frame+0x11): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
distcc[1931] ERROR: compile main.cpp on localhost failed
Er, I guess two of those "simple symbols" were new and delete - bit more
complicated than I remembered, but still doable if you're trying for a very
low-footprint app.
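For the record, the stub I have in mind would look something like the
sketch below. It is an assumption-laden illustration, not something I
have tested in a real -nostdlib build: it forwards to malloc()/free(), so
libc is still needed, and it aborts instead of throwing when memory runs
out. The __gxx_personality_v0 reference comes from exception handling, so
one would typically also build with -fno-exceptions.

```cpp
#include <cstddef>
#include <cstdlib>

// Minimal replacement for the allocation operators libstdc++ would provide.
void *operator new(std::size_t size) {
    if (size == 0)
        size = 1;               // operator new must return a unique pointer
    void *p = std::malloc(size);
    if (!p)
        std::abort();           // no exceptions available in this setup
    return p;
}

void *operator new[](std::size_t size) { return operator new(size); }

void operator delete(void *p) noexcept { std::free(p); }
void operator delete[](void *p) noexcept { std::free(p); }

// Sized forms, used by newer compilers; the size is not needed with free().
void operator delete(void *p, std::size_t) noexcept { std::free(p); }
void operator delete[](void *p, std::size_t) noexcept { std::free(p); }
```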
Joe
Forgot all about that. Should've read the rest of the thread.
Joe