Greetings,
As a veteran C++ programmer, I've been an admirer of the Boost library for many years. I've used it at a number of companies I've worked for, especially the SmartPtr library.
Right now I'm working for a company that worries a lot about negative exposure to Open Source software issues such as questions that might arise about authorship, copyright or even patent issues. The company does allow the use of Open Source software, but it requires that each piece of code that is brought in first be justified and vetted. Unfortunately, I'm finding this to be a nearly impossible task when I look at the amount of code that must be compiled to use the Boost modules I'm interested in.
I've done a study and written some tools, to determine just how many Boost header files must be included to use some of the Boost modules. The numbers are staggering:
Any: 79
FileSystem: 276
Smart Ptr: 382
String Algo: 180
I went on and did a tally of which modules these header files came from. Here are the counts for how many other Boost modules each of these modules depends upon:
Any: 8
FileSystem: 13
SmartPtr: 8
String Algo: 15
Given these numbers, I think I have to abandon any thoughts of using Boost within my current company. There's no way I'm going to get approval to bring so much code into our work just to get a SmartPtr or FileSystem library. This is unfortunate, because due to Boost's existence there doesn't seem to be much work going on out there to offer lighter-weight alternatives.
I'm writing this request in the hopes that there's something I'm missing here, and that someone can point out my folly. Is there a way to utilize any of these Boost modules in such a way that they do not require the inclusion of so much code? If not, does anyone have any suggestions as to how to make this fly with my boss? Has this issue come up before and been considered by the Boost designers? I find the issue baffling.
TIA for any help or insight anyone can provide.
Steve
My statement was what I subjectively felt during hacking sprees when I
know what I want, but can't readily find where it's located when reading
about it in the main body of the documentation.
Going through the libraries, it seems to be mostly a matter of lack of
such detail in the prose non-reference sections, particularly around the
sections that introduce a new concept and for the inline samples, like:
* Bimap tutorial;
* Chrono user's guide;
* Concept;
* Context only has "include all.hpp or the individual headers", with no
reference listing of headers and types at all;
* Filesystem v3 only documents top-level filesystem.hpp and
filesystem/fstream.hpp;
* Iterator - the only mention of headers at all is in the top-level
bullet points for a few of the types;
* Lambda has a brief listing in Installing, but nothing in-line;
* Locale doesn't mention headers by name, but as it's Doxygenish, it
links most names to the header page where they live.
* MSM doesn't even mention any all-header, it just seems to assume you
know exactly what headers to use, and the reference section is flat;
* MPL tutorial;
* NumericConversion's only mention of any headers is in examples for
the sections;
* Parameter has a decent set of cross-referenced docs, once you find it
under section 7 - until that point no mention of the headers exists;
* PointerContainer lacks header mentions in both the tutorial and the
reference, with the latter surprising as the reference doesn't contain
any header information at all, instead having to go to the non-common
'Library headers' section;
* Polygon seems to be completely lacking any header information
whatsoever;
* ProgramOptions - no mention in the tutorial, overview or howto;
* PropertyTree - reasonably straightforward based on the names of the
headers in the reference; prose rarely links typenames to the right
headers;
* Range separates the reference and the header structure; confusing if
you've finally gotten used to information being in the reference
section;
* Test only mentions headers occasionally in samples, I've yet to find
anything standalone talking about includes at all, except for the
unrecommended standalone header;
* Thread lacks an overview of the headers, mentioning them occasionally
in the reference - I just give up and use the all-header;
* Wave has no top-level index, but the few sections mention all headers
by name and synopsis and links to (disabled in Trac) trunk versions of
the headers;
The libraries that use qbk-style documentation tend to be pretty
acceptable, as the common style means that you get a list of headers on
the landing page for the library. While not optimal while reading, it's
at least something.
This might not be a major problem if you're willing to manually
cross-reference the reference sections when reading the
tutorial/guide/introduction sections, which is where the meat of the
documentation usually is located, but it feels quite the wrong way
around to dive into header listings to find the type/concept you want,
when the main body of documentation around it is in the prose.
On an unrelated note, the next time I have to use the expandos in the
Iostreams or Serialization documentation, I'm going to cry. They're such
fiddly and tiny targets, so unfriendly that I think more than once about
using the library due to the inaccessibility of the documentation.
Now that I've insulted practically every single Boost author, let me say
"good job, everyone". The level of documentation in Boost is way better
than what you usually see out in the wild, it's just that it's quite
easy to feel disorientated.
--
Lars Viklund | z...@acc.umu.se
On Sep 3, 2012 2:58 PM, "st...@parisgroup.net" <st...@parisgroup.net> wrote:
>
> Greetings,
>
>
>
> Right now I'm working for a company that worries a lot about negative exposure to Open Source software issues such as questions that might arise about authorship, copyright or even patent issues. The company does allow the use of Open Source software, but it requires that each piece of code that is brought in first be justified and vetted. Unfortunately, I'm finding this to be a nearly impossible task when I look at the amount of code that must be compiled to use the Boost modules I'm interested in.
>
>
Why would open source be more likely to infringe a patent than your own solution? It is less likely in fact since the public availability of code can be used to demonstrate prior art.
Incorporating third party software into your own QA process is often very sensible.
>
> I've done a study and written some tools, to determine just how many Boost header files must be included to use some of the Boost modules. The numbers are staggering:
>
>
>
> Any: 79
>
> FileSystem: 276
>
> Smart Ptr: 382
>
> String Algo: 180
>
>
>
I am not staggered. Additionally the number of files has to be one of the most useless software metrics in the known multiverse!
> I went on and did a tally of which modules these header files came from. Here are the counts for how many other Boost modules each of these modules depends upon:
>
>
>
> Any: 8
>
> FileSystem: 13
>
> SmartPtr: 8
>
> String Algo: 15
>
>
>
> Given these numbers, I think I have to abandon any thoughts of using Boost within my current company. There's no way I'm going to get approval to bring so much code into our work just to get a SmartPtr or FileSystem library. This is unfortunate, because due to Boost's existence there doesn't seem to be much work going on out there to offer lighter-weight alternatives.
>
>
>
> I'm writing this request in the hopes that there's something I'm missing here, and that someone can point out my folly. Is there a way to utilize any of these Boost modules in such a way that they do not require the inclusion of so much code? If not, does anyone have any suggestions as to how to make this fly with my boss? Has this issue come up before and been considered by the Boost designers? I find the issue baffling.
>
The issue simply isn't a development priority. We prefer to emphasise good interfaces to correct code. The internal header file coupling has the potential to increase your build time. Why else do you care (aside from justifying an arbitrary procedure)?
If you have compile time issues please bring these up concretely and with measurements.
>
>
> TIA for any help or insight anyone can provide.
>
>
My recommendation would be to look for another job!
I think all the Boost devs would like to improve modularisation, and efforts are already underway.
Frankly, increasing the number of supported deployable configurations is unlikely to improve external quality factors. It is likely to be better to take the better-tested ensemble of libraries.
Test and use what you like and leave what you don't.
I may be on my own here, but my suspicion would be that dogmatic enforcement of inappropriate rules is the primary problem.
If you can't trust your developers to include the approved libraries, you probably can't trust them to make any change without supervision.
>
> Steve
>
Neil Groves
> I've done a study and written some tools, to determine just how many...
> Boost header files must be included to use some of the Boost modules.
> The numbers are staggering:
This is a legitimate concern which IMO has not been taken seriously by the
Boost community.
To some extent it's unavoidable, since it's more efficient to have one
"best" solution imported everywhere rather than replicating code.
But it doesn't have to be as bad as it currently is. And it's only getting
worse as more libraries get added to Boost.
Here's what I suggest for Boost users such as yourself:
a) Don't use "convenience headers", which suck in all the headers in a
library rather than just the ones used. It seems tedious because one has to
read more carefully to know which headers to use, but it saves huge amounts
of build time and diminishes "dependency surprises", which can cost a lot of
time to track down.
b) A few libraries "infect" your code with "spreading" dependencies. Use your
analysis to detect these libraries and complain about them. I have done
this in the past to no effect. It did make me feel I was doing something,
though. If you can't get library authors to see this, you'll just have to
avoid that library or fork your own copy.
c) In your own code, use pre-compiled headers. This speeds up re-builds.
It DOES force you to spend some time looking at how you've
divided up your own code but this time is a good investment.
Here's what I suggest library authors do:
a) Take this gentleman's complaint seriously.
b) Consider eliminating "convenience" headers.
c) When writing documentation, avoid dependence on "convenience headers".
This seems like it adds some work - less convenience - but helps address
this man's problem.
Here's what Boost can do. Something like:
a) Formalize the concept of Boost library "levels"
i) core - e.g. config, auto-link, BOOST_NONCOPIABLE, ...
ii) utility - e.g. scoped_ptr, ...
iii) application support - e.g. serialization
Every boost component would depend only on components at the same or lower
level.
The assignment of library level would be part of the review process.
The motivation here is to support the future growth of Boost and C++
libraries in general.
I see this as the fundamental requirement behind any efforts to achieve
"Boost Modularization".
Robert Ramey
Hi again,
I'm the original poster that started this thread. WOW! Thanks for all of the great responses. I apologize for posting this message and then getting called away on a business trip. It is only just now that I'm getting back to see what kind of response I got, and I'm thrilled. I'm happy to see that a number of folks involved with Boost see this issue as a significant problem, if only to certain types of companies.
APOLOGY: I must apologize for a small mistake in my numbers, that might be somewhat important to someone. I managed to reverse the counts for the "Smart Ptr" and "String Algo" libraries. I remember thinking it kinda strange that one referenced more modules but the other referenced more lines. So it's really true that "String Algo" causes 382 files to be read, while "Smart Ptr" causes only 180 to be read. Sorry about that.
I've taken a first pass through all the responses, and rather than respond to each of them individually, I'll offer some more information here and attempt to address some of the questions that have been pointed back to me.
1) How did I get these numbers? Give some examples.
Here's one of the places I'd love to be shown to be wrong. If my numbers are inflated, my sales job to my boss will be that much easier. So by all means, someone correct me if my approach is unsound.
What I did was very simple. All I did was compile a very simple program and have g++ give me a list of all of the headers it read during the compilation, excluding system headers. This is done using the following line from my test Makefiles:
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< 2> /dev/null -MM > headers.lst
Here's the test for SmartPtr:
#include <iostream>
#include <boost/smart_ptr.hpp>
using namespace std;
int main(int argc, char* argv[])
{
return 0;
}
This simple test produces a file named headers.lst with 180 unique header paths in it, all starting with "boost/".
Discovering the modules used by each module took a few hours of fairly tedious labor, where I sorted and then grouped each list of headers, where each group consisted of headers coming from the same module.
2) Here are the specific module dependencies:
Any: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type Traits, Utility
Filesystem: "base", Config, Exception, Functional, Integer, Iterators, MPL, Preprocessor, Smart Ptr, Static Assert, Type Traits, Utility
SmartPtr: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type Traits, Utility
StringAlgo: "base", Bind, Compatibility, Concept Check, Config, Exception, Function, Integer, Iterators, MPL, Preprocessor, Range, Static Assert, String Algo, Type Traits, Utility
3) In response to the suggestion to not use the convenience headers, like say "smart_ptr.hpp", as opposed to a header for an individual type.
It's bad enough to tell my programmers they can only use certain Boost modules. To tell them that they can only use certain parts of certain Boost modules just gets to be too much. Plus, I can see eventually using most, if not all of the functionality of the SmartPtr module. The same can be said for the other modules I'm interested in. If I have to run to my boss every time I want to use one new particular feature from a Boost module, it's not worth the effort. Nor would it be worth the overhead of figuring out how to police such a level of code use.
So for better or worse, my consideration of the use of Boost has to be on a Module by Module basis.
4) In response to "the license says that it's free to use, and the copyright holders have agreed to that license, so everything is fine".
That's not true in the legal world. Neither the license nor any statements made by the person stating a copyright mean anything if that person somehow, whether intentionally or unintentionally, included some bit of someone else's code in what they are calling their own. If the original writer of the code can prove original authorship of the code, nothing done without THAT PERSON'S GRANT OF LICENSE means anything. That original author owns all rights to the use of that code, and can dictate how it can and cannot be used. It is this issue that concerns companies like mine.
5) In response to "Who cares how much code there is. How does one "vet" a piece of code, regardless of how much of it there is".
It is not hard to look at 100 or 1000 lines of code in a few files and say "there's nothing novel here". If the code is all written to do one basic thing or set of things in a direct way, it's pretty easy to believe that a single or a few individuals wrote the code. And, if the claimed authorship is invalid, real damages would be easy to justify as being minimal, given the very limited scope of what the code is capable of doing.
It's also much easier to feel comfortable in the fact that many other developers are using these 1000 lines of code in their commercial products and haven't yet been sued over the use of some portion of it. And if/when one wants to upgrade to the next version of a module consisting of 1000 lines of code, it's pretty easy to see what was added/removed.
But in the case of Boost, with hundreds and possibly thousands (with fuller adoption) of individual files involved, consisting of tens to hundreds of thousands of lines of code, you can't have any idea what you've got. In fact, you can feel fairly confident that all of those lines of code are NOT NECESSARY in the basic sense to the benefit you wish to gain from the module in question. So you have to ask yourself "what more does all this code do?", and you certainly can't read and understand the purpose of every line of such a quantity of code to answer that question. And the fact that there's so much of it leads one to wonder "what novel things might be going on in that code to require so much of it?" I mean, 180 header files being read for a Smart Ptr library is pretty darn "novel" in and of itself.
Finally, there's mere statistics involved. If 1000 lines of code opens a company to a certain amount of negative exposure, 100,000 lines of code, one might argue, opens the company to 100 times as much exposure.
6) And...I'm not sure this question was asked specifically, but I'll ask it myself..."what are you so worried about".
Here's an example of what we're worried about....
Say we develop a tool for Disney to use on one of its feature length films. A month before the premiere date of the film, someone takes Disney to court and claims that one of their production tools, the one we wrote, contains code that was stolen from them. Disney asks us to come to court to defend our use of that code.
In the case of 1000 lines, we can say exactly what we did to vet the use of the code, and state exactly what that code does not just for us but for anyone who might use it, pointing out that each of those users has a very clear idea of what the code does, what it's worth to them, and why they considered the copyright given by the supposed author to be valid.
For 100,000 lines of code we say, well the 1% of the code we use kinda/sorta works by doing this, but it does that by going off and using bits and pieces of all these other files, and frankly, we couldn't take the time to understand what all that code is for, and therefore cannot possibly have understood that the code contained something novel that might have been misrepresented as to its authorship for reason of personal gain on the part of the offending copyright grantor.
In the first case, maybe the judge puts some value on the 1000 lines of code, and because it's Disney, that number gets multiplied by 10X. It's still a small amount of money for Disney, so they pay the money and just decide never to do business with us again.
In the second case, the judge says "wow, there's a lot of code here. This is going to take a lot of time to work out the ramifications of, and to put a dollar amount on" and files an injunction against Disney releasing their film. This costs Disney many millions of dollars on everything they've set in motion in order to release the film, that will now all be wasted money. Disney sues us for all of that money. We, as a very small software company, talk to our lawyer, who tells us our best bet is to fold the company and go find jobs working for Google.
7) Use a more modern C++
Some of our customers are in the Operating System Stone Age. For example, I often develop on Fedora, but my code has to be able to compile and run on Red Hat 5. AND, we are often told exactly what compiler to use, and that compiler is sometimes not open source, and in a few cases no longer supported. So solving these issues with newer compilers is not an option.
8) Conclusion
So, it DOES MATTER, IN A BIG WAY how much code I have to bring into my project's codebase to get SmartPtr capabilities. And even if it turned out that it didn't, my boss doesn't consider it worth the risk to make that call. He'd rather hire another programmer just to write a SmartPtr library, so that our project can stay on schedule and he can sleep at night, knowing his company isn't going to some day go "poof" due to a relaxed approach to using Open Source.
Some of our customers don't allow their engineering departments any access to code on the internet for this very reason. There are firewalls designed solely to look for and disallow anything that looks like significant code or other data from coming into the company walls. We have to justify our use of each specific piece of Open Source to EACH OF THESE COMPANIES before we can begin to supply them with anything. So another big issue for us is that as soon as we say "We use Boost", we are dismissed from consideration for a project. I bet this happens all the time.
Thanks All for all the interesting and valuable discourse! Take care!
Steve
PS) My company DOES already use the Boost Smart Ptr library. However, it uses a much earlier version of the library, one that depends on just a dozen or so headers. So I guess at one point individual Boost modules were more separable. Or maybe it was just generally smaller back when that module was adopted. So we DO already have and use Boost Smart Ptrs...we just don't have all the nifty new features in the latest and greatest, the most important of which is the ability to not require that the pointed to class be defined wherever a Smart Ptr is instantiated. I'm dying for that feature.
Thanks, Steve, for your EXCELLENT exposition in point 6 of the issues involved.
Are folks familiar with http://www.blackducksoftware.com/protex ? (I have no interest in Black Duck and have not myself ever used their products or services.)
The “offending” code unfortunately does not even have to come from the Internet:
Jones is working on a software project. He engages his buddy Smith to write portions of the code on a handshake sub-contractor basis. Jones subsequently contributes some of the code to Boost or another open source project, with all of the proper paperwork. Smith probably has a copyright claim on any code that uses the open source project.
Smith is a nice guy and told Jones “he would never sue anybody” but when he sees the name Disney the cash register in his mind goes ca-ching! He rationalizes suing on the basis that Disney (or fill in your favorite corporation) is part of the evil empire.
Charles
From: boost-use...@lists.boost.org [mailto:boost-use...@lists.boost.org] On Behalf Of st...@parisgroup.net
Sent: Wednesday, September 05, 2012 8:05 PM
To: boost...@lists.boost.org
Subject: Re: [Boost-users] Why is there so much co-dependency in Boost? Is there anything to be done about it?
How are you vetting your compiler, OS, etc.?
The difference with a purchased product is that the license usually has some copyright and patent indemnification in it. I’m not being down on open source, it’s just a fact that one of the things you get for your license dollars is IP indemnification. What’s it worth? Your mileage may vary.
What about Linux? Well, Disney (in our example) is probably getting Linux directly from Red Hat or SUSE, not from the Boost-using software vendor, so it’s not his problem. (Yes, there IS a potential problem, as we saw with SCO v IBM.)
Yes, in my earlier note, isn’t Jones in trouble for representing the code as his own? Possibly, but that does not do Disney a whole lot of good. If he is a legally unsophisticated contract programmer he would have a good faith defense: “I paid Smith for the work and so I assumed I owned what he wrote.”
The key is in the last paragraph below. Being able to say “well, what about this factor? Doesn’t that change everything” doesn’t change anything. Disney is still going to have to pay a law firm big bucks to sort it all out, and it likely means the death of the Boost-using Disney software vendor. Again, I’m not trying to bad-mouth Boost or open source. I’m just reciting the sad facts of life in the big city in 2012.
> It would seem that the infringement would be on the 3rd party and not on good-faith users of the 3rd party code.
A safe harbor shield law would be a wonderful thing for the small and open source software community.
Charles
From: boost-use...@lists.boost.org [mailto:boost-use...@lists.boost.org] On Behalf Of Chris Cleeland
Sent: Friday, September 07, 2012 8:18 AM
To: boost...@lists.boost.org
Subject: Re: [Boost-users] Why is there so much co-dependency in Boost? Is there anything to be done about it?
On Fri, Sep 7, 2012 at 9:30 AM, Nevin Liber <ne...@eviloverlord.com> wrote:
> It would seem that the infringement would be on the 3rd party and not on good-faith users of the 3rd party code.
A safe harbor shield law would be a wonderful thing for the small and open source software community.
> Any former software developers in congress?
Nope, all liability lawyers. Are you starting to see a thread here?
Charles
From: boost-use...@lists.boost.org [mailto:boost-use...@lists.boost.org] On Behalf Of Chris Cleeland
Sent: Friday, September 07, 2012 10:35 AM
To: boost...@lists.boost.org
Subject: Re: [Boost-users] Why is there so much co-dependency in Boost? Is there anything to be done about it?
On Fri, Sep 7, 2012 at 11:51 AM, Charles Mills <char...@mcn.org> wrote:
--
Chris Cleeland