Sebastien and I were talking about going even further. The current library is already getting quite large, and we were thinking of splitting it into several smaller libraries. Here's a proposal:
biocaml_base - Very basic types and functions used throughout all other libraries, e.g. Biocaml_internal_pervasives would go here. Virtually all other libraries would depend on this one.
biocaml_genomics - Contains modules related to genomics. At this time, that would be the various file format parsers.
range (or irange) - Modules related to integer intervals. The biocaml prefix can be omitted here because these modules have nothing specifically to do with biology.
biocaml_app - The command-line app could live in a separate repo.
For several of the above, we need ocamlfind sub-packages biocaml_foo.lwt and biocaml_foo.async. Their implementations should live in the single biocaml_foo repo, but they should be selectively installable.
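One way to express this in findlib is a single META file with nested `package` entries. The sketch below is only an assumption about how biocaml_foo could be laid out: the version string and archive names (biocaml_foo.cma, biocaml_foo_lwt.cma, and so on) are hypothetical placeholders. It uses findlib's `exists_if` variable so that each sub-package is only visible when its archive was actually installed, which is one way to make the Lwt and Async parts optional.

```
# Hypothetical META for the biocaml_foo repo; names and version are placeholders.
version = "0.1"
description = "biocaml_foo: core modules"
requires = "biocaml_base"
archive(byte) = "biocaml_foo.cma"
archive(native) = "biocaml_foo.cmxa"

package "lwt" (
  version = "0.1"
  description = "Lwt interface to biocaml_foo"
  requires = "biocaml_foo lwt"
  archive(byte) = "biocaml_foo_lwt.cma"
  archive(native) = "biocaml_foo_lwt.cmxa"
  # Only visible if the Lwt archive was built and installed.
  exists_if = "biocaml_foo_lwt.cma"
)

package "async" (
  version = "0.1"
  description = "Async interface to biocaml_foo"
  requires = "biocaml_foo async"
  archive(byte) = "biocaml_foo_async.cma"
  archive(native) = "biocaml_foo_async.cmxa"
  # Only visible if the Async archive was built and installed.
  exists_if = "biocaml_foo_async.cma"
)
```

A user who only wants the Lwt interface would then pass `-package biocaml_foo.lwt` to ocamlfind, and the Async dependency would never be pulled in.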
ocaml_htslib - Given this setup, a binding to htslib should simply be a separate library. Surely we would want an asynchronous interface to it (I haven't checked, but hopefully the C API allows that), so we would again need ocaml_htslib_lwt and ocaml_htslib_async. In this case, I think the biocaml_ prefix can be omitted: a top-level module called "Htslib" is intuitive and accurately describes what the library does. That said, I would still hope it follows Biocaml's coding, API, and documentation standards.
The overall idea is that we treat "biocaml" as a namespace, and the biocaml suite as a whole contains all modules from all of the above libraries. We could even provide a "biocaml" library that depends on all of them. But splitting allows faster development of the sub-libraries, keeps compilation times shorter, and lets users install only the parts they actually need.
Inevitably, we'll want to reorganize libraries in the future. At some point, the biocaml_genomics library will become so big that we might want to split it further. The idea is that we leave ourselves the option of doing that. We view the union of all modules in all libraries as the stable set of constructs being provided, not the specific sub-libraries.
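To make the namespace idea concrete, here is a minimal, self-contained OCaml sketch. `Biocaml_base` and `Range` are hypothetical stand-ins defined inline (in reality each would be its own findlib package), and the interval code is purely illustrative. The umbrella `Biocaml` module re-exports them through module aliases, so client code refers to `Biocaml.Range` regardless of which sub-library currently ships it.

```ocaml
(* Hypothetical stand-in for the biocaml_base library. *)
module Biocaml_base = struct
  let version = "0.1"
end

(* Hypothetical stand-in for the range library:
   a closed integer interval [lo, hi], purely illustrative. *)
module Range = struct
  type t = { lo : int; hi : int }

  (* Returns None when the bounds are inverted. *)
  let make lo hi = if lo <= hi then Some { lo; hi } else None

  (* Number of integers in the closed interval. *)
  let length r = r.hi - r.lo + 1
end

(* Umbrella "biocaml" namespace: module aliases re-export the
   sub-libraries under stable names. *)
module Biocaml = struct
  module Base = Biocaml_base
  module Range = Range
end

let () =
  match Biocaml.Range.make 3 7 with
  | Some r -> Printf.printf "length = %d\n" (Biocaml.Range.length r)
  | None -> print_endline "empty interval"
```

If `Range` later moved into a different sub-library, only the alias inside `Biocaml` would need to change; code written against `Biocaml.Range` would keep compiling.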
Some details have to be considered:
* Several names are interrelated: repo name, opam package name, findlib package name, and module names. Dashes are sometimes nicer than underscores, but they are problematic in some cases; for example, OCaml module names cannot contain dashes. The Core team decided to use underscores everywhere for consistency.
* Version numbers. Given my comments above, it would make sense for all libraries to share a single version number. However, Core's experience has already shown that this doesn't work. Thus, each library should be versioned separately.
Now, for your incubator idea, there are two options:
* As you suggest, we can maintain a separate biocaml_incubator library, which is never officially released.
* If the desired module fits in one of the above libraries, you could add it there, and we would mark it as unstable. This is what we do now [1].
Any thoughts? If this sounds okay, I'll work on splitting the library early next week. If you want, I can create a biocaml_incubator repo right away and give you push access.