Sebastien and I were talking about going even further. The current library is already getting quite large, and we were thinking of splitting it into smaller ones. Here's a proposal:

* biocaml_base - Very basic types and functions used throughout all other libraries, e.g. Biocaml_internal_pervasives would go here. Virtually all other libraries would depend on this one.
* biocaml_genomics - Contains modules related to genomics. At this time, that would be the various file format parsers.
* range (or irange) - Modules related to integer intervals. The biocaml prefix can be omitted here because the modules wouldn't have anything to do specifically with biology.
* biocaml_app - The command line app could be in a separate repo.

For several of the above we need ocamlfind sub-packages biocaml_foo.lwt and biocaml_foo.async. The implementation of these should go in a single biocaml_foo repo, but they should be selectively installable.

* ocaml_htslib - Given this setup, a binding to htslib should simply be a separate library. Surely we would want an asynchronous interface to this (I haven't looked but hopefully the C API allows that), and thus we would need again ocaml_htslib_lwt and ocaml_htslib_async. In this case, I think the biocaml_ prefix can be omitted. Having a top-level module called "Htslib" is intuitive and accurately represents what this library does. Still, I would hope it follows Biocaml coding, API, and documentation standards.

The overall idea is that we treat "biocaml" as a namespace, and the overall biocaml suite contains all modules from all of the above. We could even provide a "biocaml" library that depends on all of the above. But splitting allows more rapid development on sub-libraries, keeps compilation times shorter, and lets users install less when they really want to.

Inevitably, we'll want to reorganize libraries in the future. At some point, the biocaml_genomics library will become so big that we might want to split it further. The idea is that we leave ourselves the option of doing that. We view the union of all modules in all libraries as the stable set of constructs being provided, not the specific sub-libraries.

Some details have to be considered:

* Several names are inter-related: repo name, opam package name, findlib package name, module names. Dashes are sometimes nicer than underscores, but dashes are problematic in some cases. The Core team decided to go with underscores everywhere for consistency.
* Version numbers. Given my comments above, it would make sense for all libraries to have a single version number. However, the experience of Core already proved that doesn't work. Thus, each library should be version numbered separately.

Now, for your incubator idea, there are two options:

* As you suggest, we can maintain a separate biocaml_incubator library, which is never officially released.
* If the desired module makes sense in one of the above libraries, then you could add it there, and we mark the module as unstable. This is what we do now [1].

Any thoughts? If this sounds okay, I'll work on splitting the library early next week. If you want, I can create a biocaml_incubator repo right away and give you push access.

-Ashish

On Thu, Nov 28, 2013 at 2:17 AM, Philippe Veber <philipp...@gmail.com> wrote:
Hi Ashish,
this is an interesting suggestion, thanks. It made me wonder if we should not have a separate incubator library, for code that is still unstable in its interface but can be worth sharing among us, until it is polished enough to go into the library. It could reside in the same code base next to src/lib, and would be optionally compiled. How does it sound to you?
ph.

2013/11/27 Ashish Agarwal <agarw...@gmail.com>

We should also consider writing a Ctypes-based binding to the new htslib library [1]. At the least, it would allow comparisons with our pure OCaml implementations.
Let's hold off on splitting Biocaml just yet. Let's instead consider it on a case-by-case basis, so the conversation can be less abstract. For example, if we were going to make a binding to a C library, that's a clear candidate for being a separate library.

> Could you expand on how you see splitting allow more rapid development?

My statement was too broad. Sometimes it would help, and sometimes not.

> ocamlfind sub-libraries (biocaml.base, biocaml.genomics etc ...) are enough to limit the number of dependencies a user has to bear. Do you see some downside to using them instead of full-fledged libraries?

So, in opam the ocamlfind sub-libraries would get installed optionally depending on what other libraries the user has already installed. That could work, although I find it a bit awkward. I feel the command `opam install foo` should have a clear effect, but now `opam install foo` isn't meaningful by itself. You have to know what else was previously installed to understand what this command does.
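
To make the last point concrete, here is a toy OCaml model of the behaviour described above. It is only a sketch and not how opam or ocamlfind are implemented: the package name `biocaml_foo`, the `optional` table, and the function `installed_subpackages` are all hypothetical names chosen for illustration. The point it shows is that the result of installing one package is a function of what is already present.

```ocaml
(* Toy model only: hypothetical names, not opam's actual logic. *)
module S = Set.Make (String)

(* Each optional dependency and the findlib sub-package it would enable. *)
let optional = [ ("lwt", "biocaml_foo.lwt"); ("async", "biocaml_foo.async") ]

(* Sub-packages that installing biocaml_foo would provide, given the set
   of packages already installed. The base package is always included. *)
let installed_subpackages already_installed =
  "biocaml_foo"
  :: List.filter_map
       (fun (dep, sub) ->
         if S.mem dep already_installed then Some sub else None)
       optional

let () =
  (* Print the outcome of the same install command in three environments. *)
  let show env =
    Printf.printf "already installed {%s} -> %s\n"
      (String.concat ", " (S.elements env))
      (String.concat ", " (installed_subpackages env))
  in
  List.iter show
    [ S.empty; S.of_list [ "lwt" ]; S.of_list [ "lwt"; "async" ] ]
```

The same command yields three different sets of sub-packages in the three environments, which is the awkwardness I mean. With full-fledged separate libraries, each install command would have one fixed outcome.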