Hi,
For about two weeks now, the new elPrep version 3.0 has been available in our GitHub repository.
The major change in comparison to previous versions (1.x and 2.x) is that elPrep is now implemented in the Go programming language, whereas it was previously implemented in Common Lisp.
Common Lisp was a very good choice for prototyping elPrep three years ago and for developing it into the stable version that it has become in the 2.x releases. However, we also encountered technical issues, especially with memory management. Since the end of last year, the Go programming language has provided a parallel, concurrent garbage collector, which makes it substantially easier to deal with such memory management issues. This convinced us to port elPrep to Go, which has led to some runtime improvements and also makes elPrep easier to use.
Another advantage of the new version is that Go, developed at Google, is a more widely used programming language. Its syntax is reminiscent of languages like Python, yet it compiles statically to efficient binary code. For example, it is used as an implementation language for Docker and Kubernetes, among other projects. This should make elPrep accessible to a wider audience. We have already seen the first signs that this pays off: we have received pull requests for additional features in the last two weeks.
There are a number of additional improvements:
- Specifying --nr-of-threads is not necessary anymore. If --nr-of-threads is not passed, elPrep can now figure out how many CPUs/cores are available and use them efficiently on its own.
- The --gc-on option has been removed. elPrep now handles memory efficiently on its own and does not need user guidance anymore.
- There is now a separate remove-duplicates filter, so that elPrep can also remove duplicates that may have already been marked by other tools.
- --mark-duplicates now compares reads using the library id instead of the read group id, matching the behaviour of recent Picard versions.
- --filter-non-exact-mapping-reads and --filter-non-exact-mapping-reads-strict have been added.
- The split/filter/merge scripts have been changed so that they can handle a directory as input and allow simpler specification of the names and types of intermediate output files.
- The split/merge tools can now handle single-end data with the --single-end option.
You can find the new elPrep version in its usual place at
https://github.com/ExaScience/elprep (still open source under a BSD license). Binaries are still available for Linux, but Go itself is an open source project - see
https://golang.org - so you can also easily build binaries yourself.
Thanks to Leonor Palmeira and Geert Vandeweyer for bug reports and feature suggestions.
Feel free to ask any questions.
Pascal