Specifying samples with YAML in NGLess


Luis Pedro Coelho

Jul 18, 2022, 7:27:26 AM
to NGLess List
Hi everyone,

I think Ulrike has been asking for this feature, in some form, for a while, but until now I had never really figured out the best way to provide it.

Basically, in the next version (actually since 10 minutes ago on the git version), you can specify samples using YAML. Maybe it's best shown as an example:

samples:
  sample1:
    - paired:
        - data/Sample1a.1.fq.gz
        - data/Sample1a.2.fq.gz
    - paired:
        - data/Sample1b.1.fq.gz
        - data/Sample1b.2.fq.gz
  sample2:
    - paired:
        - data/Sample2.1.fq.gz
        - data/Sample2.2.fq.gz
    - single:
        - data/Sample2.extra.fq.gz

This is verbose, but it is very flexible, and there are no more limitations or ambiguities about what the paths can be. YAML files can also easily be generated by other tools.
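To illustrate generating the file from another tool, here is a minimal Python sketch that writes the layout shown above from an in-memory description of the samples. The function name render_samples_yaml and the input structure are hypothetical and not part of NGLess. The sketch assumes that "paired" and "single" are the only entry types (they are the only ones in the example) and that the paths are plain strings needing no YAML quoting. For arbitrary paths, a proper YAML library would be the safer choice.

```python
def render_samples_yaml(samples):
    """Render a samples mapping as YAML text in the layout shown above.

    `samples` maps a sample name to a list of (kind, paths) pairs, where
    kind is "paired" or "single" (assumed from the example in this thread).
    Paths are written verbatim, so they must not need YAML quoting.
    """
    lines = ["samples:"]
    for name, entries in samples.items():
        lines.append(f"  {name}:")
        for kind, paths in entries:
            if kind not in ("paired", "single"):
                raise ValueError(f"unexpected entry type: {kind!r}")
            lines.append(f"    - {kind}:")
            for path in paths:
                lines.append(f"        - {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "sample1": [
            ("paired", ["data/Sample1a.1.fq.gz", "data/Sample1a.2.fq.gz"]),
            ("paired", ["data/Sample1b.1.fq.gz", "data/Sample1b.2.fq.gz"]),
        ],
        "sample2": [
            ("paired", ["data/Sample2.1.fq.gz", "data/Sample2.2.fq.gz"]),
            ("single", ["data/Sample2.extra.fq.gz"]),
        ],
    }
    # Write the file that the NGLess script would then load.
    with open("samples.yaml", "w") as handle:
        handle.write(render_samples_yaml(example))
```

Run as a script, this writes a samples.yaml with the same content as the example above.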


At this point, I would welcome any feedback on the design, as it will be harder to change once it's released. Until v1.5 is officially released, you are welcome to test it out (by downloading the static builds from https://github.com/ngless-toolkit/ngless/actions/runs/2690055206), but the design may still change, so caveat emptor.


HTH
Luis

Luis Pedro Coelho | Fudan University | https://luispedro.org

Luis Pedro Coelho

Jul 18, 2022, 2:15:34 PM
to NGLess List
The YAML file can contain multiple samples and will also work well with the parallel module:

ngless "1.5"
import "parallel" version "1.1"

samples = load_sample_list("samples.yaml")
input = run_for_all(samples)
...


run_for_all() is a simpler version of lock1() and is also new in NGLess 1.5.


The biggest difference (compared to lock1()) is that you can only use run_for_all() once in your script.

In exchange, the arguments for collect() are simpler and less error-prone. If some complicated pipeline needs multiple calls to lock1(), you can still do it that way, but I think 99% of uses are covered by run_for_all().

Best
Luis

Luis Pedro Coelho | Fudan University | https://luispedro.org


