Parsing list of IDs with splitText (get rid of new line character)

Johannes Debler

Jul 19, 2018, 10:42:43 PM

to Nextflow

Hi there,

First I create a list of IDs for a given organism and send the results as a text file into a channel.

process getRNASeqIDs {

    output:
    file 'ids.txt' into RNASeqIDs
    """
esearch -db taxonomy -query '${params.ref}' \
| elink -target sra \
| efetch -format docsum \
| xtract -pattern EXPERIMENT_PACKAGE \
  -if LIBRARY_STRATEGY \
  -equals 'RNA-Seq' \
  -group RUN \
  -element @accession > ids.txt
    """
}

The results text file looks like this:

SRR5040508
SRR5040506
SRR5040505
SRR3999595

Each line represents the ID of an RNAseq dataset.

I then split the channel line by line like this:

ids = RNASeqIDs.splitText()

I then want to feed each ID into fastq-dump to download the dataset.

process dumpfastq {
  tag { id }

  input:
    val id from ids

  output:
    set id, "*.fastq" into fastqDumpForAlignment

  """
  fastq-dump $id
  """
}

The problem, however, is that each ID is passed to fastq-dump with a trailing newline character, and fastq-dump crashes because of it.

Is there a better way to parse a list of values in order to feed them individually into another process?

Cheers,

Johannes

Johannes Debler

Jul 19, 2018, 11:24:27 PM

to Nextflow

Ok, for the sake of completeness I solved it like this:

ids = RNASeqIDs.splitText().map{it -> it.trim()}
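
The same idea can be made slightly more defensive. This is a minimal sketch, assuming the DSL1 syntax used in this thread and the RNASeqIDs channel from the first post; the extra filter step is an addition, not part of the original solution, and guards against a blank line in ids.txt turning into an empty ID:

```groovy
// Sketch only: RNASeqIDs is the output channel of getRNASeqIDs above.
ids = RNASeqIDs
    .splitText()                    // one item per line, trailing newline included
    .map { it.trim() }              // strip the newline and any surrounding whitespace
    .filter { it.length() > 0 }     // assumption: skip blank lines so no empty ID reaches fastq-dump
```

Each item in ids is then a plain string such as SRR5040508, which can be passed to dumpfastq with `val id from ids` as before.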

Steve

Aug 2, 2018, 10:51:42 AM

to Nextflow

Use .splitCsv(); it handles the newlines automatically.

https://www.nextflow.io/docs/latest/operator.html#splitcsv

It is a little counter-intuitive, but it works perfectly for this in my experience.
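
One thing to keep in mind: splitCsv emits each row as a list of columns rather than a bare string, so for a one-column file like ids.txt an item prints as [SRR5040508]. A minimal sketch, assuming the DSL1 syntax used in this thread and the RNASeqIDs channel from the first post, that unwraps the single column:

```groovy
// Sketch only: RNASeqIDs is the output channel of getRNASeqIDs above.
ids = RNASeqIDs
    .splitCsv()                 // each line becomes a list of columns, e.g. [SRR5040508]
    .map { row -> row[0] }      // keep the first (and only) column as a plain string
```

Without the map step, interpolating the item into a script block would insert the list's string form, brackets included.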

Zoe L

Dec 10, 2018, 12:09:56 PM

to Nextflow

I spent a long time trying to figure this out. I wanted to get a channel output that could be used as an argument to a script in a process.

This worked for me to get rid of the new line characters using .splitText:

Channel
    .fromPath('path/to/file')
    .map{ "$it" }
    .set{list_of_vals}

process exp {
    input: val (val) from list_of_vals

    script:
    print "${val}"
}

#output:
#val line 1
#val line 2
#etc

I found that splitCsv produced outputs with [brackets], which was not useful. Can you explain, or recommend how to solve that problem?

Steve

Dec 10, 2018, 4:23:29 PM

to Nextflow

Can you send an example of your input data, and the exact output shown in the terminal? I am not sure what brackets you are referring to.
