Parsing a list of IDs with splitText (getting rid of the newline character)


Johannes Debler

Jul 19, 2018, 10:42:43 PM
to Nextflow
Hi there,

First, I create a list of IDs for a given organism and send the results as a text file into a channel.

process getRNASeqIDs {

    output:
    file 'ids.txt' into RNASeqIDs

    // Query NCBI via EDirect for the RNA-Seq run accessions of params.ref
    """
    esearch -db taxonomy -query '${params.ref}' \
      | elink -target sra \
      | efetch -format docsum \
      | xtract -pattern EXPERIMENT_PACKAGE \
          -if LIBRARY_STRATEGY \
          -equals 'RNA-Seq' \
          -group RUN \
          -element @accession > ids.txt
    """
}

The results text file looks like this:

SRR5040508
SRR5040506
SRR5040505
SRR3999595

Each line represents the ID of an RNAseq dataset.

I then split the channel line by line like this:

ids = RNASeqIDs.splitText()

I then want to feed each ID into fastq-dump to download the dataset. 

process dumpfastq {
    tag { id }

    input:
    val id from ids

    output:
    set id, "*.fastq" into fastqDumpForAlignment

    """
    fastq-dump $id
    """
}

The problem, however, is that each ID is delivered to fastq-dump together with a trailing newline character, and fastq-dump crashes because of that.
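To see the problem in isolation, here is a small bash sketch. It assumes, as the trim() fix later in this thread suggests, that each item from splitText() is a line with its newline still attached; the temporary file is a hypothetical stand-in for ids.txt. The bracketed output makes the trailing newline visible, and the loop at the end shows that bash's own `read` builtin drops the line terminator, which is why the same IDs cause no trouble in a plain shell loop.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ids.txt, in the same one-ID-per-line format
ids_file=$(mktemp)
trap 'rm -f "$ids_file"' EXIT
printf 'SRR5040508\nSRR5040506\n' > "$ids_file"

# Roughly what splitText() hands on for the first line: the ID plus its newline.
# The extra 'x' stops $( ) from stripping the trailing newline; it is removed right after.
first_chunk=$(head -n 1 "$ids_file"; printf x)
first_chunk=${first_chunk%x}

# Brackets make the embedded newline visible: the closing ] lands on the next line
printf '[%s]\n' "$first_chunk"

# bash's read builtin drops the line terminator, so each $id is a bare accession
while IFS= read -r id; do
    printf '[%s]\n' "$id"
done < "$ids_file"
```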

Is there a better way to parse a list of values in order to feed them individually into another process?

Cheers,
Johannes

Johannes Debler

Jul 19, 2018, 11:24:27 PM
to Nextflow
OK, for the sake of completeness, I solved it like this:

ids = RNASeqIDs.splitText().map{it -> it.trim()}
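trim() removes more than the newline: Groovy strings use Java's String.trim(), which strips every leading and trailing character up to U+0020, so a stray carriage return (for example from a file saved with Windows line endings) goes too. A rough bash analogue of the difference, using a made-up value:

```shell
#!/usr/bin/env bash
# Hypothetical value with a stray carriage return and newline, e.g. from a CRLF file
raw=$'SRR5040508\r\n'

# Removing just one trailing newline leaves the carriage return in place
newline_only=${raw%$'\n'}

# Strip all leading and trailing whitespace, similar in spirit to Groovy's trim()
trimmed=${raw#"${raw%%[![:space:]]*}"}           # drop leading whitespace
trimmed=${trimmed%"${trimmed##*[![:space:]]}"}   # drop trailing whitespace

# %q prints the strings with escapes so any leftover \r is visible
printf '%q\n' "$newline_only" "$trimmed"
```

This is why trimming is the safer choice over stripping only the last newline.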


Steve

Aug 2, 2018, 10:51:42 AM
to Nextflow
Use .splitCsv(); it handles the newlines automatically.
It's a little counter-intuitive, but in my experience it works perfectly for this.
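One thing to keep in mind with this approach: splitCsv() emits each row as a list of column values, so with a one-column file every item is a one-element list and prints as [SRR5040508]. Taking the first element, e.g. `.map { row -> row[0] }`, gives back the bare ID. The bash sketch below only mirrors that list-then-first-column idea with arrays; it is not Nextflow itself, and the temporary file is a hypothetical stand-in for ids.txt.

```shell
#!/usr/bin/env bash
# A one-column CSV with the same content as ids.txt
ids_file=$(mktemp)
trap 'rm -f "$ids_file"' EXIT
printf 'SRR5040508\nSRR5040506\n' > "$ids_file"

# Split every line into an array of columns, like splitCsv() does per row,
# then keep only the first column, like .map { row -> row[0] } in Nextflow
first_columns=()
while IFS=, read -r -a row; do
    first_columns+=("${row[0]}")
done < "$ids_file"

printf '%s\n' "${first_columns[@]}"
```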

Zoe L

Dec 10, 2018, 12:09:56 PM
to Nextflow
I spent a long time trying to figure this out. I wanted to get a channel output that could be used as an argument to a script in a process.

This worked for me to get rid of the newline characters using .splitText:

Channel
    .fromPath('path/to/file')
    .splitText()
    .map{ "$it" }
    .set{ list_of_vals }

process exp {
    input:
    val line from list_of_vals

    exec:
    print "${line}"
}

#output: 
#val line 1
#val line 2
#etc

I found that splitCsv produced outputs wrapped in [brackets], which was not useful. Can you explain this, or recommend how to solve that problem?

Steve

Dec 10, 2018, 4:23:29 PM
to Nextflow
Can you send an example of your input data, and the exact output shown in the terminal? I am not sure what brackets you are referring to.