syntax errors when using tee and awk - what is the issue?


António Miguel de Jesus Domingues

Nov 25, 2015, 5:50:44 AM
to bpipe-discuss

Hi,

I have a working stage in my pipeline that fails when tee+awk is added. This is the working stage (ignore the if statement):

FilterDuplicates = {

output.dir = REMOVE_DUP_OUTDIR

transform(".highQ.fastq.gz") to (".deduped_barcoded.fastq.gz") {

      def SAMPLE_NAME = input.prefix.prefix

      exec """
         if [ -n "\$LSB_JOBID" ]; then
            export TMPDIR=/jobdir/\${LSB_JOBID};
         fi                                          &&

zcat $input | paste -d, - - - - | sort -u -t, -k2,2 | tr ',' '\\n' | gzip > $output

""","FilterDuplicates"
}
}


Basically, I am trying to pipe a few steps together to avoid temporary files and writing to disk. The goal is to make the process faster. However, while doing this I would like to collect some stats, namely line counts, from the intermediate steps. To do this I resorted to tee, which allows stdout to be used by multiple programs, and awk to do the line counting. This is what I came up with, which works in the shell:


FilterDuplicates = {

output.dir = REMOVE_DUP_OUTDIR

transform(".highQ.fastq.gz") to (".deduped_barcoded.fastq.gz") {

      def SAMPLE_NAME = input.prefix.prefix

      exec """
         if [ -n "\$LSB_JOBID" ]; then
            export TMPDIR=/jobdir/\${LSB_JOBID};
         fi                                          &&

zcat $input | paste -d, - - - - | tee >(awk -v var="$SAMPLE_NAME" 'END {print NR,var}' >> dedup.stats.txt) | sort -u -t, -k2,2 | tee >(awk -v var="$SAMPLE_NAME" 'END {print NR,var}' >> dedup.stats.txt) | tr ',' '\\n' | gzip > $output

""","FilterDuplicates"
}
}


However, as soon as the tee+awk statements are added, the pipeline fails (actually, it stalls) with this error message:

=========================== Stage FilterDuplicates [kh-mut-testis-input] ===========================
/tmp/1448447323.240998.shell: line 2: syntax error near unexpected token `('
/tmp/1448447323.240998.shell: line 2: `(if [ -n "$LSB_JOBID" ]; then             export TMPDIR=/jobdir/${LSB_JOBID};          fi                                          &&                        zcat /local/scratch1/imb-kettinggr/adomingues/projects/bpipe_small_rna/results/processed_reads/kh-mut-testis-input.highQ.fastq.gz | paste -d, - - - - | tee >(awk -v var="/local/scratch1/imb-kettinggr/adomingues/projects/bpipe_small_rna/results/processed_reads/kh-mut-testis-input.highQ" 'END {print NR,var}' >> dedup.stats.txt) | sort -u -t, -k2,2 | tee >(awk -v var="/local/scratch1/imb-kettinggr/adomingues/projects/bpipe_small_rna/results/processed_reads/kh-mut-testis-input.highQ" 'END {print NR,var}' >> dedup.stats.txt) | tr ',' '\n' | gzip > /local/scratch1/imb-kettinggr/adomingues/projects/bpipe_small_rna/results/processed_reads/kh-mut-testis-input.deduped_barcoded.fastq.gz) > .bpipe/commandtmp/758/cmd.out'



It seems that there is a mistake in the syntax, or that some characters need escaping. A few debugging attempts, including removing the awk command and replacing it with 'echo' statements, resulted in the same error, which more or less rules out awk as the culprit. Escaping the '>(' bit in tee did not solve the problem either. So I am posting here in the hope that a fresh pair of eyes can tell what the issue is.

I could of course write those intermediate files, count the lines and remove them, but that is not as elegant or efficient. Also, I am curious about what the problem is, for future pipelines.

Cheers,
António 
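
For readers who want the two counts without '>(' at all: the same fan-out can be done with named pipes, which only need POSIX sh, mkfifo and tee. What follows is a minimal sketch, not something from the original post. The input records are made up, and the names before.fifo, after.fifo, stats.txt and out.txt are hypothetical stand-ins for the real FASTQ stream, dedup.stats.txt and $output. It has not been tried inside Bpipe or under LSF.

```shell
# Scratch directory holding the named pipes and the outputs (hypothetical names).
workdir=$(mktemp -d)
stats="$workdir/stats.txt"
mkfifo "$workdir/before.fifo" "$workdir/after.fifo"

# Start one background line counter per named pipe.
# Each awk prints its record count once its pipe reaches end-of-file.
awk -v var="before" 'END {print NR, var}' < "$workdir/before.fifo" >> "$stats" &
pid_before=$!
awk -v var="after" 'END {print NR, var}' < "$workdir/after.fifo" >> "$stats" &
pid_after=$!

# Made-up input: four records, one of them duplicated.
# tee copies the stream into a named pipe and passes it on unchanged.
printf 'r1\nr2\nr1\nr3\n' \
  | tee "$workdir/before.fifo" \
  | sort -u \
  | tee "$workdir/after.fifo" \
  > "$workdir/out.txt"

# Make sure both counters have written their line before moving on.
wait "$pid_before" "$pid_after"
rm -f "$workdir/before.fifo" "$workdir/after.fifo"
```

Because the counters are ordinary background jobs, the script can wait for them explicitly, which is harder to do with '>(' processes. The order of the two lines in the stats file is still not guaranteed.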

Simon Sadedin

Nov 26, 2015, 11:09:27 PM
to bpipe-discuss on behalf of António Miguel de Jesus Domingues
Hi Antonio,

I'm curious which executor you are using? Is it just the local executor or something else? The answer to the problem may lie in how the executor is wrapping the command when it is sent for execution. I tried running your command using the default "local" executor (direct execution on local machine) and it seemed to work. Would be good if you could try that too, just to narrow it down.

Cheers,

Simon



António Miguel de Jesus Domingues

Nov 27, 2015, 9:30:02 AM
to bpipe-discuss
Hi Simon,

I am sending it to an LSF queuing system, so I tried a minimal reproducible example with a variation of 'hello world' for debugging. Stage file (hello_tee.txt):

HelloTee = {
 
exec """
  echo 'Hello there' | tee >(echo 'it does tee' > tee.tmp) | grep there
  """

}


Bpipe.run { HelloTee }

executed with:
bpipe run hello_tee.txt

output:
====================================================================================================
|                              Starting Pipeline at 2015-11-27 15:17                               |
====================================================================================================

========================================== Stage HelloTee ==========================================
Hello there

======================================== Pipeline Succeeded ========================================
15:17:26 MSG:  Finished at Fri Nov 27 15:17:26 CET 2015


And the expected 'tee.tmp' file is produced. This confirms that running it locally works as intended. If a minimal bpipe.config is created:
executor="lsf"
queue="short"

We get the same error as before:
====================================================================================================
|                              Starting Pipeline at 2015-11-27 15:22                               |
====================================================================================================

========================================== Stage HelloTee ==========================================
/tmp/1448634144.241511.shell: line 2: syntax error near unexpected token `('
/tmp/1448634144.241511.shell: line 2: `(echo 'Hello there' | tee >(echo 'it does tee' > tee.tmp) | grep there) > .bpipe/commandtmp/2/cmd.out'
^C
Pipeline job running as process 24221.  Terminate? (y/n): y

Terminating process 24221 ...

Cleaning up files from context .bpipe/inprogress/1


You wrote: "The answer to the problem may lie in how the executor is wrapping the command when it is sent for execution. I tried running your command using the default "local" executor (direct execution on local machine) and it seemed to work."

So it seems that you are on to something :)

Cheers,
António
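
If the wrapper shell is indeed the problem (an assumption; the thread does not settle it), one workaround to try is handing the pipeline to bash explicitly, so that '>(' is parsed by bash regardless of which shell interprets the wrapper. This is a sketch in plain shell, with made-up records and mktemp files in place of the real input, dedup.stats.txt and $output. Inside a Bpipe exec string the quoting and the $ escaping would need adapting, and this has not been tested under LSF.

```shell
# Temporary files standing in for dedup.stats.txt and the stage output.
stats=$(mktemp)
out=$(mktemp)

# The whole pipeline is a single-quoted string passed to bash -c,
# so only bash ever sees the >( ... ) syntax.
# "$1" and "$2" inside the string are the two file names given after "_".
bash -c '
  printf "r1\nr2\nr1\nr3\n" \
    | tee >(awk -v var="before" "END {print NR, var}" >> "$1") \
    | sort -u \
    | tee >(awk -v var="after" "END {print NR, var}" >> "$1") \
    > "$2"
' _ "$stats" "$out"
```

The '>(' processes run asynchronously, so the stats lines can land slightly after the pipeline itself returns, and their order in the file is not guaranteed.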


