Using a mapping file to merge sam files


donke...@gmail.com

Dec 19, 2017, 10:36:56 PM
to Nextflow
I am trying to merge sam files based on a mapping file:

So, I have four sam files:

File1.sam
File2.sam
File3.sam
File4.sam

And I have a mapping_file.txt

File1.sam   SampleA
File2.sam   SampleB
File3.sam   SampleB
File4.sam   SampleB


What I want is to merge the files per sample using this mapping file, so that I end up with:

SampleA.sam (same as File1.sam)
SampleB.sam (merged from File2.sam, File3.sam, File4.sam)



So far, I have:

Channel.fromPath( file("mapping_file.txt") )
 .splitCsv(sep: "\t")
 .map { row -> tuple(row[1], row[0]) }
 .groupTuple()
 .set { samples_tuple }

process merge_sams {
  
  publishDir "${params.outdir}/merged"
  
  input:
  file "*.sam" from sams
  set(sample, entries) from samples_tuple

  output:
  file "${sample}.sam"

  script:
  """
  samtools merge -f ${sample}.sam ${entries.join(" ")}
  """
}


This doesn't work, and I don't think I'm even close to getting it right. What would be a working strategy for implementing this? Any help and advice would be much appreciated!


Paolo Di Tommaso

Dec 20, 2017, 4:11:25 AM
to nextflow
On the contrary, you are very near to getting this right!

The overall approach is right. But: 

1) make sure your fields are separated by a tab character (in your example, as copied and pasted, they are not) 
2) the `map` should return a tuple in which the second element is a `file`, hence 

  .map { row -> tuple(row[1], file(row[0])) } 

you can even make it a bit more readable by expanding the row as 

 .map { path, sample ->  tuple( sample, file(path)) }


3) finally, you still need to declare the component types in the input set declaration, i.e.: 

set val(sample),file(entries) from samples_tuple


This way it should work. 
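
Putting the three points together, the script might look roughly like the sketch below. It uses the same syntax as the rest of this thread and is not a tested pipeline. It assumes that `mapping_file.txt` is tab-separated, that the paths in its first column point to SAM files that actually exist (relative paths are resolved against the directory the pipeline is launched from), and that `params.outdir` is set; the `'results'` default is only a placeholder. The separate `file "*.sam" from sams` input from the original is left out.

```nextflow
// Sketch only: assumes a tab-separated mapping_file.txt whose first column
// holds paths to existing SAM files and whose second column is the sample id.
params.outdir = 'results'   // placeholder default, not from the original post

Channel
    .fromPath('mapping_file.txt')
    .splitCsv(sep: '\t')
    // put the sample id first (the grouping key) and turn the path into a file
    .map { path, sample -> tuple(sample, file(path)) }
    // emit one item per sample: [ sample, [file, file, ...] ]
    .groupTuple()
    .set { samples_tuple }

process merge_sams {

    publishDir "${params.outdir}/merged"

    input:
    // declare the type of each tuple component
    set val(sample), file(entries) from samples_tuple

    output:
    file "${sample}.sam"

    script:
    // a list of staged files is rendered as space-separated file names
    """
    samtools merge -f ${sample}.sam ${entries}
    """
}
```

Because `entries` is declared with `file(...)`, the files are staged into the task work directory, so the command can refer to them by name.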


p

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

donke...@gmail.com

Dec 20, 2017, 5:31:06 AM
to Nextflow
Hi Paolo,

Many thanks, but the problem has become a little more complicated... Essentially, the paths in the tuple are pointing to the wrong location.

For example:

Channel.fromPath( file("mapping_file.txt") )
 .splitCsv(sep: "\t")
 .map { row -> tuple(row[1], row[0]) }
 .groupTuple()
 .subscribe { println it}

I get:

[SampleA, [/Users/james/Project/test_data/File1.sam]]
[SampleB, [/Users/james/Project/test_data/File2.sam, /Users/james/Project/test_data/File3.sam, /Users/james/Project/test_data/File4.sam]]


This is a problem because these paths do not point to the work directory where the sams have been collected from a previous process:

toCollectSams
   .collectFile(name: file("*.sam"))
   .set { sams }


Perhaps the way I use collectFile is not right? I don't know if I've explained this well, but no doubt your expert hand shall guide the way!

James


FYR Error:

ERROR ~ Error executing process > 'merge_sams (1)'


Caused by:

  Process `merge_sams (1)` terminated with an error exit status (1)


Command executed:


  samtools merge -f SampleA.sam File1.sam


Command exit status:

  1


Command output:

  (empty)


Command error:

  [E::hts_open_format] Failed to open file File1.sam

  samtools merge: fail to open "File1.sam": No such file or directory







Paolo Di Tommaso

Dec 20, 2017, 6:04:13 AM
to nextflow
File paths need to be absolute. However, I was assuming that the mapping file was an input source; instead, it looks like you are creating it just to handle the merging process. 

In that case you don't need to do that. You can apply `groupTuple` directly to the `toCollectSams` channel (provided it includes the sample id information). 
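
As a rough illustration of that suggestion, the sketch below stands in for the upstream process with a hand-built channel. It assumes each item in `toCollectSams` is a (sample id, SAM file) pair; in a real pipeline those pairs would come from a process output declared along the lines of `set val(sample), file('*.sam') into toCollectSams`. The file names and sample ids are taken from the example in the first message.

```nextflow
// Hypothetical stand-in for the upstream process output:
// each item is a [ sample id, SAM file ] pair.
Channel
    .from(
        ['SampleA', file('File1.sam')],
        ['SampleB', file('File2.sam')],
        ['SampleB', file('File3.sam')],
        ['SampleB', file('File4.sam')] )
    .set { toCollectSams }

// groupTuple collects all items sharing the same first element (the sample id)
// into a single item of the form [ sample, [file, file, ...] ].
toCollectSams
    .groupTuple()
    .subscribe { println it }
```

The grouped channel can then feed the merge process through `set val(sample), file(entries)`, with no mapping file and no `collectFile` step in between.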


If it still isn't working, please provide a reproducible test case. 


p


donke...@gmail.com

Dec 20, 2017, 7:49:06 AM
to Nextflow
Hi Paolo, that sorted the problem. groupTuple is conceptually a little difficult to understand with my limited knowledge... but I managed it in the end.

Thank you, and Happy Christmas!