$ awk '$2 > 1 && $3>10 && $6 < 1 && $9 < 1 {print}' star_fusion_SL42404_1.fusioncalls.fusion_candidates.txt
RP11-509J21.2--RP11-509J21.1 2 91 RP11-509J21.2^ENSG00000237359.1 chr9:3647477:+ 0 RP11-509J21.1^ENSG00000232104.2 chr9:3602523:+ 0
That is 2 junction reads and 91 spanning reads.
$ grep RP11-509J21.2--RP11-509J21.1 star_fusion_SL42404_1.fusioncalls.junction_breakpts_to_genes.txt
chr9 3602522 - chr9 3647478 - 2 1 1 HWI-ST1096:321:C2NR2ACXX:7:2314:9579:80954 3602523 36S62M-40p40M3S 3647442 36M64S RP11-509J21.2^ENSG00000237359.1;chr9:3647477:+;0;RP11-509J21.1^ENSG00000232104.2;chr9:3602523:+;0;RP11-509J21.2^ENSG00000237359.1--RP11-509J21.1^ENSG00000232104.2;RP11-509J21.2--RP11-509J21.1
chr9 3602522 - chr9 3647478 - 2 1 1 HWI-ST1096:321:C2NR2ACXX:6:1314:6163:36682 3602523 77S23M 3647344 100M-43p77M23S RP11-509J21.2^ENSG00000237359.1;chr9:3647477:+;0;RP11-509J21.1^ENSG00000232104.2;chr9:3602523:+;0;RP11-509J21.2^ENSG00000237359.1--RP11-509J21.1^ENSG00000232104.2;RP11-509J21.2--RP11-509J21.1
Here are the CIGAR strings from another junction from the same class with 1 junction read, 14 spanning reads:
> star_fusion_SL35353_1.fusioncalls.junction_breakpts_to_genes.txt
...
... 2 0 3 HWI-ST1096:268:D2EUMACXX:3:2112:12015:59467 6973827 15S61M7037N24M 6984214 100M76p15M85S ...
...
Another comment: I don't think you'll find a way to discriminate between trans-splicing and 'genuine' fusion events arising from translocations without supplementing your RNA-Seq data with DNA-Seq data, so that you can look for DNA-level evidence of chromosomal rearrangements.
~b
3602523 36S62M-40p40M3S 3647442 36M64S
3602523 77S23M 3647344 100M-43p77M23S
(assuming the '-' indicates two alternative alignments). It seems like the second example shows a complementary alignment over a junction (77S23M / 77M23S) but also shows a perfect match (100M) at the second position. Would this be an example of a potential paralogy error, where one read could align perfectly to one gene but, because of low entropy, paralogy, or a shared domain, has partial homology with another gene, resulting in an apparent junction?
Thanks again,
Ben
column 14: CIGAR of the second segment
Unlike standard SAM, both mates are recorded in one line here. The gap of length L between the
mates is marked by the p in the CIGAR string. If the mates overlap, L<0.
For strand definitions, when aligning paired end reads, the sequence of the second mate is reverse
complemented.
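To make the 'p' notation concrete, here is a minimal Python sketch that splits one of these CIGAR strings at the mate gap and reports the signed gap length. The function names (segment_cigars, split_mates, read_length) are made up for illustration and are not part of STAR or STAR-Fusion. The column positions assume the first 14 columns of the junction lines quoted earlier in this thread follow the layout described above (column 12: CIGAR of the first segment, column 14: CIGAR of the second segment).

```python
import re

# One CIGAR operation: a possibly negative length followed by an op letter.
# 'p' is STAR's mate-gap marker; it is not part of the standard SAM alphabet.
_OP = re.compile(r"(-?\d+)([MIDNSHP=Xp])")


def segment_cigars(line):
    """Return (first, second) segment CIGARs from a whitespace-split line.

    Assumption: columns 12 and 14 (1-based) hold the two CIGAR strings.
    """
    fields = line.split()
    return fields[11], fields[13]


def split_mates(cigar):
    """Split a CIGAR at the 'p' gap.

    Returns (mates, gap): mates is a list of one or two lists of
    (length, op) tuples; gap is the signed 'p' length, or None if the
    string describes a single mate only.
    """
    ops = [(int(n), op) for n, op in _OP.findall(cigar)]
    # Reject strings the pattern could not consume completely.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unrecognised CIGAR: {cigar!r}")
    mates, gap = [[]], None
    for n, op in ops:
        if op == "p":
            gap = n            # negative when the mates overlap
            mates.append([])   # start collecting the other mate
        else:
            mates[-1].append((n, op))
    return mates, gap


def read_length(mate):
    """Number of read bases covered by one mate (M, I, S, =, X)."""
    return sum(n for n, op in mate if op in "MIS=X")


if __name__ == "__main__":
    for cigar in ("36S62M-40p40M3S", "36M64S", "100M-43p77M23S"):
        mates, gap = split_mates(cigar)
        print(cigar, "mates:", mates, "gap:", gap)
```

Read this way, the '-' in 36S62M-40p40M3S is the sign of the gap (the mates overlap by 40 bases), not a separator between two alternative alignments.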
I hope this helps,
~b