Hi,
need your help please. I don't know what is wrong.
1. hi have encountered error using JAFFA version 10 (last release) with demo data.
2. my command: /media/mdhdd/richard/genfusion/JAFFA_Version_10/tools/bin/bpipe run /media/mdhdd/richard/genfusion/JAFFA_Version_10/JAFFA_hybrid.groovy /media/mdhdd/richard/genfusion/references_Data/demo_data_gz/BT474_data/*.gz
3. My error is: ============================== Stage filter_transcripts (BT474-demo) ===============================
ERROR: Command failed with exit status = 1 :
time /usr/bin/R --vanilla --args BT474-demo/BT474-demo.psl BT474-demo/BT474-demo.txt 1000 /media/mdhdd/richard/genfusion/JAFFA_Version_10/hg38_genCode22.tab < /media/mdhdd/richard/genfusion/JAFFA_Version_10/process_transcriptome_blat_table.R &> /media/mdhdd/richard/genfusion/JAFFA_Version_10/output/BT474-demo/log_filter
========================================= Pipeline Failed ==========================================
One or more parallel stages aborted. The following messages were reported:
Branch BT474-demo in stage Unknown reported message:
Command failed with exit status = 1 :
time /usr/bin/R --vanilla --args BT474-demo/BT474-demo.psl BT474-demo/BT474-demo.txt 1000 /media/mdhdd/richard/genfusion/JAFFA_Version_10/hg38_genCode22.tab < /media/mdhdd/richard/genfusion/JAFFA_Version_10/process_transcriptome_blat_table.R &> /media/mdhdd/richard/genfusion/JAFFA_Version_10/output/BT474-demo/log_filter
Use 'bpipe errors' to see output from failed commands.
4. Results in sample directory where JAFFA ran:
-rw-rw-r-- 1 richard richard 389649 Oct 23 11:01 BT474-demo.fasta
-rw-rw-r-- 1 richard richard 5832186 Oct 23 11:00 BT474-demo_filtered_reads.fastq.1.gz
-rw-rw-r-- 1 richard richard 5960875 Oct 23 11:00 BT474-demo_filtered_reads.fastq.2.gz
-rw-rw-r-- 1 richard richard 69142 Oct 23 11:00 BT474-demo_leftover_reads.fastq.1.gz
-rw-rw-r-- 1 richard richard 70794 Oct 23 11:00 BT474-demo_leftover_reads.fastq.2.gz
-rw-rw-r-- 1 richard richard 932334 Oct 23 11:01 BT474-demo.psl
-rw-rw-r-- 1 richard richard 84 Oct 23 11:01 log_blat
-rw-rw-r-- 1 richard richard 7299 Oct 23 11:01 log_filter
drwxrwxr-x 2 richard richard 4096 Oct 23 11:01 oases
5. directory where JAFFA was installed:
-rwxrwxr-x 1 richard richard 2284 Oct 11 10:50 assemble.sh
-rwxrwxr-x 1 richard richard 2374 Oct 11 10:50 compile_results.R
-rw-rw-r-- 1 richard richard 35147 Oct 11 10:50 COPYING
drwxrwxr-x 2 richard richard 4096 Oct 11 10:50 cwl
drwxrwxr-x 2 richard richard 4096 Oct 11 10:50 docker
-rwxrwxr-x 1 richard richard 1251 Oct 11 10:50 get_spanning_reads_for_direct_1.R
-rwxrwxr-x 1 richard richard 999 Oct 11 10:50 get_spanning_reads_for_direct_2.R
-rwxrwxr-x 1 richard richard 1557 Oct 11 10:50 get_spanning_reads.R
-rw-rw-r-- 1 richard richard
3273481150 Oct 11 10:59 hg38.fa
-rwxrwxr-- 1 richard richard 130890956 Jun 19 2015 hg38_genCode22.1.bt2
-rwxrwxr-- 1 richard richard 74291416 Jun 19 2015 hg38_genCode22.2.bt2
-rwxrwxr-- 1 richard richard 1756646 Jun 19 2015 hg38_genCode22.3.bt2
-rwxrwxr-- 1 richard richard 74291411 Jun 19 2015 hg38_genCode22.4.bt2
-rwxrwxr-- 1 richard richard 322074390 Jun 19 2015 hg38_genCode22.fa
-rwxrwxr-- 1 richard richard 130890956 Jun 19 2015 hg38_genCode22.rev.1.bt2
-rwxrwxr-- 1 richard richard 74291416 Jun 19 2015 hg38_genCode22.rev.2.bt2
-rwxrwxr-- 1 richard richard 42879375 Jun 19 2015 hg38_genCode22.tab
-rwxrwxr-x 1 richard richard 4622 Oct 11 10:50 install_linux64.sh
-rwxrwxr-x 1 richard richard 1171 Oct 11 10:50 JAFFA_assembly.groovy
-rwxrwxr-x 1 richard richard 1447 Oct 11 10:50 JAFFA_direct.groovy
-rwxrwxr-x 1 richard richard 1680 Oct 11 10:50 JAFFA_hybrid.groovy
-rwxrwxr-x 1 richard richard
4138711824 Oct 11 11:36 JAFFA_REFERENCE_FILES_HG38_GENCODE22.tar.gz
-rwxrwxr-x 1 richard richard 24012 Oct 11 10:50 JAFFA_stages.groovy
-rwxrwxr-- 1 richard richard 25463 Oct 23 2014 known_fusions.txt
-rw-rw-r-- 1 richard richard 818 Oct 11 10:50 LICENSE
-rwxrwxr-x 1 richard richard 13893 Oct 11 10:50 make_final_table.R
-rwxrwxr-- 1 richard richard 984246548 Mar 4 2015 Masked_hg38.1.bt2
-rwxrwxr-- 1 richard richard 732332820 Mar 4 2015 Masked_hg38.2.bt2
-rwxrwxr-- 1 richard richard 2698631 Mar 4 2015 Masked_hg38.3.bt2
-rwxrwxr-- 1 richard richard 732332815 Mar 4 2015 Masked_hg38.4.bt2
-rwxrwxr-- 1 richard richard 984246548 Mar 4 2015 Masked_hg38.rev.1.bt2
-rwxrwxr-- 1 richard richard 732332820 Mar 4 2015 Masked_hg38.rev.2.bt2
drwxrwxr-x 4 richard richard 4096 Oct 23 11:00 output
-rwxrwxr-x 1 richard richard 4554 Oct 11 10:50 process_transcriptome_blat_table.R
-rwxrwxr-x 1 richard richard 618 Oct 11 10:50 README
-rw-rw-r-- 1 richard richard 713 Oct 11 10:50 README.md
drwxrwxr-x 11 richard richard 4096 Oct 11 11:33 tools
-rw-rw-r-- 1 richard richard 815 Oct 11 11:33 tools.groovy
6.
a) In log_blat: $ less log_blat
Loaded 297165649 letters in 195175 sequences
Searched 370686 bases in 237 sequences
b) less log_filter
7. System Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Stepping: 2
CPU MHz: 1400.000
BogoMIPS: 4399.81
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 6144K
6. b) log_filter
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
> library(IRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax,
pmax.int, pmin,
pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unlist, unsplit
Loading required package: S4Vectors
Attaching package: ‘IRanges’
The following objects are masked from ‘package:BiocGenerics’:
score, score<-
Warning messages:
1: replacing previous import ‘stats::mad’ by ‘BiocGenerics::mad’ when loading ‘S4Vectors’
2: replacing previous import ‘stats::IQR’ by ‘BiocGenerics::IQR’ when loading ‘S4Vectors’
3: replacing previous import ‘stats::mad’ by ‘BiocGenerics::mad’ when loading ‘IRanges’
4: replacing previous import ‘stats::IQR’ by ‘BiocGenerics::IQR’ when loading ‘IRanges’
5: multiple methods tables found for ‘score’
6: multiple methods tables found for ‘score<-’
7: multiple methods tables found for ‘mad’
8: multiple methods tables found for ‘IQR’
9: multiple methods tables found for ‘unsplit’
10: multiple methods tables found for ‘
rep.int’
>
> args = commandArgs(trailingOnly = TRUE)
> blat_table=args[1]
> output_file=args[2]
> gap_size=as.numeric(args[3])
>
> MAX_OVERLAP=15 #maximum+1 number of bases that two genes can share at the break-point (<15)
>
> #get the gene names
> ref_to_symbols=read.table(args[4],header=T,comment.char="/",stringsAsFactors=F) #we want to read the first comment line
> symbols<-as.character(ref_to_symbols$name2)
> names(symbols)<-as.character(ref_to_symbols$name)
>
> #get the gene positions
> gene_positions<-data.frame(ref_to_symbols$chrom,ref_to_symbols$txStart,ref_to_symbols$txEnd,ref_to_symbols$name,stringsAsFactors=F)
> unique_genes=match(ref_to_symbols$name,ref_to_symbols$name)
> gene_positions<-unique(gene_positions[unique_genes,])
> rownames(gene_positions)<-as.factor(gene_positions$
ref_to_symbols.name)
> gene_positions$
ref_to_symbols.name<-NULL
> colnames(gene_positions)<-c("chrom","start","end")
>
> #load the .psl file and split based on de novo transcript ID
> rows<-read.table(file=blat_table,skip=5,stringsAsFactors=F, nrows=5)
> classes<-sapply(rows, class)
> blat_results=read.table(file=blat_table,skip=5,stringsAsFactors=F,colClasses = classes,comment.char="")
>
> split_results=split(1:length(blat_results$V10),blat_results$V10)
>
> getEnsemblTranscriptID <- function(x) {
+ y <- regexpr("ENS(\\w+)?T\\d{11}(.\\d+)?", x, ignore.case=FALSE)
+ substr(x, y, y+attr(y, "match.length")-1)
+ }
>
> #filter out all transcripts which are only covered by one reference transcript
> j<<-1
> multi_gene<-function(x){
+ if(j %% 10 == 0 ) { show(j) } ; j<<-j+1
+
+ #if one reference transcript covered the whole range then return.
+ if(length(x)==1) return()
+ starts=blat_results$V12[x]
+ ends=blat_results$V13[x]
+ maxs=which(ends==max(ends))
+ mins=which(starts==min(starts))
+ if(any(mins %in% maxs)) return()
+
+ #now get just the non-redundant set of transcripts
+ ranges=IRanges(starts,ends)
+ overs=findOverlaps(ranges,type="within",drop.self=TRUE,drop.redundant=TRUE,select="arbitrary")
+ reduced=
is.na(overs)
+ regions=ranges[reduced]
+
+ #where are these regions in the genome?
+ genes=blat_results$V14[x[reduced]]
+ # here we are expecting the fasta ids to be in the format: ">hg19_annotation_geneName__otherStuff"
+ gene_names=sapply(genes,getEnsemblTranscriptID)
+ gp=gene_positions[gene_names,]
+ if(!any(!
is.na(gp))){ #throw an error if we can't get the genomic range
+ print(paste("Stopping because the gene,",gene_names,"can't be found in",args[4]))
+ stop() }
+
+ #split by chrom
+ split_chroms=split(1:length(gene_names),gp$chrom)
+ #split by region on chromosome
+ new_ranges<-IRanges()
+ for(sc in split_chroms){
+ if(length(sc)==1){
+ new_ranges=c(new_ranges,regions[sc])
+ } else {
+ ir=IRanges(gp$start[sc],gp$end[sc]+gap_size)
+ iru=reduce(ir)
+ overlaps=findOverlaps(ir,iru,select="arbitrary")
+ for(w in split(sc,overlaps)){ new_ranges=c(new_ranges,reduce(regions[w])) }
+ }
+ }
+ cov=coverage(new_ranges)
+
+ # now look for the hallmark of a fusion. We should see a coverage pattern
+ # of 1 2 1 (1 gene , 2 genes overlap a few bases, then the other gene)
+ covValue=runValue(cov)
+ pattern_match<-function(n){ isTRUE(all.equal(covValue[n:(n+2)],c(1,2,1))) }
+
+ if(length(covValue)>=3){
+ temp_pos=which(sapply( 1:(length(runValue(cov))-2),pattern_match))+1
+ temp_pos=suppressWarnings(temp_pos[which(runLength(cov)[temp_pos]==min(runLength(cov)[temp_pos]))[1]])
+ ov=runLength(cov)[temp_pos] #how big is the overlap between genes
+ if(!
is.na(ov) & ov<MAX_OVERLAP & ov>0){ #when two gene overlap by more than 14 bases it probably an assembler mistake
+ # set the fusion point to the first base which has
+ # a coverage of 2 ??
+ base_before=sum(runLength(cov)[1:(temp_pos-1)])
+ base_after=sum(runLength(cov)[1:temp_pos])+1
+
+ rbr=blat_results[x[reduced],]
+
+ get_gene_symbol_at_base<-function(b){
+ g=which( (rbr$V12 < b) & (b < rbr$V13 ))
+ g=g[which(rbr[g,]$V1==max(rbr[g,]$V1))[1]]
+ symbols[gene_names[g]]
+ }
+ fusion_genes<-paste(get_gene_symbol_at_base(base_before),
+ get_gene_symbol_at_base(base_after),sep=":")
+
+ show(rbr)
+ show(cov)
+ return(data.frame(rbr$V10[1],base_before,base_after,fusion_genes,rbr$V11[1]))
+ }
+ }
+ }
> candidates=do.call("rbind",lapply(split_results,multi_gene))
Error in .local(query, subject, maxgap, minoverlap, type, select, ...) :
unused arguments (drop.self = TRUE, drop.redundant = TRUE)
Calls: do.call ... lapply -> lapply -> FUN -> findOverlaps -> findOverlaps
Execution halted