ERROR: Command failed with exit status = 1 :

254 views
Skip to first unread message

deblac...@gmail.com

unread,
Oct 23, 2017, 6:41:05 AM10/23/17
to jaffa-project
Hi,


need your help please. I don't know what is wrong.
1. hi have encountered error using JAFFA version 10 (last release) with demo data.

2. my command: /media/mdhdd/richard/genfusion/JAFFA_Version_10/tools/bin/bpipe run /media/mdhdd/richard/genfusion/JAFFA_Version_10/JAFFA_hybrid.groovy /media/mdhdd/richard/genfusion/references_Data/demo_data_gz/BT474_data/*.gz

3. My error is: ============================== Stage filter_transcripts (BT474-demo) ===============================
ERROR: Command failed with exit status = 1 :

time /usr/bin/R --vanilla --args BT474-demo/BT474-demo.psl BT474-demo/BT474-demo.txt 1000 /media/mdhdd/richard/genfusion/JAFFA_Version_10/hg38_genCode22.tab < /media/mdhdd/richard/genfusion/JAFFA_Version_10/process_transcriptome_blat_table.R &> /media/mdhdd/richard/genfusion/JAFFA_Version_10/output/BT474-demo/log_filter


========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

Branch BT474-demo in stage Unknown reported message:

Command failed with exit status = 1 :

time /usr/bin/R --vanilla --args BT474-demo/BT474-demo.psl BT474-demo/BT474-demo.txt 1000 /media/mdhdd/richard/genfusion/JAFFA_Version_10/hg38_genCode22.tab < /media/mdhdd/richard/genfusion/JAFFA_Version_10/process_transcriptome_blat_table.R &> /media/mdhdd/richard/genfusion/JAFFA_Version_10/output/BT474-demo/log_filter

Use 'bpipe errors' to see output from failed commands.

4. Results in sample directory where JAFFA ran:
-rw-rw-r-- 1 richard richard 389649 Oct 23 11:01 BT474-demo.fasta
-rw-rw-r-- 1 richard richard 5832186 Oct 23 11:00 BT474-demo_filtered_reads.fastq.1.gz
-rw-rw-r-- 1 richard richard 5960875 Oct 23 11:00 BT474-demo_filtered_reads.fastq.2.gz
-rw-rw-r-- 1 richard richard 69142 Oct 23 11:00 BT474-demo_leftover_reads.fastq.1.gz
-rw-rw-r-- 1 richard richard 70794 Oct 23 11:00 BT474-demo_leftover_reads.fastq.2.gz
-rw-rw-r-- 1 richard richard 932334 Oct 23 11:01 BT474-demo.psl
-rw-rw-r-- 1 richard richard 84 Oct 23 11:01 log_blat
-rw-rw-r-- 1 richard richard 7299 Oct 23 11:01 log_filter
drwxrwxr-x 2 richard richard 4096 Oct 23 11:01 oases

5. directory where JAFFA was installed:
-rwxrwxr-x 1 richard richard 2284 Oct 11 10:50 assemble.sh
-rwxrwxr-x 1 richard richard 2374 Oct 11 10:50 compile_results.R
-rw-rw-r-- 1 richard richard 35147 Oct 11 10:50 COPYING
drwxrwxr-x 2 richard richard 4096 Oct 11 10:50 cwl
drwxrwxr-x 2 richard richard 4096 Oct 11 10:50 docker
-rwxrwxr-x 1 richard richard 1251 Oct 11 10:50 get_spanning_reads_for_direct_1.R
-rwxrwxr-x 1 richard richard 999 Oct 11 10:50 get_spanning_reads_for_direct_2.R
-rwxrwxr-x 1 richard richard 1557 Oct 11 10:50 get_spanning_reads.R
-rw-rw-r-- 1 richard richard 3273481150 Oct 11 10:59 hg38.fa
-rwxrwxr-- 1 richard richard 130890956 Jun 19 2015 hg38_genCode22.1.bt2
-rwxrwxr-- 1 richard richard 74291416 Jun 19 2015 hg38_genCode22.2.bt2
-rwxrwxr-- 1 richard richard 1756646 Jun 19 2015 hg38_genCode22.3.bt2
-rwxrwxr-- 1 richard richard 74291411 Jun 19 2015 hg38_genCode22.4.bt2
-rwxrwxr-- 1 richard richard 322074390 Jun 19 2015 hg38_genCode22.fa
-rwxrwxr-- 1 richard richard 130890956 Jun 19 2015 hg38_genCode22.rev.1.bt2
-rwxrwxr-- 1 richard richard 74291416 Jun 19 2015 hg38_genCode22.rev.2.bt2
-rwxrwxr-- 1 richard richard 42879375 Jun 19 2015 hg38_genCode22.tab
-rwxrwxr-x 1 richard richard 4622 Oct 11 10:50 install_linux64.sh
-rwxrwxr-x 1 richard richard 1171 Oct 11 10:50 JAFFA_assembly.groovy
-rwxrwxr-x 1 richard richard 1447 Oct 11 10:50 JAFFA_direct.groovy
-rwxrwxr-x 1 richard richard 1680 Oct 11 10:50 JAFFA_hybrid.groovy
-rwxrwxr-x 1 richard richard 4138711824 Oct 11 11:36 JAFFA_REFERENCE_FILES_HG38_GENCODE22.tar.gz
-rwxrwxr-x 1 richard richard 24012 Oct 11 10:50 JAFFA_stages.groovy
-rwxrwxr-- 1 richard richard 25463 Oct 23 2014 known_fusions.txt
-rw-rw-r-- 1 richard richard 818 Oct 11 10:50 LICENSE
-rwxrwxr-x 1 richard richard 13893 Oct 11 10:50 make_final_table.R
-rwxrwxr-- 1 richard richard 984246548 Mar 4 2015 Masked_hg38.1.bt2
-rwxrwxr-- 1 richard richard 732332820 Mar 4 2015 Masked_hg38.2.bt2
-rwxrwxr-- 1 richard richard 2698631 Mar 4 2015 Masked_hg38.3.bt2
-rwxrwxr-- 1 richard richard 732332815 Mar 4 2015 Masked_hg38.4.bt2
-rwxrwxr-- 1 richard richard 984246548 Mar 4 2015 Masked_hg38.rev.1.bt2
-rwxrwxr-- 1 richard richard 732332820 Mar 4 2015 Masked_hg38.rev.2.bt2
drwxrwxr-x 4 richard richard 4096 Oct 23 11:00 output
-rwxrwxr-x 1 richard richard 4554 Oct 11 10:50 process_transcriptome_blat_table.R
-rwxrwxr-x 1 richard richard 618 Oct 11 10:50 README
-rw-rw-r-- 1 richard richard 713 Oct 11 10:50 README.md
drwxrwxr-x 11 richard richard 4096 Oct 11 11:33 tools
-rw-rw-r-- 1 richard richard 815 Oct 11 11:33 tools.groovy

6.
a) In log_blat: $ less log_blat
Loaded 297165649 letters in 195175 sequences
Searched 370686 bases in 237 sequences

b) less log_filter

7. System Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Stepping: 2
CPU MHz: 1400.000
BogoMIPS: 4399.81
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 6144K

6. b) log_filter

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
> library(IRanges)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unlist, unsplit

Loading required package: S4Vectors

Attaching package: ‘IRanges’

The following objects are masked from ‘package:BiocGenerics’:

score, score<-
Warning messages:
1: replacing previous import ‘stats::mad’ by ‘BiocGenerics::mad’ when loading ‘S4Vectors’
2: replacing previous import ‘stats::IQR’ by ‘BiocGenerics::IQR’ when loading ‘S4Vectors’
3: replacing previous import ‘stats::mad’ by ‘BiocGenerics::mad’ when loading ‘IRanges’
4: replacing previous import ‘stats::IQR’ by ‘BiocGenerics::IQR’ when loading ‘IRanges’
5: multiple methods tables found for ‘score’
6: multiple methods tables found for ‘score<-’
7: multiple methods tables found for ‘mad’
8: multiple methods tables found for ‘IQR’
9: multiple methods tables found for ‘unsplit’
10: multiple methods tables found for ‘rep.int
>
> args = commandArgs(trailingOnly = TRUE)
> blat_table=args[1]
> output_file=args[2]
> gap_size=as.numeric(args[3])
>
> MAX_OVERLAP=15 #maximum+1 number of bases that two genes can share at the break-point (<15)
>
> #get the gene names
> ref_to_symbols=read.table(args[4],header=T,comment.char="/",stringsAsFactors=F) #we want to read the first comment line
> symbols<-as.character(ref_to_symbols$name2)
> names(symbols)<-as.character(ref_to_symbols$name)
>
> #get the gene positions
> gene_positions<-data.frame(ref_to_symbols$chrom,ref_to_symbols$txStart,ref_to_symbols$txEnd,ref_to_symbols$name,stringsAsFactors=F)
> unique_genes=match(ref_to_symbols$name,ref_to_symbols$name)
> gene_positions<-unique(gene_positions[unique_genes,])
> rownames(gene_positions)<-as.factor(gene_positions$ref_to_symbols.name)
> gene_positions$ref_to_symbols.name<-NULL
> colnames(gene_positions)<-c("chrom","start","end")
>
> #load the .psl file and split based on de novo transcript ID
> rows<-read.table(file=blat_table,skip=5,stringsAsFactors=F, nrows=5)
> classes<-sapply(rows, class)
> blat_results=read.table(file=blat_table,skip=5,stringsAsFactors=F,colClasses = classes,comment.char="")
>
> split_results=split(1:length(blat_results$V10),blat_results$V10)
>
> getEnsemblTranscriptID <- function(x) {
+ y <- regexpr("ENS(\\w+)?T\\d{11}(.\\d+)?", x, ignore.case=FALSE)
+ substr(x, y, y+attr(y, "match.length")-1)
+ }
>
> #filter out all transcripts which are only covered by one reference transcript
> j<<-1
> multi_gene<-function(x){
+ if(j %% 10 == 0 ) { show(j) } ; j<<-j+1
+
+ #if one reference transcript covered the whole range then return.
+ if(length(x)==1) return()
+ starts=blat_results$V12[x]
+ ends=blat_results$V13[x]
+ maxs=which(ends==max(ends))
+ mins=which(starts==min(starts))
+ if(any(mins %in% maxs)) return()
+
+ #now get just the non-redundant set of transcripts
+ ranges=IRanges(starts,ends)
+ overs=findOverlaps(ranges,type="within",drop.self=TRUE,drop.redundant=TRUE,select="arbitrary")
+ reduced=is.na(overs)
+ regions=ranges[reduced]
+
+ #where are these regions in the genome?
+ genes=blat_results$V14[x[reduced]]
+ # here we are expecting the fasta ids to be in the format: ">hg19_annotation_geneName__otherStuff"
+ gene_names=sapply(genes,getEnsemblTranscriptID)
+ gp=gene_positions[gene_names,]
+ if(!any(!is.na(gp))){ #throw an error if we can't get the genomic range
+ print(paste("Stopping because the gene,",gene_names,"can't be found in",args[4]))
+ stop() }
+
+ #split by chrom
+ split_chroms=split(1:length(gene_names),gp$chrom)
+ #split by region on chromosome
+ new_ranges<-IRanges()
+ for(sc in split_chroms){
+ if(length(sc)==1){
+ new_ranges=c(new_ranges,regions[sc])
+ } else {
+ ir=IRanges(gp$start[sc],gp$end[sc]+gap_size)
+ iru=reduce(ir)
+ overlaps=findOverlaps(ir,iru,select="arbitrary")
+ for(w in split(sc,overlaps)){ new_ranges=c(new_ranges,reduce(regions[w])) }
+ }
+ }
+ cov=coverage(new_ranges)
+
+ # now look for the hallmark of a fusion. We should see a coverage pattern
+ # of 1 2 1 (1 gene , 2 genes overlap a few bases, then the other gene)
+ covValue=runValue(cov)
+ pattern_match<-function(n){ isTRUE(all.equal(covValue[n:(n+2)],c(1,2,1))) }
+
+ if(length(covValue)>=3){
+ temp_pos=which(sapply( 1:(length(runValue(cov))-2),pattern_match))+1
+ temp_pos=suppressWarnings(temp_pos[which(runLength(cov)[temp_pos]==min(runLength(cov)[temp_pos]))[1]])
+ ov=runLength(cov)[temp_pos] #how big is the overlap between genes
+ if(!is.na(ov) & ov<MAX_OVERLAP & ov>0){ #when two gene overlap by more than 14 bases it probably an assembler mistake
+ # set the fusion point to the first base which has
+ # a coverage of 2 ??
+ base_before=sum(runLength(cov)[1:(temp_pos-1)])
+ base_after=sum(runLength(cov)[1:temp_pos])+1
+
+ rbr=blat_results[x[reduced],]
+
+ get_gene_symbol_at_base<-function(b){
+ g=which( (rbr$V12 < b) & (b < rbr$V13 ))
+ g=g[which(rbr[g,]$V1==max(rbr[g,]$V1))[1]]
+ symbols[gene_names[g]]
+ }
+ fusion_genes<-paste(get_gene_symbol_at_base(base_before),
+ get_gene_symbol_at_base(base_after),sep=":")
+
+ show(rbr)
+ show(cov)
+ return(data.frame(rbr$V10[1],base_before,base_after,fusion_genes,rbr$V11[1]))
+ }
+ }
+ }
> candidates=do.call("rbind",lapply(split_results,multi_gene))
Error in .local(query, subject, maxgap, minoverlap, type, select, ...) :
unused arguments (drop.self = TRUE, drop.redundant = TRUE)
Calls: do.call ... lapply -> lapply -> FUN -> findOverlaps -> findOverlaps
Execution halted









Nadia Davidson

unread,
Oct 24, 2017, 7:50:44 PM10/24/17
to jaffa-project
Hi,

Thanks for all the info that's a great help!

It looks to me like it's an issue with compatibility between your R IRanges version and the version of JAFFA. A few other people have reported a similar issue. Can you try installing JAFFA version 1.08 (no need to reinstall the reference files), and then rerun the demo in a different directory from the start? 

If that still fails, would you mind loading R and getting the version if IRanges you have installed:
type "R" on the command line, followed by "library(IRanges)" and then "sessionInfo()" within R and post the output.

Thanks.

Cheers,
Nadia.

deblac...@gmail.com

unread,
Nov 10, 2017, 5:32:56 AM11/10/17
to jaffa-project
Hello Nadia,

I did as you asked, and it works perfectly, But I don't know if this two versions can give the same results. Please can you something about it ? and may be JAFFA version 09 also. I will really appreciate.

Thank you for your great help
best regards

deblac...@gmail.com

unread,
Nov 10, 2017, 6:09:28 AM11/10/17
to jaffa-project
sorry

and may be JAFFA version 09 also*, I think JAFFA version 1.09

Nadia Davidson

unread,
Nov 29, 2017, 8:56:36 PM11/29/17
to jaffa-project
We haven't changed anything in the code between these versions that should affect fusion calls.

veronic...@gmail.com

unread,
Mar 19, 2018, 3:40:23 PM3/19/18
to jaffa-project
Hi Nadia,
I came across the same mistake:
I am using java -1.8.0_65 and the output of sessionInfo() of R is as follows:

    colMeans, colSums, expand.grid, rowMeans, rowSums


> sessionInfo()

R version 3.3.2 (2016-10-31)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)


locale:

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       


attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets 

[8] methods   base     


other attached packages:

[1] IRanges_2.8.1 S4Vectaors_0.12.1 BiocGenerics_0.20.0


Thanks,

Tianshi


Reply all
Reply to author
Forward
0 new messages