Help with IRanges package


LD

Jun 22, 2014, 12:23:33 PM
to davi...@googlegroups.com
Hi all,

I am having trouble using the IRanges package in R with my GFF file. I keep getting this error:

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  solving row 10: negative widths are not allowed
In addition: Warning message:
In ans[] <- x :
  number of items to replace is not a multiple of replacement length

Here is a snapshot of my GFF file:

GRanges with 6 ranges and 5 metadata columns:
                  seqnames       ranges strand |   source     type     score     phase
                     <Rle>    <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>
  [1] scaffold1_size546071 [ 962, 3991]      + |    maker     gene      <NA>      <NA>
  [2] scaffold1_size546071 [ 962, 3991]      + |    maker     mRNA      <NA>      <NA>
  [3] scaffold1_size546071 [ 962, 3991]      + |    maker     exon      0.99      <NA>
  [4] scaffold1_size546071 [ 962, 3991]      + |    maker      CDS      <NA>         0
  [5] scaffold1_size546071 [5169, 8114]      - |    maker     gene      <NA>      <NA>
  [6] scaffold1_size546071 [5169, 8114]      - |    maker     mRNA      <NA>      <NA>
                                                                                                                                                                          group
                                                                                                                                                                       <factor>
  [1]                              ID=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0;Name=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0;
  [2]           ID=Ptab1_008555.1;Name=Ptab1_008555.1;Parent=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0;_AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|1|1|0|1009
  [3] ID=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0-mRNA-1:exon:1;Name=;Parent=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0-mRNA-1;
  [4]  ID=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0-mRNA-1:cds:1;Name=;Parent=T1_MakerRun3-augustus_masked-scaffold1_size546071-abinit-gene-0.0-mRNA-1;
  [5]                                            ID=T1_MakerRun3-maker-scaffold1_size546071-augustus-gene-0.78;Name=T1_MakerRun3-maker-scaffold1_size546071-augustus-gene-0.78;
  [6]                      ID=Ptab1_010007.1;Name=Ptab1_010007.1;Parent=T1_MakerRun3-maker-scaffold1_size546071-augustus-gene-0.78;_AED=0.07;_eAED=0.01;_QI=0|0|0|1|1|1|2|0|954
  ---
  seqlengths:
     scaffold1_size546071  scaffold10_size368946 ...   scaffold997_size9667   scaffold999_size9659
                       NA                     NA ...                     NA                     NA


IRanges is called from another R script (http://figshare.com/articles/getFeat2_R_function/707325) that is part of a published pipeline.

My GFF imports fine.

I tried to invert the start and end for negative strands but that didn't solve my problem. 

Row 10:
scaffold1_size546071 maker gene 5169 8114 . - . ID=T1_MakerRun5-maker-scaffold1_size546071-augustus-gene-0.67;Name=T1_MakerRun5-maker-scaffold1_size546071-augustus-gene-0.67;

Any help would be greatly appreciated.

Cheers

LD

Vince S. Buffalo

Jun 22, 2014, 12:38:53 PM
to davi...@googlegroups.com
Hi LD,

The issue here is that IRanges doesn't allow negative widths. solve_user_SEW0() is a C routine that takes any two of (start, end, width) of a range and solves for the third, using width = end - start + 1. If end < start - 1 (equivalently, width < 0), you'll get this error message. I would add an assertion before creating the range, e.g. stopifnot(end >= start). If your data can legitimately contain zero-width ranges (end == start - 1), relax that to stopifnot(end >= start - 1). To make debugging easier, I might also recommend:

if (end < start) browser() # open debugging browser so you can see why this range was created.
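
To find every offending row at once rather than stopping at the first, something like the following might help. It is a minimal base-R sketch: the data frame `gff`, its values, and the helper `check_ranges()` are made up for illustration and are not part of IRanges or the getFeat2 script.

```r
# Hypothetical GFF-like coordinates; row 3 is deliberately broken (end < start)
# and row 4 is zero-width (end == start - 1).
gff <- data.frame(
  seqname = c("scaffold1", "scaffold1", "scaffold1", "scaffold1"),
  start   = c(962L, 5169L, 9000L, 12001L),
  end     = c(3991L, 8114L, 8500L, 12000L),
  strand  = c("+", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# IRanges defines width as end - start + 1, so width < 0 means end < start - 1.
gff$width <- gff$end - gff$start + 1L

# Rows that would trigger "negative widths are not allowed".
bad_rows <- which(gff$width < 0L)

# Zero-width rows are accepted by IRanges, but are worth a second look.
zero_rows <- which(gff$width == 0L)

print(gff[bad_rows, ])

# Hypothetical helper: fail early with a readable message that names the rows,
# instead of waiting for the C-level error.
check_ranges <- function(start, end) {
  bad <- which(end - start + 1L < 0L)
  if (length(bad) > 0L) {
    stop("negative width at row(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling check_ranges(gff$start, gff$end) just before the IRanges(start, end) call should point at the rows to inspect. Note also that GFF coordinates have start <= end on both strands, so minus-strand features should not need their start and end swapped.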

Also, you might want to investigate GenomicFeatures. This is a Bioconductor core package that takes a well-formatted GFF and creates a TranscriptDb package containing all transcript-based annotation. It does all of the work for you, creating different tables of exons, splicings, etc. Internally, these TranscriptDb objects contain SQLite databases that you interact with using higher-level R functions. So after creating a TranscriptDb package for your organism's data and loading it, you can get exons grouped by transcript with exonsBy(txdb, "tx") or introns grouped by transcript with intronsByTranscript(txdb).
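
As a rough sketch of that workflow, assuming a GFF3 file and a reasonably recent Bioconductor install: the wrapper name `exons_and_introns()` and the `gff_path` argument are hypothetical. In later releases the TranscriptDb class was renamed TxDb and the constructor is makeTxDbFromGFF(), which newer versions ship in the txdbmaker package. Function names may differ on older installs, so check your version's documentation.

```r
# Sketch only: needs Bioconductor's GenomicFeatures (and txdbmaker on recent
# releases). `gff_path` is a hypothetical path to a well-formed GFF3 file.
exons_and_introns <- function(gff_path) {
  # Build the annotation database from the GFF file, using whichever
  # package provides the constructor on this install.
  if (requireNamespace("txdbmaker", quietly = TRUE)) {
    txdb <- txdbmaker::makeTxDbFromGFF(gff_path, format = "gff3")
  } else {
    txdb <- GenomicFeatures::makeTxDbFromGFF(gff_path, format = "gff3")
  }
  list(
    exons   = GenomicFeatures::exonsBy(txdb, by = "tx"),      # exons per transcript
    introns = GenomicFeatures::intronsByTranscript(txdb)      # introns per transcript
  )
}
```

Usage would look like res <- exons_and_introns("annotation.gff3"), where the file name is a placeholder; res$exons and res$introns are then GRangesList objects keyed by transcript.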

Great to see people using Bioconductor packages for genomic data — they're great and really well written!

HTH,
Vince


--
Vince Buffalo
Ross-Ibarra Lab (www.rilab.org)
Plant Sciences, UC Davis