Annotation file in GFF3 format


sigd...@gmail.com

Feb 24, 2014, 7:30:15 PM
to methylkit_...@googlegroups.com
Only GFF3 format is available for my organism. I converted the GFF3 file to BED using the Galaxy web server. My file now looks like this:

contig_23646 5095 5506 gene 0 -
contig_23646 5095 5506 mRNA 0 -
contig_23646 5095 5506 CDS 0 -
contig_23646 3511 3922 gene 0 -
contig_23646 3511 3922 mRNA 0 -
contig_23646 3511 3922 CDS 0 -
contig_23646 5095 5506 gene 0 -
contig_23646 5095 5506 mRNA 0 -
contig_23646 5095 5506 CDS 0 -
contig_23646 8258 8669 gene 0 -
contig_23646 8258 8669 mRNA 0 -
contig_23646 8258 8669 CDS 0 -

When I tried read.transcript.features in methylKit, I got this error message:

Error in `[.data.frame`(ref, , 10) : undefined columns selected 

I assume this is because I am missing the last columns with the intron/exon information. I do not necessarily need to know the methylation relative to introns and exons, but I want to know the methylation relative to the TSS and the transcription stop site (the end of the gene). Is there any way to turn off the part of the function that needs intron/exon information and use only the start and stop sites?


Thank you very much!
Sig
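The error indexes column 10 ("ref, , 10"), which suggests read.transcript.features expects more columns than the six that Galaxy produced; in the BED12 layout, columns 10 to 12 carry the exon (block) count, sizes and starts. A quick way to confirm how many columns the converted file has is a minimal shell sketch like the one below. The function name bed_column_counts is hypothetical, and it assumes whitespace-separated columns with optional "track", "browser" or "#" header lines.

```shell
#!/usr/bin/env bash
# bed_column_counts FILE
# Print each distinct number of whitespace-separated fields in a BED file,
# skipping "track", "browser" and "#" header/comment lines and empty lines.
# (Hypothetical helper for illustration, not part of methylKit.)
bed_column_counts() {
  awk '!/^(track|browser|#)/ && NF > 0 { print NF }' "$1" | sort -un
}
```

For the Galaxy output shown above this should print only 6; a file suitable for read.transcript.features would be expected to show 12.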


Kalyan K Pasumarthy

Feb 25, 2014, 2:20:38 AM
to methylkit_...@googlegroups.com
Dear Sig,

I tried your sample data set on my computer with methylKit version 0.9.2, reading your sample BED file with the 'read.bed' function. It returned an empty 'GRanges' object.

However, when I replaced 'contig_' with 'chr' in the BED file, I got a 'GRanges' object with as many ranges as there are rows in the BED file. The 'score' and 'name' columns were also present as metadata:

GRanges with 12 ranges and 2 metadata columns:
       seqnames       ranges strand   |     score        name
          <Rle>    <IRanges>  <Rle>   | <integer> <character>
   [1] chr23646 [5096, 5506]      -   |         0        gene
   [2] chr23646 [5096, 5506]      -   |         0        mRNA
   [3] chr23646 [5096, 5506]      -   |         0         CDS
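The 'contig_' to 'chr' replacement described above can be done with a one-line sed call. This is a minimal sketch; the function name contig_to_chr is hypothetical, and it assumes the sequence name is the first column and always starts with "contig_".

```shell
#!/usr/bin/env bash
# contig_to_chr FILE
# Rewrite a leading "contig_" in each line (i.e. in column 1) to "chr",
# leaving all other columns untouched. Output goes to stdout.
# (Hypothetical helper for illustration.)
contig_to_chr() {
  sed 's/^contig_/chr/' "$1"
}
```

Usage: contig_to_chr converted.bed > converted.chr.bed. Keep in mind that renamed sequences only overlap methylation calls whose chromosome names were renamed the same way.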


Strangely, Altuna mentioned that as of methylKit 0.5.7 the 'chr' prefix is no longer needed in the BED file. In practice, 'read.bed' seems to contradict that bug fix.

Another point to note: your file must contain information in a format methylKit can read in order to identify the TSS. 'read.transcript.features' can do this, but your file lacks the information it needs. 'read.bed' can read your file, but it cannot identify the TSS.

One possible approach is to create another BED file that defines the TSS and stop codon (for every gene/mRNA, the first three and the last three bases in the direction of transcription, so on the minus strand the TSS is at the end coordinate) and read it with 'read.bed'. This may not be a perfect answer, but it is a workaround! Altuna may be able to explain why 'read.bed' still needs the 'chr' string in the BED file.
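The workaround above can be sketched with awk. This is an illustration under stated assumptions, not a methylKit feature: the input is a 6-column BED file (0-based, half-open) with the feature type in column 4, as in the Galaxy output above; only "gene" lines are used; duplicate genes are written once; and the window of 3 bp follows the "first three and last three bases" suggestion. The function name make_tss_tes is hypothetical. Note that the strand-aware handling (start and end swapped on the minus strand) is an addition to the original suggestion.

```shell
#!/usr/bin/env bash
# make_tss_tes IN.bed TSS_OUT.bed TES_OUT.bed [WINDOW]
# For every distinct "gene" line of a 6-column BED file, write a WINDOW-bp
# interval at the transcription start to TSS_OUT.bed and one at the
# transcription end to TES_OUT.bed, respecting strand.
# (Hypothetical helper for illustration.)
make_tss_tes() {
  local in="$1" tss_out="$2" tes_out="$3" w="${4:-3}"
  : > "$tss_out"   # create/empty the output files even if no genes are found
  : > "$tes_out"
  awk -v W="$w" -v TSS="$tss_out" -v TES="$tes_out" '
    BEGIN { OFS = "\t" }
    # keep only gene lines, and each (seqname, start, end, strand) only once
    $4 == "gene" && !seen[$1, $2, $3, $6]++ {
      if ($6 == "-") {
        # minus strand: transcription starts at the end coordinate
        print $1, $3 - W, $3, "TSS", 0, $6 > TSS
        print $1, $2, $2 + W, "TES", 0, $6 > TES
      } else {
        print $1, $2, $2 + W, "TSS", 0, $6 > TSS
        print $1, $3 - W, $3, "TES", 0, $6 > TES
      }
    }' "$in"
}
```

Usage: make_tss_tes converted.bed tss.bed tes.bed, then read tss.bed and tes.bed with read.bed in R. For the minus-strand gene at 5095-5506 above, the TSS window is 5503-5506 and the end window is 5095-5098.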


Regards,
Kalyan


--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/groups/opt_out.

Altuna Akalin

Feb 26, 2014, 5:18:29 AM
to methylkit_...@googlegroups.com
Yes, as Kalyan pointed out, read.bed will work if you are only interested in the TSS and stop codon. You can either read a file that contains those positions, or try to extract them from the GFF after reading it in.

By default, read.bed removes chromosomes whose names contain "_"; you can turn this off by setting its second argument to FALSE.
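To check beforehand which sequence names would be affected by that default, a small shell sketch can list the distinct names in column 1 that contain "_". The function name underscore_seqnames is hypothetical, and it assumes the sequence name is the first whitespace-separated column.

```shell
#!/usr/bin/env bash
# underscore_seqnames FILE
# List the distinct sequence names (column 1) of a BED file that contain "_",
# i.e. the ones read.bed would drop with its default setting.
# (Hypothetical helper for illustration.)
underscore_seqnames() {
  awk '!/^(track|browser|#)/ && $1 ~ /_/ { print $1 }' "$1" | sort -u
}
```

For the file in the first message every line would be affected (contig_23646), which matches the empty GRanges object Kalyan got.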

Best,
Altuna 

bukhar...@gmail.com

Jul 30, 2014, 4:55:53 AM
to methylkit_...@googlegroups.com
Hi Altuna, 

I am trying to use methylKit for the aphid genome. I have a 6-column BED file, and I used the read.bed function to create the gene.obj object. This gave me a GRanges object instead of a GRangesList, whereas it looks like "annotate.WithGenicParts" only accepts a GRangesList object. Can I pass a GRanges object to the "annotate.WithGenicParts" function? I have only very basic knowledge of R, so your help is greatly appreciated. Here is what my problem looks like:

> gene.obj = read.bed("Galaxy3-[GFF-to-BED_on_data_2].bed")
Warning message:
In findGeneric(f, parent.frame()) :
  'read' is a formal generic function; S3 methods will not likely be found
> gene.obj
GRanges with 323305 ranges and 2 metadata columns:
           seqnames           ranges strand   |     score            name
              <Rle>        <IRanges>  <Rle>   | <integer>     <character>
       [1] GL349624 [310947, 311690]      +   |         0            gene
       [2] GL349624 [310947, 311690]      +   |         0            mRNA
       [3] GL349624 [310884, 310967]      +   |         0  five_prime_UTR
       [4] GL349624 [311086, 311112]      +   |         0  five_prime_UTR
       [5] GL349624 [311113, 311285]      +   |         0             CDS
       ...      ...              ...    ... ...       ...             ...
  [323301] GL349929 [179986, 180981]      -   |         0  five_prime_UTR
  [323302] GL349929 [179715, 179985]      -   |         0             CDS
  [323303] GL349929 [179009, 179621]      -   |         0             CDS
  [323304] GL349929 [177549, 178666]      -   |         0 three_prime_UTR
  [323305] GL349929 [178667, 178670]      -   |         0             CDS
  ---
  seqlengths:
   GL349621 GL349622 GL349623 GL349624 ... GL373500 GL373518 GL373533 GL373541
         NA       NA       NA       NA ...       NA       NA       NA       NA
> annotate.WithGenicParts(myDiff25p,gene.obj)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘annotate.WithGenicParts’ for signature ‘"methylDiff", "GRanges"’





Kalyan K Pasumarthy

Jul 30, 2014, 6:00:11 AM
to methylkit_...@googlegroups.com
Hi,

"read.bed" returns only a "GRanges" object, but the annotation function (annotate.WithGenicParts) needs a "GRangesList".

I am assuming you want to annotate the differentially methylated CpGs using the BED file you have. Although your BED file labels every feature in its fourth (name) column (e.g. gene, mRNA, CDS), all of the features are in the same file. The following steps should help with the annotation:
  • Split this file into multiple BED files, one per feature (i.e. gene.bed for genes, mrna.bed for mRNAs, and so on)
    • on a Linux/Mac machine you may try the following command
      •  grep "feature_name" input_file > output.bed
  • Once you have a BED file for each feature, read each one into a separate R object with read.bed(). All of these will be GRanges objects
  • Use the GRangesList() function (in some cases GenomicRangesList(), depending on the number/type of columns in your BED files) to create a GRangesList object from the GRanges objects above
  • The resulting GRangesList object will be compatible with the annotation function!
  • Also note that some warnings/errors may arise when you annotate (I am not sure) because the names in your seqnames column do not start with "chr"
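The splitting step in the list above can also be done in one pass with awk, writing each line to a file named after its feature type. This is a minimal sketch that assumes a 6-column BED file with the feature type in column 4; the function name split_bed_by_feature is hypothetical. Unlike grep "feature_name", which matches the text anywhere on the line (grep "UTR", for example, would catch both five_prime_UTR and three_prime_UTR), awk compares column 4 exactly.

```shell
#!/usr/bin/env bash
# split_bed_by_feature IN.bed OUTDIR
# Write every line of IN.bed to OUTDIR/<feature>.bed, where <feature> is the
# exact value of column 4 (e.g. gene, mRNA, CDS, five_prime_UTR).
# (Hypothetical helper for illustration.)
split_bed_by_feature() {
  local in="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v D="$outdir" '
    !/^(track|browser|#)/ && NF >= 6 {
      out = D "/" $4 ".bed"
      print > out
    }' "$in"
}
```

Usage: split_bed_by_feature all_features.bed split/ produces split/gene.bed, split/mRNA.bed, split/CDS.bed and so on, each of which can then be read with read.bed() as described above.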


Regards,
Kalyan



ansh...@gmail.com

Aug 7, 2014, 4:27:04 PM
to methylkit_...@googlegroups.com, sigd...@gmail.com
Hello, I have the same problem and followed the steps listed here, but I still get an error message. Any help would be appreciated. Please see below:


> annotation.obj <- GRangesList(gene.obj,mRNA.obj,exon.obj,CDS.obj,five.obj,three.obj)
> annotate.WithGenicParts(myDiff25pbaseall, annotation.obj)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘countOverlaps’ for signature ‘"GRanges", "NULL"’
> head (annotation.obj)
GRangesList of length 6:
[[1]] 
GRanges with 55589 ranges and 2 metadata columns:
          seqnames               ranges strand   |     score        name
             <Rle>            <IRanges>  <Rle>   | <integer> <character>
      [1]    Chr01       [27355, 28320]      -   |         0        gene
      [2]    Chr01       [58975, 67527]      -   |         0        gene
      [3]    Chr01       [67770, 69968]      +   |         0        gene
      [4]    Chr01       [90152, 95947]      -   |         0        gene
      [5]    Chr01       [90289, 91197]      +   |         0        gene
      ...      ...                  ...    ... ...       ...         ...
  [55585]    Chr20 [47843311, 47845032]      -   |         0        gene
  [55586]    Chr20 [47850624, 47854548]      +   |         0        gene
  [55587]    Chr20 [47863582, 47888358]      -   |         0        gene
  [55588]    Chr20 [47870688, 47870967]      +   |         0        gene
  [55589]    Chr20 [47890889, 47901292]      -   |         0        gene

...
<5 more elements>
---
seqlengths:
 Chr01 Chr02 Chr03 Chr04 Chr05 Chr06 ... Chr15 Chr16 Chr17 Chr18 Chr19 Chr20
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA


Kalyan K Pasumarthy

Aug 8, 2014, 3:03:31 AM
to methylkit_...@googlegroups.com
Here is one scenario in which I would expect to see this error:
  • I create and save the "GRangesList" object and the "diff" objects in one session (say session 1) with the "GenomicRanges" and "methylKit" packages loaded
  • After a break, I start another session (session 2) by simply loading the saved "diff" object and "GRangesList" object, without explicitly loading the needed packages. If I do this, the "methylKit" package is loaded automatically with the "diff" object, but "GenomicRanges" is not loaded even though the "GRangesList" object is.
  • In the absence of the needed "GenomicRanges" package, I would expect this error

This is only one possible cause of the error (I reproduced the error message by doing the above). If you have already loaded the "GenomicRanges" library and still see the error message, the methylKit community would be glad to see your sessionInfo() output.
