Compatibility of my gff/bed

24 views
Skip to first unread message

Sofia Alves

unread,
Sep 22, 2024, 12:44:11 PM9/22/24
to methylkit_discussion
Hello this is my first time using methylkit and doing this type of analysis,

The genome of the plant that I'm using is recent and only has a .gff file available and no bed.
Like some other users i converted the gff to bed.
 Gff (part):
GRanges object with 619855 ranges and 32 metadata columns:
             seqnames            ranges strand |   source        type     score     phase
                <Rle>         <IRanges>  <Rle> | <factor>    <factor> <numeric> <integer>
       [1] chr10_hap2       21926-21928      + | AUGUSTUS start_codon        NA         0
       [2] chr10_hap2       21926-22417      + | AUGUSTUS CDS              0.97         0
       [3] chr10_hap2       21926-23510      + | AUGUSTUS gene             0.16      <NA>
       [4] chr10_hap2       21926-23510      + | AUGUSTUS transcript       0.16      <NA>
       [5] chr10_hap2       22418-23474      + | AUGUSTUS intron           0.16      <NA>
       ...        ...               ...    ... .      ...         ...       ...       ...

Bed:
chr10_hap2 21925 21928 start_codon 0 +
chr10_hap2 21925 22417 CDS 0 +
chr10_hap2 21925 23510 gene 0 +
chr10_hap2 21925 23510 transcript 0 +
chr10_hap2 22417 23474 intron 0 +
chr10_hap2 23474 23510 CDS 0 +
chr10_hap2 23507 23510 stop_codon 0 +
chr10_hap2 37238 38529 exon 0 +
chr10_hap2 37238 38529 CDS 0 +
chr10_hap2 37238 43934 gene 0 +
chr10_hap2 37238 43934 mRNA 0 +
chr10_hap2 42128 42236 exon 0 +
chr10_hap2 42128 42236 CDS 0 +
chr10_hap2 42320 42529 exon 0 +
chr10_hap2 42320 42529 CDS 0 +

Note: the bed file is not sorted
I've tried to read it directly and I get:
readTranscriptFeatures(my.bedfile) Reading the table... Calculating intron coordinates... Error in `[.data.frame`(ref, , 10) : undefined columns selected

So I've read it  using read.bed
>Hap2bed
GRanges object with 619855 ranges and 2 metadata columns:
             seqnames            ranges strand |     score        name
                <Rle>         <IRanges>  <Rle> | <integer> <character>
       [1] chr10_hap2       21926-21928      + |         0 start_codon
       [2] chr10_hap2       21926-22417      + |         0         CDS
       [3] chr10_hap2       21926-23510      + |         0        gene
       [4] chr10_hap2       21926-23510      + |         0  transcript
       [5] chr10_hap2       22418-23474      + |         0      intron
       ...        ...               ...    ... .       ...         ...
  [619851]  chr9_hap2 46177969-46178138      + |         0         CDS
  [619852]  chr9_hap2 46178235-46179068      + |         0        exon
  [619853]  chr9_hap2 46178235-46179068      + |         0         CDS
  [619854]  chr9_hap2 46179166-46179767      + |         0        exon
  [619855]  chr9_hap2 46179166-46179767      + |         0         CDS
  -------
  seqinfo: 12 sequences from an unspecified genome; no seqlengths

And I get this error:
mygene.parts = readTranscriptFeatures(Hap2bed) Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "GRanges"’
I also tried to convert first isto a Granageslist but i get a similar error:
my.bed=split(Hap2bed , as.factor(Hap2bed$name))

> mygene.parts = readTranscriptFeatures(my.bed) Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "CompressedGRangesList"

> my.bed
GRangesList object of length 11:
$CDS
GRanges object with 238138 ranges and 2 metadata columns:
             seqnames            ranges strand |     score        name
                <Rle>         <IRanges>  <Rle> | <integer> <character>
       [1] chr10_hap2       21926-22417      + |         0         CDS
       [2] chr10_hap2       23475-23510      + |         0         CDS
       [3] chr10_hap2       37239-38529      + |         0         CDS
       [4] chr10_hap2       42129-42236      + |         0         CDS
       [5] chr10_hap2       42321-42529      + |         0         CDS
       ...        ...               ...    ... .       ...         ...
  [238134]  chr9_hap2 46175960-46176182      + |         0         CDS
  [238135]  chr9_hap2 46176262-46177864      + |         0         CDS
  [238136]  chr9_hap2 46177969-46178138      + |         0         CDS
  [238137]  chr9_hap2 46178235-46179068      + |         0         CDS
  [238138]  chr9_hap2 46179166-46179767      + |         0         CDS
  -------
  seqinfo: 12 sequences from an unspecified genome; no seqlengths

...
<10 more elements>
I compared the list objects of my file with the one you use on your example:
> names(my.bed)
 [1] "CDS"             "exon"            "five_prime_UTR"  "gene"            "intron"        
 [6] "mRNA"            "start_codon"     "stop_codon"      "three_prime_UTR" "transcript"    
[11] "tRNA"

> names(gene.parts) [1] "exons" "introns" "promoters" "TSSes"

Is this only due to not having the same list/objects names? Or Am I also doing the steps wrong?


Best regards,
Sofia

Sofia Alves

unread,
Sep 23, 2024, 3:28:49 PM9/23/24
to methylkit_discussion
I solved it with https://github.com/BIMSBbioinfo/genomation/issues/192, just used my initial gff and had to add a step of converting my differential methylation results to GRanges outside of annotateWithFeatures()
>gff = gffToGRanges(gff.file)
>gfflhap2=as(split(gff, gff$type), "GRangesList")
>CpGGranges<-as(myDiff.tiles3000CpGsig,"GRanges")
>anotaCpG=annotateWithFeatures(CpGGranges, gfflhap2)

regards
Reply all
Reply to author
Forward
0 new messages