Compatibility of my gff/bed

32 views

Skip to first unread message

Sofia Alves

unread,

Sep 22, 2024, 12:44:11 PM9/22/24

to methylkit_discussion

Hello this is my first time using methylkit and doing this type of analysis,

The genome of the plant that I'm using is recent and only has a .gff file available and no bed.

Like some other users i converted the gff to bed.

Gff (part):

GRanges object with 619855 ranges and 32 metadata columns:
seqnames ranges strand | source type score phase
<Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer>
[1] chr10_hap2 21926-21928 + | AUGUSTUS start_codon NA 0
[2] chr10_hap2 21926-22417 + | AUGUSTUS CDS 0.97 0
[3] chr10_hap2 21926-23510 + | AUGUSTUS gene 0.16 <NA>
[4] chr10_hap2 21926-23510 + | AUGUSTUS transcript 0.16 <NA>
[5] chr10_hap2 22418-23474 + | AUGUSTUS intron 0.16 <NA>
... ... ... ... . ... ... ... ...

Bed:

chr10_hap2 21925 21928 start_codon 0 +
chr10_hap2 21925 22417 CDS 0 +
chr10_hap2 21925 23510 gene 0 +
chr10_hap2 21925 23510 transcript 0 +
chr10_hap2 22417 23474 intron 0 +
chr10_hap2 23474 23510 CDS 0 +
chr10_hap2 23507 23510 stop_codon 0 +
chr10_hap2 37238 38529 exon 0 +
chr10_hap2 37238 38529 CDS 0 +
chr10_hap2 37238 43934 gene 0 +
chr10_hap2 37238 43934 mRNA 0 +
chr10_hap2 42128 42236 exon 0 +
chr10_hap2 42128 42236 CDS 0 +
chr10_hap2 42320 42529 exon 0 +
chr10_hap2 42320 42529 CDS 0 +

Note: the bed file is not sorted

I've tried to read it directly and I get:

readTranscriptFeatures(my.bedfile) Reading the table... Calculating intron coordinates... Error in `[.data.frame`(ref, , 10) : undefined columns selected

So I've read it using read.bed

>Hap2bed
GRanges object with 619855 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr10_hap2 21926-21928 + | 0 start_codon
[2] chr10_hap2 21926-22417 + | 0 CDS
[3] chr10_hap2 21926-23510 + | 0 gene
[4] chr10_hap2 21926-23510 + | 0 transcript
[5] chr10_hap2 22418-23474 + | 0 intron
... ... ... ... . ... ...
[619851] chr9_hap2 46177969-46178138 + | 0 CDS
[619852] chr9_hap2 46178235-46179068 + | 0 exon
[619853] chr9_hap2 46178235-46179068 + | 0 CDS
[619854] chr9_hap2 46179166-46179767 + | 0 exon
[619855] chr9_hap2 46179166-46179767 + | 0 CDS
-------
seqinfo: 12 sequences from an unspecified genome; no seqlengths

And I get this error:

mygene.parts = readTranscriptFeatures(Hap2bed) Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "GRanges"’

I also tried to convert first isto a Granageslist but i get a similar error:

my.bed=split(Hap2bed , as.factor(Hap2bed$name))

> mygene.parts = readTranscriptFeatures(my.bed) Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "CompressedGRangesList"

> my.bed
GRangesList object of length 11:
$CDS
GRanges object with 238138 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr10_hap2 21926-22417 + | 0 CDS
[2] chr10_hap2 23475-23510 + | 0 CDS
[3] chr10_hap2 37239-38529 + | 0 CDS
[4] chr10_hap2 42129-42236 + | 0 CDS
[5] chr10_hap2 42321-42529 + | 0 CDS
... ... ... ... . ... ...
[238134] chr9_hap2 46175960-46176182 + | 0 CDS
[238135] chr9_hap2 46176262-46177864 + | 0 CDS
[238136] chr9_hap2 46177969-46178138 + | 0 CDS
[238137] chr9_hap2 46178235-46179068 + | 0 CDS
[238138] chr9_hap2 46179166-46179767 + | 0 CDS
-------
seqinfo: 12 sequences from an unspecified genome; no seqlengths

...
<10 more elements>

I compared the list objects of my file with the one you use on your example:

> names(my.bed)
[1] "CDS" "exon" "five_prime_UTR" "gene" "intron"
[6] "mRNA" "start_codon" "stop_codon" "three_prime_UTR" "transcript"
[11] "tRNA"

> names(gene.parts) [1] "exons" "introns" "promoters" "TSSes"

Is this only due to not having the same list/objects names? Or Am I also doing the steps wrong?

Best regards,

Sofia

Sofia Alves

unread,

Sep 23, 2024, 3:28:49 PM9/23/24

to methylkit_discussion

I solved it with https://github.com/BIMSBbioinfo/genomation/issues/192, just used my initial gff and had to add a step of converting my differential methylation results to GRanges outside of annotateWithFeatures()

>gff = gffToGRanges(gff.file)
>gfflhap2=as(split(gff, gff$type), "GRangesList")
>CpGGranges<-as(myDiff.tiles3000CpGsig,"GRanges")
>anotaCpG=annotateWithFeatures(CpGGranges, gfflhap2)