Hello this is my first time using methylkit and doing this type of analysis,
The genome of the plant that I'm using is recent and only has a .gff file available and no bed.
Like some other users i converted the gff to bed.
Gff (part):
GRanges object with 619855 ranges and 32 metadata columns:
seqnames ranges strand | source type score phase
<Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer>
[1] chr10_hap2 21926-21928 + | AUGUSTUS start_codon NA 0
[2] chr10_hap2 21926-22417 + | AUGUSTUS CDS 0.97 0
[3] chr10_hap2 21926-23510 + | AUGUSTUS gene 0.16 <NA>
[4] chr10_hap2 21926-23510 + | AUGUSTUS transcript 0.16 <NA>
[5] chr10_hap2 22418-23474 + | AUGUSTUS intron 0.16 <NA>
... ... ... ... . ... ... ... ...
Bed:
chr10_hap2 21925 21928 start_codon 0 +
chr10_hap2 21925 22417 CDS 0 +
chr10_hap2 21925 23510 gene 0 +
chr10_hap2 21925 23510 transcript 0 +
chr10_hap2 22417 23474 intron 0 +
chr10_hap2 23474 23510 CDS 0 +
chr10_hap2 23507 23510 stop_codon 0 +
chr10_hap2 37238 38529 exon 0 +
chr10_hap2 37238 38529 CDS 0 +
chr10_hap2 37238 43934 gene 0 +
chr10_hap2 37238 43934 mRNA 0 +
chr10_hap2 42128 42236 exon 0 +
chr10_hap2 42128 42236 CDS 0 +
chr10_hap2 42320 42529 exon 0 +
chr10_hap2 42320 42529 CDS 0 +
Note: the bed file is not sorted
I've tried to read it directly and I get:
readTranscriptFeatures(my.bedfile)
Reading the table...
Calculating intron coordinates...
Error in `[.data.frame`(ref, , 10) : undefined columns selected
So I've read it using read.bed
>Hap2bed
GRanges object with 619855 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr10_hap2 21926-21928 + | 0 start_codon
[2] chr10_hap2 21926-22417 + | 0 CDS
[3] chr10_hap2 21926-23510 + | 0 gene
[4] chr10_hap2 21926-23510 + | 0 transcript
[5] chr10_hap2 22418-23474 + | 0 intron
... ... ... ... . ... ...
[619851] chr9_hap2 46177969-46178138 + | 0 CDS
[619852] chr9_hap2 46178235-46179068 + | 0 exon
[619853] chr9_hap2 46178235-46179068 + | 0 CDS
[619854] chr9_hap2 46179166-46179767 + | 0 exon
[619855] chr9_hap2 46179166-46179767 + | 0 CDS
-------
seqinfo: 12 sequences from an unspecified genome; no seqlengths
And I get this error:
mygene.parts = readTranscriptFeatures(Hap2bed)
Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "GRanges"’
I also tried to convert first isto a Granageslist but i get a similar error:
my.bed=split(Hap2bed , as.factor(Hap2bed$name))
> mygene.parts = readTranscriptFeatures(my.bed)
Error: unable to find an inherited method for function ‘readTranscriptFeatures’ for signature ‘location = "CompressedGRangesList"
> my.bed
GRangesList object of length 11:
$CDS
GRanges object with 238138 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr10_hap2 21926-22417 + | 0 CDS
[2] chr10_hap2 23475-23510 + | 0 CDS
[3] chr10_hap2 37239-38529 + | 0 CDS
[4] chr10_hap2 42129-42236 + | 0 CDS
[5] chr10_hap2 42321-42529 + | 0 CDS
... ... ... ... . ... ...
[238134] chr9_hap2 46175960-46176182 + | 0 CDS
[238135] chr9_hap2 46176262-46177864 + | 0 CDS
[238136] chr9_hap2 46177969-46178138 + | 0 CDS
[238137] chr9_hap2 46178235-46179068 + | 0 CDS
[238138] chr9_hap2 46179166-46179767 + | 0 CDS
-------
seqinfo: 12 sequences from an unspecified genome; no seqlengths
...
<10 more elements>
I compared the list objects of my file with the one you use on your example:
> names(my.bed)
[1] "CDS" "exon" "five_prime_UTR" "gene" "intron"
[6] "mRNA" "start_codon" "stop_codon" "three_prime_UTR" "transcript"
[11] "tRNA"
> names(gene.parts)
[1] "exons" "introns" "promoters" "TSSes"
Is this only due to not having the same list/objects names? Or Am I also doing the steps wrong?
Best regards,
Sofia