Which leads me to three questions:1) What is the "extdata" used in the Vignette? I did not use it to create "file.list" (even though specified in the Vignette, because I would get an error), but I did use it to create "gene.obj".
2) How is the "refseq.hg18.bed.txt" different than "hg18" in the vignette? What is the .bed.txt file? I have the reference genome "refpdom.fasta", which I used to create "myobj", but I do not know whether or how to create the .bed.txt file (which is why I am not surprised about this error).
3) Is it problematic that my control and variable information is of different length? For instance, my two input files are "queenDataHasC.txt" and "workerDataHasC.txt". One file is 2,521,760 lines long, the other is 2,674,819 lines long. (They were originally 58,828,763 and 57,901,315 lines long. However, for each file, I removed lines with freqC = 0).
Hello:I am using methylKit to compare wasp (P.dominula) methylation data, with queens as the variable and workers as the control. I am following the online methylKit vignette. I have a few questions, and would be very grateful to hear any advice!I input my data as such:library(methylKit)file.list=list("queenDataHasC.txt", "workerDataHasC.txt")myobj=read(file.list, sample.id=list("queen", "worker"), assembly="refpdom.fasta", treatment=c(0,1), context="CpG")I am able to output tables and histograms for the following commands:getMethylationStats (it is not bimodal, though, it is right-skewed)getCoverageStats (it is unimodal, and not too skewed either direction)Then, I continue through the vignette:library("graphics")meth=unite(myobj, destrand=FALSE)myDiff10p=get.methylDiff(myDiff,difference=10,qvalue=0.01) (I obtain five DMRs)However, when I try to annotate the DMR:gene.obj=read.transcript.features(system.file("extdata", "refpdom.fasta.bed.txt", package = "methylKit"))I get the error:Error in if (grepl("^track", f.line)) (skip = 1) :argument is of length zeroIn addition: Warning message:In file(con, "r") :file("") only supports open = "w+" and open = "w+b": using the former
Why is this? A GFF3 file has one line per feature. Part of an exon, etc. with further (this can vary) annotation that tells you if it is 5' UTR, or CDS, etc. When converting GFF3 -> BED12, all the features for a particular transcript are rolled up into a single line of output. Transcripts will have a unique "ID" by GFF3 specification - and all features that belong to that transcript will have it in the GFF3 file's attributes field (it is usually the first attribute listed).
When you converted just GFF3 -> BED6, each feature was individually converted into a distinct line of BED formatted output, there was no summing up to group the information for the transcript as a whole. "