Thanks for the reply. Yes, the SAM file is sorted. This is my workflow:
I have output SAM files from Bismark.
I removed the duplicates with Bismark's deduplication script (deduplicate_bismark).
I sorted them as described in the methylKit documentation (Unix sort).
I added the prefix chr to the chromosome names.
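For anyone trying to reproduce the sorting and chr steps, here is a minimal Python sketch of what they amount to. It is not the exact command I ran: the function names are made up for illustration, it assumes tab-separated SAM fields, and it sorts in memory, so it only suits a small test subset, not a 50 GB file.

```python
def add_chr_prefix(sam_line):
    """Prefix 'chr' to the reference name (column 3) of one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    if fields[2] != "*" and not fields[2].startswith("chr"):
        fields[2] = "chr" + fields[2]
    # Column 7 (RNEXT) names the mate's reference unless it is '=' or '*'.
    if fields[6] not in ("=", "*") and not fields[6].startswith("chr"):
        fields[6] = "chr" + fields[6]
    return "\t".join(fields)


def prepare_alignments(lines):
    """Drop header and blank lines, add the prefix, sort by chromosome then start."""
    records = [
        add_chr_prefix(line)
        for line in lines
        if line.strip() and not line.startswith("@")
    ]
    # Chromosome names are compared as plain strings, positions as integers.
    return sorted(records, key=lambda r: (r.split("\t")[2], int(r.split("\t")[3])))
```

Note that Python compares chromosome names by code point, which can differ from Unix sort when the locale is not C.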
My file now looks like this (50 GB):
HWI-ST1140:146:C45ELACXX:5:1305:20630:83995_1:N:0:TTAGGC 99 chr1 3000086 40 38M = 3000120 134 TTGGTTTTGGGTTTTTTTTTTTTTTTTTTTTTTTTTTT CCCFFFFFHHHHHJJJJJJJJJJJHDDDDDDDDDDDDB NM:i:4 MD:Z:0C4C0C4C26 XM:Z:x....hx....h.......................... XR:Z:CT XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1305:20630:83995_1:N:0:TTAGGC 147 chr1 3000120 40 100M = 3000086 -134 TTTTGGTTGGGAGATTATTGATGATTGTTTTTATTTTTTTAGGGGAAATGGGATTTTTAGTTTATGAATTTGATTTTGATTTAGTTTTGGTATTTGGCAT DDDDDDDDDDDDDEDEDDDDEEDEDDFFFFHHHHJJJJJJJJJJJJIJIHIIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFCCC NM:i:17 MD:Z:6G7C9C2C0C1C5C16C7C0C6C4C0C8C7C0C3T2 XM:Z:..............h.........x..hh.h.....h................h.......hh......x....hx........h.......hx...... XR:Z:GA XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1116:12516:66117_1:N:0:TTAGGC 99 chr1 3000667 42 99M = 3000681 112 GGTTATGTTGTGGATTTATTTTTATTAAATTTTAAAATATTTTTAATTTTTTTTTTTATTTTATTATTGATTAAGTTATTATTAAGTAGAGTATTGTTT @CBDDFFFHHFHHIJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJHFFFEEEEEEEEFDDEDEDDDCDEDEEEEDDDEDDDCDDEDEDD NM:i:13 MD:Z:2C19C6C7C2C12C7C2C5C0C3C3C18C0 XM:Z:..h...................h......h.......h..h............h.......h..h.....hh...h...h..................x XR:Z:CT XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1116:12516:66117_1:N:0:TTAGGC 147 chr1 3000681 42 98M = 3000667 -112 TTTATTTTTATTAAATTTTAAAATATTTTTAATTTTTTTTTTTATTTTATTATTGATTAAGTTATTATTAAGTAGAGTATTGTTTAGTTTTTAGGTGA DDDDDDDDECEDCCEDDDEEEEFDDDDDDDDDDDHJJJJJJJJJJJJIJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJIIJJJJJHHHHHFFFFFCCC NM:i:14 MD:Z:8C6C7C2C12C7C2C5C0C3C3C18C5C0C6 XM:Z:........h......h.......h..h............h.......h..h.....hh...h...h..................x.....hx...... XR:Z:GA XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1207:19464:21081_1:N:0:TTAGGC 99 chr1 3000735 42 99M = 3000824 188 GATTAAGTTATTATTAAGTAGAGTATTGTTTAGTTTTTAGGTGAATGTTGGTTTTTTATTATTTATGTTGTTATTGAAGATTAGTTTTAGTTCGTAGTT BBCFDFFFHHHHHJJJJJHJJJJFIIJJJHJIJHHJJJJJJEHIIJJIIJJIJJJJJJJJJJJJJJJIJJIIJJJJHHHHHHHFBEFFFEEEEDDDDDE NM:i:14 MD:Z:2C0C3C3C18C5C0C13C3C11C13C2C0C5C7 XM:Z:..hh...h...h..................x.....hx.............h...h...........x.............x..hh.....xZ...... XR:Z:CT XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1207:19464:21081_1:N:0:TTAGGC 147 chr1 3000824 42 99M = 3000735 -188 GTTCGTAGTTATTTGAAAGGATGTATGGGAAAATTTTAATATTTTTGTATTTGTTGAGGATTTTTTGTGAGTGATTATATGGTTAATTTTGGAGGATTT DDDDDDDDDDDEEEEEEEFFFFFFHHHHHHIJJJJJJJJIJJJJJJJIHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFCCC NM:i:8 MD:Z:2C9C10C12C13C9C13C8C15 XM:Z:..xZ........x..........h............h.............x.........h.............h........h............... XR:Z:GA XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:2112:13241:99119_1:N:0:TTAGGC 99 chr1 3000922 42 100M = 3000965 142 TGGTATTGAGAAGAAGGTATATATTTTTTTGTTTTATGATAAAATGTTTTGTAGATATTTATTAAATTTATTTGTTTTATAATTTCGGTTAGTGTTCGTG CCCFFFFFHHHGHJIJJAHJJJIJJJJJJJJHIJJJJJJJJJIJJIIIJJJHIJJJJJJJIJJJJJJJJIJHHHHEHFFFFFFFEEDDDDDDDDDFDDDD NM:i:10 MD:Z:5C18C0C6C15C9C9C8C4C12C4 XM:Z:.....x..................hh......h...............x.........h.........h........h....h..Z.........xZ... XR:Z:CT XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:2112:13241:99119_1:N:0:TTAGGC 147 chr1 3000965 42 99M = 3000922 -142 ATGTTTTGTAGATATTTATTAAATTTATTTGTTTTATAATTTCGGTTAGTGTTCGTGTGTTTTTGTTTAGTTTTTGTTTTTAGGATTTGTTTTTTGGCG DDDDDDDDEEEEEEDDEDEDDEEDDDDDDDDDEEEEDDDAAABFHHFFEJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFCCC NM:i:17 MD:Z:5C9C9C8C4C12C7C1C4C5C2C2C0C5C3C1C4T1 XM:Z:.....x.........h.........h........h....h..Z.........xZ......h.x....x.....x..h..hx.....x...h.h...... XR:Z:GA XG:Z:CT
HWI-ST1140:146:C45ELACXX:5:1103:3371:72689_1:N:0:TTAGGC 163 chr1 3001053 42 99M = 3001055 100 GTCTCTTAATAAAAATATAATCTTAAAATCTCCCAATATTATTTTATAAAATACAATATATACTTTAATCTTTAACAAAATATATTTAATAAATATAAC @BBFDFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJIIIIIJJJJJJJIJFHHHHHHFFFF NM:i:26 MD:Z:7G0G1G1G1G1G1G0G4G2G7G11G1G0G1G4G1G1G4G7G4G1G8G3G1G0G1 XM:Z:X......hh.h.h.h.h.hh....h..h.......x...........h.hh.h....h.h.h....h.......h....h.h........h...h.hh. XR:Z:GA XG:Z:GA
HWI-ST1140:146:C45ELACXX:5:2116:9780:51547_1:N:0:TTAGGC 163 chr1 3001053 24 99M = 3001138 184 ATCTTTTAATAAAAATATAATCTTAAAATCTCCCAATATTATTTTATAAAATACAATATATACTTTAATCTTTAACAAAATATATTTAATAAATATAAC CC@FFFFFHHHGHJJJIIIHIHIIJIIJJGHJIJIGIJJJJIJJJJJJIJJHIJIIIJJJJJDAGIJJIJJJJIIJIIJJJIJIJJJJHHHHGHHFFFF NM:i:28 MD:Z:0G3C2G0G1G1G1G1G1G0G4G2G7G11G1G0G1G4G1G1G4G7G4G1G8G3G1G0G1 XM:Z:x......hh.h.h.h.h.hh....h..h.......x...........h.hh.h....h.h.h....h.......h....h.h........h...h.hh. XR:Z:GA XG:Z:GA
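Since the error points at sorting, one way to double-check a file like the one above is a streaming pass that reports the first line out of (chromosome, position) order. This is only a sketch: first_unsorted_line is a hypothetical helper, it assumes tab-separated fields, and it compares chromosome names as plain strings, which may not match exactly what read.bismark checks internally.

```python
def first_unsorted_line(lines):
    """Return the 1-based number of the first alignment line that breaks
    (chromosome, position) order, or None if the order holds throughout."""
    previous = None
    for number, line in enumerate(lines, start=1):
        # Skip SAM header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.split("\t")
        key = (fields[2], int(fields[3]))
        if previous is not None and key < previous:
            return number
        previous = key
    return None
```

It reads one line at a time, so it can be run on a large file by passing an open file handle, for example `with open("sorted_A.deduplicated.sam") as handle: print(first_unsorted_line(handle))`.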
I used the following script:
listOfFiles<-list("sorted_A.deduplicated.sam","sorted_B.deduplicated.sam","sorted_C.deduplicated.sam","sorted_D.deduplicated.sam")
myobj=read.bismark(location=listOfFiles,sample.id=list("A","B","C","D"),assembly="mm10",save.folder=path,save.context="CpG",read.context="none",mincov=5,minqual=20,treatment=c(1,1,0,0))
It gives me this error:
not enough alignments that pass coverage and phred score thresholds to calculate conversion rates
EXITING....
Error in .local(location, sample.id, assembly, save.folder, save.context, :
Error in methylation calling...
Make sure the file is sorted correctly and it is a legitimate Bismark SAM file
I was getting this error before, and I came to know that I should remove duplicate alignments only with the Bismark script (which I have now done).
I also tried running it without the parameters mincov=5 and minqual=20, but it still gives me the same error.
Any suggestions?
Thanks in advance.
On Thursday, October 16, 2014 11:18:47 AM UTC+2, Kalyan Kumar Pasumarthy wrote:
The methylKit vignette recommends removing the headers. Make sure that the SAM file is sorted.
Hello,
Is it necessary for the SAM file to have header lines?
I sorted the SAM output file from Bismark and removed the
headers. Is it necessary to have them in the file for read.bismark?
Error:
not enough alignments that pass coverage and phred score thresholds to calculate conversion rates
EXITING....
Error in .local(location, sample.id, assembly, save.folder, save.context, :
Error in methylation calling...
Make sure the file is sorted correctly and it is a legitimate Bismark SAM file
Thanks