Issues with identifying simulated inserts and deletions

121 views
Skip to first unread message

Mariam Okhovat

unread,
Aug 18, 2017, 4:14:04 PM8/18/17
to MELT Help
Hi,

We are hoping to use MELT to find polymorphic transposable elements (TE) among different gibbon individuals. In order to evaluate how well MELT can handle our TE of interest, we have taken the reference genome and randomly inserted the consensus TE element 42 times into the genome. We have also taken 20 of the predicted TEs in the genome (from chromosomes that did not receive inserts) and deleted them (their sequence was masked to "P" using bedtools and then sed was used to cut out all instances of "P"). Then we simulated fastq reads and aligned to the reference and used MELT. 

MELT does a really good job in finding the exact location of the manually inserted TEs, there's a only a minor bug:

1. Some of the predictions are called heterozygous, even though the likelihoods clearly show they should have been called homozygous for non-reference insertion. Eg:
This is a bug, right? Based on the data preparation and the likelihood values, all inserts should be homozygous 1/1.

chr9 108814 . A <INS:ME:LAVA> . PASS TSD=A;ASSESS=5;INTERNAL=ENST00000379589.3,INTRONIC;SVTYPE=LAVA;SVLEN=1839;MEINFO=LAVA,0,1839,+;DIFF=0.11:n1-1666,c1673t,n1694-1697,a1702c,t1703c,c1711g,c1725g,t1729a,c1734t,c1741t,i1806t;LP=10;RP=6;RA=0.737;PRIOR=false GT:GL 0/1:-286,-18.06,-36



2. Unfortunately, MELT is doing really bad in calling the deletions. There are only three predicted deletions and none are correct. They're not even on the right chromosomes. This seems strange to me, especially since it does such a good job in detecting inserts. Is the deletion function  not as powerful as the MEI finder, or am I doing something wrong? Below is the code I used:

for bam in $DIR1/*sorted.bam

do

java -Xmx2g -jar $DIR/MELT.jar Deletion-Genotype -l $bam -w $DIR1 -b $DIR/full-length-LAVA-nomLeu3.sorted.bed -h $DIR/nomLeu3.fasta -e 250

done


ls $DIR1/*.del.tsv > $DIR1/melt.list.of.sim.del.outputs.txt


java -Xmx1G -jar $DIR/MELT.jar Deletion-Merge -b $DIR/full-length-LAVA-nomLeu3.sorted.bed -h $DIR/nomLeu3.fasta -l $DIR1/melt.list.of.sim.del.outputs.txt -o $DIR1/sim.LAVA.final_del_comp.vcf -d 100000



Thank you!
Mariam

egar...@umaryland.edu

unread,
Aug 22, 2017, 4:33:15 AM8/22/17
to MELT Help
Hello Miriam,

Good to hear MELT has worked well for you for non-REF insertions. equally good to hear is it seems to be working well for Gibbon genomes, would be good to know any final results in the regard.

Issue 1:
Genotype likelihoods are listed in the form: Genotype: HOM ref likelihood, HET likelihood, HOM alt likelihood. The likelihood closest to 0 is reported as the genotype. So breaking down the likelihood for this call:

Genotype = 0/1 (het)
HOM ref = -286
HET = -18.06
HOM alt = -36

Thus this insertion is called as het by MELT. There is of course some error in regards to genotyping, and the likelihood model utilized by MELT is biased towards het genotypes (see https://doi.org/10.1093/bioinformatics/btr509 for the genotyping liklihood model). So if the insertion is supposed to be a hom alt this may indeed be the case. However, I would be surprised (especially for a simulated genome...) that MELT wouldn't pick this up as HOM ALT if it was indeed added that way. Could you provide a IGV screenshot or two of the region (with discordant pairs and split reads highlighted) so I could see if there is some issue? 

Issue 2:

Likewise, could you get me a screenshot or two of the region you "deleted" from the genome? While the MELT deletion genotyper is underpowered compared to the nonREF one (as is described in our in press paper at Genome Research), it isn't so underpowered it shouldn't pick up any deletions. Also, what coverage are you simulating to?

Happy to continue discussions to make sure MELT is working for your needs.

Best,

Eugene

Mariam Okhovat

unread,
Aug 22, 2017, 3:44:24 PM8/22/17
to MELT Help
Hi Eugene,

Issue 1: I see...I have attached a snapshot at one of the insert sites. The simulated reads are aligned to the original reference, so one can not see in insert, but color coding of read pairs clearly show the discrepancies. However, looking at the reads it appears that due to a short stretch of "A"s that happened to be at the random insert site and also the end of the TE, there are some reads that align to the insert site as if there were no insert. Maybe that is why MELT identifies it as het.

Issue 2: I have attached a snapshot of one of the deletions. Visually, it seems like everything looks fine and MELT should've been able to call the deletion, but it completely missed it.

The simulated reads are at 20X coverage. Our actual samples vary from 11-48X coverage.

Thanks,
Mariam
chr12-del1.jpg
chr12-insert.jpg

Mariam Okhovat

unread,
Aug 22, 2017, 3:47:22 PM8/22/17
to MELT Help
I am not sure if the deletion snapshot was successfully sent, so here it is again...
chr12-del1.jpg

egar...@umaryland.edu

unread,
Aug 23, 2017, 8:51:03 AM8/23/17
to melt...@googlegroups.com
Hi Miriam,

Thank you for the screenshots! Definitely helps explain your situation. So for the nonREF site, I am unsure as to why MELT called it a het and it is a bit surprising considering the overwhelming evidence for hom ALT. I don't have any particularly good explanation other than that there is genotyping error, especially when deciding homALT vs het.

As for the deletion, MELT should call that just fine. My guess is there is something weired going on with the bed input used to genotype. Could you share the bed file you are using and give me the expected coordinate for the deletion in your screenshot?

Best,

Eugene

Mariam Okhovat

unread,
Aug 23, 2017, 12:14:53 PM8/23/17
to MELT Help
Hi Eugene,

Thanks for the email. Sure, but would there be a way for me to send you the bedfile privately? 

Mariam Okhovat

unread,
Aug 23, 2017, 5:29:59 PM8/23/17
to MELT Help
Also, here are the codes I used to run the deletion pipeline:

java -Xmx2g -jar $DIR/MELT.jar Deletion-Genotype -l alignment.sorted.bam -w $DIR1 -b $DIR/predicted-TE.bed -h $DIR/genome.fasta -e 250


ls $DIR1/*.del.tsv > $DIR1/melt.list.of.sim.del.outputs.txt


java -Xmx1G -jar $DIR/MELT.jar Deletion-Merge -b $DIR/predicted-TE.bed -h $DIR/genome.fasta -l $DIR1/melt.list.of.sim.del.outputs.txt -o $DIR1/sim.final_del_comp.vcf -d 100000

Reply all
Reply to author
Forward
0 new messages