Getting final MEI sequences

Guillermo Peris Ripollés

unread,

May 18, 2018, 4:21:57 AM5/18/18

to MELT Help

Hi Eugene,

I am trying to get the insertion sequences found by MELT in order to determine family subtype (for L1Ambig or SVA insertions, for example). However, I cannot find them anywhere in the output folder.

I can write a script to get these sequences using consensus MEI and DIFF field. However, I wonder if there is any easier way to get insertion MEIs from MELT output.

Thanks!

egar...@umaryland.edu

unread,

May 18, 2018, 4:25:30 AM5/18/18

to MELT Help

Hello Guillermo,

Unfortunately, the only way to currently get the consensus sequences is to write a script to reconstruct them based on the MEI and DIFF field (and is in fact how I did it for the MELT paper).

Just as a reminder, MELT can only reconstruct the first and last N bases of an insert, where N is the insert size of the library prep (i.e. if the insert size is 500 bp, MELT will only be able to assemble approximately 500 bases on the 5' and 3' end of the insert). So please limit your expectations accordingly.

Best,

Eugene

Guillermo Peris Ripollés

unread,

Sep 20, 2018, 7:20:51 AM9/20/18

to MELT Help

Hi Eugene,

Related to the original question in the thread, now I am interested in getting the reference IDs of the split reads supporting a putative insertion call to visually check its features. Is there any place/file where I can get them, or are they not supplied by MELT?

Thank you in advance,

Guille.

egar...@umaryland.edu

unread,

Sep 21, 2018, 5:23:04 AM9/21/18

to MELT Help

Hello Guille,

The evidence is stored in a file like:

<MEI>.hum_breaks.sorted.bam

in the working directory provided with the -w option.

This is a bam file with evidence merged across all samples if running with MELT-split. You can extract all reads within ±1kbp of the site listed in the final VCF. Attached to reads which are discordant are tags which are the information for the pair aligned to the MEI of interest.

"OS" = Alignment Start

"OE" = Alignment End

"OC" = Cigar String

"Q2" = Quality String

"R2" = Sequence String

This information can be extracted and then aligned to the MEI reference. For split reads there is no such information as that is already stored in the primary read string and can be extracted with the reads cigar.

Not all reads included in this file are utilized by MELT to decide evidence. There are some filters (Duplicate Reads, MAPQ 0, etc) but by and large these reads should represent evidence used to determine features.

Let me know if I can provide any more help.

Best,

Eugene

Guillermo Peris Ripollés

unread,

Sep 24, 2018, 5:14:29 AM9/24/18

to MELT Help

Thank you for your fast response, Eugene.

I could extract DR and SR without problem following your instructions. I am interested in particular in split reads. Considering your information, can I take as SRs those reads without, let's say, OS or OE tag?

Best,

Guille.

egar...@umaryland.edu

unread,

Sep 24, 2018, 5:34:59 AM9/24/18

to MELT Help

Yes, you can take split reads reads without the MELT-specific tags and extract the split sequence. Just realize, again, that MELT does some internal checks to make sure that the SRs are actually relevant to MEI discovery (e.g. alignment to the MEI, check for polyA tail, overall and split sequence quality). So you will need to make sure you do the same.

Best,

Eugene

Guillermo Peris Ripollés

unread,

Sep 24, 2018, 5:38:23 AM9/24/18

to MELT Help

Sure! In fact, we need those reads for a visual inspection.

Thanks!

yizha...@gmail.com

unread,

Dec 20, 2018, 8:56:40 PM12/20/18

to MELT Help

Hi Guillermo,

I want to know how do you get these sequences from the MELT finally VCF ? Can I consult about your script ? Thanks a lot.

Zhangyi

Guillermo Peris Ripollés

unread,

Dec 21, 2018, 3:57:32 AM12/21/18

to MELT Help

Hi Zhangyi,

If you want to get specific reads from and insertion call in MELT, just extract $chr and $pos position and then from the corresponding bam $file (<MEI>.hum_breaks.sorted.bam) use samtools view.

chr=$1
pos=$2
file=$3
mystart=$(( pos - 1000 ))
myend=$(( pos + 1000 ))

samtools view $file "${chr}:${mystart}-${myend}"

If you are interested only in split reads, try this change (following Edgar suggestion)

samtools view $file "${chr}:${mystart}-${myend}" | awk '$0 !~ /OC:/ && $0 !~ /OE:/ {printf sample"/%s ", $1} '

Best,

Guille.

yizha...@gmail.com

unread,

Dec 21, 2018, 4:35:47 AM12/21/18

to MELT Help

Thank you Guille,

Thank you so much, I want to get the whole length insertion sequence, I think your method could get the sequence.

I will try your method.

Best

Zhangyi

alex.m...@gmail.com

unread,

Jun 5, 2019, 9:08:57 AM6/5/19

to MELT Help

Dear Eugene,

For my project, I need to extract the sequence of MEIs from VCF file. Do you think I can have the script that you used for the MELT paper?

Best,

Alex

Eugene Gardner

unread,

Jun 13, 2019, 10:03:14 AM6/13/19

to MELT Help

Hi Alex,

Unfortunately my script is locked up on my old server at my PhD university and I no longer have access to it.