Dear Kostas,
Thank you for using the UCSC Genome Browser and for your question about intact LINE elements in the mm10 assembly, including your example link.
If you look closely at this example spot, you will find it is actually a single LINE element that did not align perfectly in this region. Click the session link below, which displays a second track, "Detailed Visualization of RepeatMasker Annotations", along with three highlights: a light blue and a darker blue highlight for the two regions, and a faint yellow highlight to emphasize where this LINE element is joined:
http://genome-euro.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=mm10_MLQ20219
Click into the top "L1Md_T#LINE/L1" element to arrive at a details page for the rmskJoinedBaseline item, which shows that this element is fragmented: it is broken into two regions, with the alignments displayed below. Scroll down to find a graphic in the description section explaining the various graphical items in the "Detailed Visualization of RepeatMasker Annotations" track. Click "View table schema" to learn about the rmskJoinedBaseline table.
If you click into the "Repeating Elements by RepeatMasker" (rmsk) track below, in the two different blue regions, you will see that the two pieces have sizes of 5669 and 996 bases, about 6.6 kb combined, but are two separate annotations. Click "View table schema" to learn about the rmsk table.
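As a quick arithmetic check, here is a small shell sketch that adds the two fragment sizes and compares the total with the span of the joined item, using the chr15 coordinates listed further down in this message. The variable names are only for illustration.

```shell
# Sizes of the two rmsk fragments, as reported on their details pages.
frag_a=5669
frag_b=996
total=$((frag_a + frag_b))

# Span of the joined rmskJoinedBaseline item (alignEnd - alignStart).
joined_span=$((26286907 - 26280242))

echo "fragments: ${total} bp, joined span: ${joined_span} bp"
# prints "fragments: 6665 bp, joined span: 6665 bp"
```

Both values come to 6665, which suggests the two fragments sit directly next to each other within the joined item.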
You could use MySQL queries to extract the coordinates of regions above a certain size. With MySQL installed on your computer (http://genome.ucsc.edu/goldenPath/help/mysql.html), you could use a command like the following to get 100 examples where the rmsk table has LINE entries of at least 6,000 bp (remove "limit 100" to get all):
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -NAe 'select repClass, repName, genoName, genoStart, genoEnd, (genoEnd-genoStart) as diff from rmsk where (genoEnd-genoStart) >= 6000 and repClass like "%LINE%" limit 100;' mm10
This screens out items like the one above that are split into two sections, and gives results such as the following (where the final number is the span):
LINE L1_Mus4 chr1 23500015 23506071 6056
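If it helps, rows like the one above could be turned into a BED file for further work. This is only a sketch: the function name rmsk_rows_to_bed is made up for illustration, and it assumes the tab-separated output that mysql produces when its output is piped rather than printed to the terminal, with the six columns in the order the query selects them.

```shell
# Convert tab-separated rows
#   repClass, repName, genoName, genoStart, genoEnd, diff
# into four-column BED: chrom, start, end, name.
# (rmsk_rows_to_bed is a hypothetical helper name, not a UCSC tool.)
rmsk_rows_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 5 { print $3, $4, $5, $2 }'
}

# Demonstration on the sample row shown above instead of a live query.
printf 'LINE\tL1_Mus4\tchr1\t23500015\t23506071\t6056\n' | rmsk_rows_to_bed
```

In practice you would pipe the mysql command above into the function and redirect the result to a file of your choosing.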
If you are also interested in selecting non-intact items like the first example, which can span large regions, another option would be to query the rmskJoinedBaseline table like this:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -NAe 'select name, chrom, alignStart, alignEnd, (alignEnd-alignStart) as diff from rmskJoinedBaseline where (alignEnd-alignStart) >= 6000 and name like "%LINE%" limit 100;' mm10
This would capture the item in the session above, but would also capture many other LINE items that span large regions, so I have a feeling it would not suit your needs (the final number is the span):
L1Md_T#LINE/L1 | chr15 | 26280242 | 26286907 | 6665
..
L1Md_F2#LINE/L1 | chr1 | 5920137 | 5939499 | 19362
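One way to trim the very long joined items would be to also put an upper bound on the span. The sketch below assumes tab-separated output (what mysql gives when piped) with the span in the last column. The function name filter_span and the 7000 bp upper limit are assumptions chosen only for illustration, not recommended values.

```shell
# Keep only rows whose last column (the span) lies within [min, max].
# (filter_span is a hypothetical helper; the bounds are passed as arguments.)
filter_span() {
    awk -v min="$1" -v max="$2" 'BEGIN { FS = "\t" } $NF >= min && $NF <= max'
}

# Demonstration on the two rows shown above, with an assumed 6000-7000 window.
printf 'L1Md_T#LINE/L1\tchr15\t26280242\t26286907\t6665\nL1Md_F2#LINE/L1\tchr1\t5920137\t5939499\t19362\n' \
    | filter_span 6000 7000
```

With that window the 19362 bp item is dropped and the 6665 bp item is kept. Note that this does not tell intact elements from fragmented ones, since the 6665 bp item is the fragmented example from the session.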
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to
gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
All the best,
Brian Lee