.gmp file explanation when looking for A --> I editing events


Varun Gupta

Jun 17, 2014, 3:13:50 PM
to gnumap...@googlegroups.com
Hi,

I am using the -d option in GNUMAP, which looks for A-->I RNA editing events. A .gmp file is created specifically for A-->I events after running GNUMAP. Can someone explain what each column means?

Here's my sample .gmp file

chrM    82    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    84    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    87    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    93    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    95    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    102    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    227    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    230    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    232    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000
chrM    234    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000


Hope to hear from you soon!!!!!!!!

Regards
Varun

Nathan Clement

Jun 17, 2014, 3:55:40 PM
to gnumap...@googlegroups.com
Sure. The columns are as follows:

<CHR_NAME> <CHR_POS> <TOTAL_AMT> <AMT_A> <AMT_C> <AMT_G> <AMT_T> <AMT_N>

 - CHR_NAME is the name of the chromosome, according to the input file.
 - CHR_POS is the position on that chromosome. When you specify the -d option, GNUMAP only prints out locations that have an 'a' in the genome (which is why you don't see every position in the .gmp file).
 - TOTAL_AMT is the total number of (possibly partial) reads aligned to that location, i.e. the sum over all a, c, g, t, and n.
 - AMT_{A,C,G,T,N} is the amount of the corresponding nucleotide at that location.

So for the snippet you sent, it looks like there was exactly one read that aligned to the location, and all 'a' characters in the genome were also 'a' in the read.
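
In case it helps, the column layout above can be read with a few lines of Python. This is only a minimal sketch, assuming whitespace-separated fields in exactly the order listed; `GmpRecord`, `parse_gmp_line`, and `base_fraction` are illustrative names, not part of GNUMAP.

```python
from typing import NamedTuple


class GmpRecord(NamedTuple):
    """One line of a .gmp file, in the column order described above."""
    chrom: str    # CHR_NAME
    pos: int      # CHR_POS
    total: float  # TOTAL_AMT
    a: float      # AMT_A
    c: float      # AMT_C
    g: float      # AMT_G
    t: float      # AMT_T
    n: float      # AMT_N


def parse_gmp_line(line: str) -> GmpRecord:
    """Split one whitespace-separated .gmp line into typed fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    amounts = [float(x) for x in fields[2:]]
    return GmpRecord(fields[0], int(fields[1]), *amounts)


def base_fraction(rec: GmpRecord, base: str) -> float:
    """Share of the aligned amount at this position carried by one base."""
    if rec.total == 0:
        return 0.0
    return getattr(rec, base.lower()) / rec.total


rec = parse_gmp_line("chrM    82    1.000000    1.00000    0.00000    0.00000    0.00000    0.00000")
print(rec.chrom, rec.pos, base_fraction(rec, "a"))
```

For the first line of the snippet, all of the aligned amount is 'a', which matches the reading given above.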

Nathan



--

---
You received this message because you are subscribed to the Google Groups "gnumap-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gnumap-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

VG

Jun 17, 2014, 4:46:09 PM
to gnumap...@googlegroups.com
Hi Nathan,
Thanks for the reply. The file now makes sense. Still exploring.
Also, can you explain this part:

TOTAL_AMT is the total number of (possibly partial) reads aligned to that location, i.e. the sum over all a, c, g, t, and n.

What do you mean by "possibly partial"?

Also, Nathan, I posted a question earlier but still haven't heard back. Could you tell me the solution for it?



My fastq file looks like this


@HWI-ST395_BD0W43ACXX_1:6:1101:1489:2078#ATCACG/1
AAGCTGCCAGTTGAAGAACTGT
+HWI-ST395_BD0W43ACXX_1:6:1101:1489:2078#ATCACG/1
CCCFFFFFHHHHHJJJJJJJJJ
@HWI-ST395_BD0W43ACXX_1:6:1101:1355:2082#ATCACG/1
CATAAAGTAGAAAGCACTACT
+HWI-ST395_BD0W43ACXX_1:6:1101:1355:2082#ATCACG/1
CCCFFFFDFHHHFEGHFHHIH
@HWI-ST395_BD0W43ACXX_1:6:1101:1314:2092#ATCACG/1
TAGCAGCACGTAAATATTGGCG
+HWI-ST395_BD0W43ACXX_1:6:1101:1314:2092#ATCACG/1
CCCFFFFFHHHHHIIHIJJIHJ

I only want to allow 2 mismatches when aligning to the hg19 genome, but I cannot find the option to do so. Could you point out which option does that? How many mismatches does GNUMAP allow by default?

Hope to hear from you soon.

Regards
Varun




Nathan Clement

Jun 17, 2014, 6:22:37 PM
to gnumap...@googlegroups.com
Certainly. I'll try and answer your questions below.


On Tue, Jun 17, 2014 at 1:46 PM, VG <gupta5...@gmail.com> wrote:
> Hi Nathan,
> Thanks for the reply. The file now makes sense. Still exploring.
> Also, can you explain this part:
>
> TOTAL_AMT is the total number of (possibly partial) reads aligned to that location, i.e. the sum over all a, c, g, t, and n.
>
> What do you mean by "possibly partial"?

GNUMAP aligns reads to locations in the genome in a probabilistic fashion. So if a read aligns to two locations with an equal score, it will give a score of 0.5 to each location. If it aligns to one of the two locations better, one might have a score of 0.75 and the other would have a score of 0.25. This is why it is possibly partial.
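
To make the numbers above concrete, here is a rough sketch of how one read's weight could be split across candidate locations. It assumes the split is a simple normalization of per-location scores; that is an illustration of the idea, not necessarily the exact computation GNUMAP performs, and `read_weights` is a made-up name.

```python
def read_weights(scores):
    """Split one read across candidate locations in proportion to its scores.

    Assumption: weight_i = score_i / sum(scores). The weights for one read
    always add up to 1, so each location receives a partial read.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one location needs a positive score")
    return [s / total for s in scores]


print(read_weights([1.0, 1.0]))  # two equally good locations
print(read_weights([3.0, 1.0]))  # one location clearly better
```

Summing these weights over all reads covering a position is what makes TOTAL_AMT a possibly fractional number.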
 

> Also, Nathan, I posted a question earlier but still haven't heard back. Could you tell me the solution for it?
>
> My fastq file looks like this
>
> @HWI-ST395_BD0W43ACXX_1:6:1101:1489:2078#ATCACG/1
> AAGCTGCCAGTTGAAGAACTGT
> +HWI-ST395_BD0W43ACXX_1:6:1101:1489:2078#ATCACG/1
> CCCFFFFFHHHHHJJJJJJJJJ
> @HWI-ST395_BD0W43ACXX_1:6:1101:1355:2082#ATCACG/1
> CATAAAGTAGAAAGCACTACT
> +HWI-ST395_BD0W43ACXX_1:6:1101:1355:2082#ATCACG/1
> CCCFFFFDFHHHFEGHFHHIH
> @HWI-ST395_BD0W43ACXX_1:6:1101:1314:2092#ATCACG/1
> TAGCAGCACGTAAATATTGGCG
> +HWI-ST395_BD0W43ACXX_1:6:1101:1314:2092#ATCACG/1
> CCCFFFFFHHHHHIIHIJJIHJ
>
> I only want to allow 2 mismatches when aligning to the hg19 genome, but I cannot find the option to do so. Could you point out which option does that? How many mismatches does GNUMAP allow by default?

GNUMAP's alignment is not governed by the number of mismatches, but by alignment score. The -a option on the command line specifies (by default) the alignment percentage. So -a 0.9 (which, I believe, is the default) will require an alignment of 90% of the total possible score. You can also use the -r flag to change the score from a percentage to a raw score, but the percentage alignment score takes into account the fact that lower fastq qualities would make for a lower possible score.

If your reads are 100nt long and you only want 2 mismatches, try -a 0.98 (98% of the highest possible alignment score).
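
The arithmetic behind that suggestion can be sketched as below. It is a rough approximation that treats every position as contributing equally to the maximum score, so it ignores base qualities and any extra penalty a mismatch might carry; `min_alignment_fraction` is an illustrative helper, not a GNUMAP function.

```python
def min_alignment_fraction(read_length, max_mismatches):
    """Approximate -a value that tolerates up to max_mismatches.

    Assumption: each mismatch costs one position's share of the best
    possible score, so the fraction is (length - mismatches) / length.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if not 0 <= max_mismatches <= read_length:
        raise ValueError("max_mismatches must be between 0 and read_length")
    return (read_length - max_mismatches) / read_length


print(min_alignment_fraction(100, 2))           # the 100nt case above
print(round(min_alignment_fraction(22, 2), 3))  # 22nt reads like those in the fastq
```

For short reads such as the 21-22nt ones in the fastq above, the same 2 mismatches correspond to a noticeably lower fraction (about 0.91 for 22nt), so the value should be scaled to the actual read length.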