abacus geneID protein matching and NUMXML function


rsturm

Sep 20, 2011, 9:42:41 PM
to Abacus Support
Hello again Damian,

I noticed a couple of "bugs" when I ran a portion of my data through
Abacus. Using the new version of uniprotmapper, I was able to get a
text file for the genemap file. When I ran Abacus using the Default
output type and then viewed the resulting TSV file in Excel, I noticed
that I was getting "null" in the GeneID column for proteins. I think
this is because the FASTA header in your sample data set is
different from the header in my FASTA file, and Abacus may get
"confused" when matching the Swissprot ID to the geneID with my header
format.

Abacus sample fasta file header:
>P31946 ID=#1433B_HUMAN# GeneName=#YWHAB# Def=#14-3-3 protein beta/alpha, N-terminally processed#
MTMDKSELVQKAKLAEQAERYetc

My fasta's file header from Swissprot download:
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYetc

If this is the issue, are you able to add my FASTA header type to the
code for Abacus so I can get the geneIDs in my output, or is this
not the problem?
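
For readers comparing the two header styles quoted above, here is a minimal Python sketch of pulling the accession and gene name out of either one. It is an illustration only and not Abacus's actual code; the regular expressions and the function name `parse_header` are assumptions written from the two example headers in this post.

```python
import re

# Hypothetical patterns derived only from the two example headers in this thread.
# Abacus sample style: >P31946 ID=#1433B_HUMAN# GeneName=#YWHAB# Def=#...#
CUSTOM_RE = re.compile(r"^>?(\S+)\s+ID=#[^#]*#\s+GeneName=#([^#]*)#")
# Swissprot/UniProt download style: >sp|P31946|1433B_HUMAN ... GN=YWHAB ...
UNIPROT_RE = re.compile(r"^>?(?:sp|tr)\|([^|]+)\|\S+")
GN_RE = re.compile(r"\bGN=(\S+)")


def parse_header(line):
    """Return (accession, gene_name) or None if the header style is unknown.

    gene_name is None when a UniProt-style header carries no GN= field.
    """
    m = CUSTOM_RE.match(line)
    if m:
        return m.group(1), m.group(2)
    m = UNIPROT_RE.match(line)
    if m:
        gn = GN_RE.search(line)
        return m.group(1), gn.group(1) if gn else None
    return None
```

With a parser along these lines, both example headers would resolve to accession P31946 and gene name YWHAB, so the same genemap lookup could serve either FASTA file.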

Secondly, in my Abacus run I have 4 prot.xml files that make up the
combined prot.xml file (5 prot.xml files in total). It was my
understanding from the sample data's output that the NUMXML column
should only take into account the 4 prot.xml files that make up the
combined prot.xml file. Therefore, 4 should be the greatest value in
any cell of this column for my data set. I am getting 5 as the max
value, suggesting that the combined prot.xml is also being counted.
This isn't a big issue since I can just subtract 1 from the NUMXML
cell, but I thought that you should know about it.
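
The expected NUMXML behaviour described above can be sketched in Python as follows. This is only an illustration of the counting rule, not how Abacus computes the column; the function name `count_numxml` and the file names in the usage example are hypothetical.

```python
def count_numxml(proteins_by_file, combined_file):
    """Count, per protein, how many individual prot.xml files contain it.

    proteins_by_file maps a file name to the protein IDs found in that file.
    The combined file is skipped so it does not inflate the count.
    """
    counts = {}
    for filename, proteins in proteins_by_file.items():
        if filename == combined_file:
            continue  # the combined prot.xml must not be counted
        for protein in set(proteins):  # count each protein once per file
            counts[protein] = counts.get(protein, 0) + 1
    return counts


# Hypothetical run: 4 individual files plus 1 combined file.
runs = {
    "run1.prot.xml": ["P31946", "P62258"],
    "run2.prot.xml": ["P31946"],
    "run3.prot.xml": ["P31946"],
    "run4.prot.xml": ["P31946", "P62258"],
    "combined.prot.xml": ["P31946", "P62258"],
}
numxml = count_numxml(runs, "combined.prot.xml")
```

Under this rule the largest possible value for a run with 4 individual files is 4, which matches the expectation stated above.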

Thanks for the help,

Rob



GATTACA

Sep 21, 2011, 11:29:51 AM
to Abacus Support
Hi Rob.

Thanks for spotting that numXML bug. I'll fix it shortly.

As for your protein fasta header problem, could you email me one of
your protXML files? Not the combined file, just one of the individual
ones.
This problem happens all the time since I can't anticipate what
people's fasta files will look like. With an example protXML file I
can fix the code to work with your particular data.

Damian

Rob Sturm

Sep 21, 2011, 11:45:40 AM
to abacus-...@googlegroups.com
Damian,

Attached is an individual prot.xml file from my dataset.

Thanks!

Rob

interact.1A4V2400.prot.xml

GATTACA

Sep 21, 2011, 3:24:10 PM
to Abacus Support
Hi.

Abacus has been updated to handle the default swissprot/uniprot fasta
headers.
I've also fixed the numXML bug.

Try out the latest version and let me know if it works for you now.

Cheers,
Damian