Input data


maggi

Jul 7, 2009, 4:48:49 AM
to GenMAPP
Hello,
I am new to GenMAPP and MAPPFinder. I need some help understanding
how input data is imported into GenMAPP.

I have installed GenMAPP 2 and am using the gene database
Hs-Std_20070817.gdb.
My expression dataset contains four columns. The first column holds
the Agilent probe IDs, the second the system code (Ag), the third the
p value, and the fourth the regulation (Up/Down) used for criteria
selection.
After reading the recommendations for input data, I understand that
I have to upload the information for every probe on the array (the
background, i.e. all measured genes). I am using the Agilent human
whole genome array, which consists of 43373 probes, with some genes
represented multiple times.
After loading the expression data I get 11038 exceptions saying the
gene was not found in Agilent or a related system. I checked some of
the IDs from the exception file against the expression dataset and
found that most of the IDs marked as exceptions are present in the
expression dataset. Does this mean that the exception IDs are actually
genes with multiple entries in the dataset?
After running GenMAPP and MAPPFinder, the calculation summary in the
MAPPFinder results looks like this:
Calculation Summary:
320 probes met the [Regulation] = "Down" criteria.
287 probes meeting the filter linked to a Ensembl ID.
211 genes meeting the criterion linked to a GO term.
43373 Probes in this dataset
31648 Probes linked to a Ensembl ID.
14993 Genes linked to a GO term.
The z score is based on an N of 14993 and a R of 211 distinct genes in
the GO.

Am I doing something wrong?
Do I have to eliminate the multiple gene entries before importing the
expression data to GenMAPP?
Is this a large dataset to analyze?
Any help would be welcome.
Thank you for your help,
Maggi.

Kristina Hanspers

Jul 7, 2009, 2:11:10 PM
to GenMAPP
Hello Maggi,

The GenMAPP databases use information from Ensembl for Agilent links,
so if a particular Agilent ID is not mapped in Ensembl, that ID will
not be imported with the GenMAPP database. This is most likely the
case for the subset of IDs in your data that don't import.

You say that you checked the expression dataset for the exceptions.
How did you check the IDs? Duplicated entries in the input file will
not be exceptions; they will all be imported as long as there is a
match in the database.
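
One way to run this check outside GenMAPP is sketched below. It is only
an illustration, and it assumes that both the expression dataset and
the exception file are tab-delimited text with a header row and the
probe ID in the first column. The probe IDs and the helper names
(read_ids, summarize_exceptions) are made up for the example. Since
exceptions are input rows that found no match, seeing an exception ID
in the expression dataset is expected; the useful question is whether
those IDs are also duplicated.

```python
import csv
import io
from collections import Counter


def read_ids(text, id_column=0, has_header=True):
    """Return the IDs from one column of tab-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row and row[id_column].strip()]
    if has_header:
        rows = rows[1:]
    return [row[id_column].strip() for row in rows]


def summarize_exceptions(input_ids, exception_ids):
    """Compare exception IDs with the (possibly duplicated) input IDs."""
    counts = Counter(input_ids)
    exceptions = set(exception_ids)
    return {
        # Exception IDs that appear in the input at all
        "exceptions_in_input": sorted(e for e in exceptions if e in counts),
        # Exception IDs that appear more than once in the input
        "exceptions_duplicated_in_input": sorted(
            e for e in exceptions if counts[e] > 1
        ),
        # Number of distinct input IDs that occur more than once
        "duplicated_ids_total": sum(1 for c in counts.values() if c > 1),
    }


# Made-up sample data standing in for the real files.
expression_text = (
    "ID\tSystemCode\tPvalue\tRegulation\n"
    "A_23_P100001\tAg\t0.01\tDown\n"
    "A_23_P100002\tAg\t0.20\tUp\n"
    "A_23_P100002\tAg\t0.30\tUp\n"
    "A_23_P100003\tAg\t0.04\tDown\n"
)
exception_text = "ID\tSystemCode\nA_23_P100003\tAg\n"

summary = summarize_exceptions(
    read_ids(expression_text), read_ids(exception_text)
)
print(summary)
```

To use it on real files, pass the file contents (for example
open(path).read()) to read_ids instead of the sample strings.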

The MAPPFinder results you got seem reasonable. If you are not getting
as many hits as you expect, try setting a slightly less stringent
criterion for the MAPPFinder analysis.
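
For context on the N and R in the calculation summary: as far as I
know, MAPPFinder computes the z score for a GO term by comparing the
number of genes meeting the criterion in that term (r) with the number
expected from the term size (n) and the overall fraction R/N, using a
hypergeometric-style standard deviation. The sketch below follows that
published formula as I understand it and is not taken from the
MAPPFinder source. The function name is made up, and the term size
n = 100 and hit count r = 5 are hypothetical values; only N = 14993
and R = 211 come from the summary above.

```python
import math


def mappfinder_style_z(r, n, R, N):
    """z score for one GO term.

    r: genes meeting the criterion in this term
    n: genes measured in this term
    R: genes meeting the criterion overall
    N: genes measured overall
    """
    p = R / N
    expected = n * p
    # Variance with the finite-population correction (1 - (n-1)/(N-1))
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        raise ValueError("z score is undefined when the variance is zero")
    return (r - expected) / math.sqrt(variance)


# N and R from the calculation summary; n and r are hypothetical.
z_example = mappfinder_style_z(r=5, n=100, R=211, N=14993)
print(round(z_example, 2))
```

A less stringent criterion raises R, which changes both the expected
count and the spread for every term, so the z scores shift accordingly.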

It is not a problem that your dataset is quite large; in fact, it is
often better to have a large dataset than one that is too small.

Regards,

Kristina