Data input

Rebecca Baillie

Jun 4, 2012, 7:22:17 PM

to UCSF EGAN

Dear Egan

I am a new user and can't seem to get my datasets to load correctly.
When I load the data, I can see the experiments under the experiments
tab, but the columns in the Entrez Gene window have only null or NAN
in them. In addition, I cannot resize the columns. Is there a
specific format for the text file that I should be using? Right now I
am converting an Excel file to txt.

Rebecca

ucsf egan

Jun 4, 2012, 7:53:45 PM

to ucsf...@googlegroups.com

Hi Rebecca,

Do you have the data in the three-column tab-separated text format?

1) Column one contains gene identifiers (probe IDs, symbols, etc.). Use only one type of ID per file; a mixture of gene symbols and probe IDs will cause problems.

2) Column two contains a statistic for each gene.

3) Column three contains a p-value for each gene. This one is optional.

A header row is required as the first line in the file - the name of each column can be whatever you like (although I wouldn't recommend strange characters or symbols).
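
For anyone converting from a spreadsheet, a quick sanity check of the exported file can catch delimiter and column problems before launching EGAN. The following is a minimal sketch, not part of EGAN: the function name `check_experiment_file` is hypothetical, and it only tests the layout described above (tab-separated, a header row, two or three columns, numeric statistic and p-value).

```python
import csv


def check_experiment_file(lines):
    """Return a list of problems found in a tab-separated experiment table.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty (a header row is required)"]

    problems = []
    if len(header) not in (2, 3):
        problems.append(f"header has {len(header)} columns, expected 2 or 3")

    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: {len(row)} columns, header has {len(header)}"
            )
            continue
        if not row[0].strip():
            problems.append(f"line {line_no}: empty identifier")
        # Every column after the identifier should parse as a number.
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {value!r} is not a number")
    return problems


if __name__ == "__main__":
    sample = ["ProbeID\tlogFC\tP.Value", "870465\t-0.3537\t1.21E-07"]
    print(check_experiment_file(sample) or "no problems found")
```

To check a file on disk, pass an open handle, for example `check_experiment_file(open("experiment.txt", newline=""))`, where `experiment.txt` is a placeholder name. A comma-separated export shows up as a header with a single column.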

You will also want to select the proper background set of genes for your analysis in section 5) of the Launch EGAN Wizard. If your data come from an array, the first few options (intersect, union, first, last) are probably most appropriate.

Please post if you have more questions.

Best,

Jesse

Rebecca Baillie

Jun 4, 2012, 10:16:27 PM

to UCSF EGAN

Hi Jesse

I have put the data into the three-column format. I have used "set all"
for the background, since if I use anything else I get this error:

java.lang.Exception: No Entrez Gene nodes in network. Check to see that identifier mapping files are correct.
    at edu.ucsf.cc.icore.app.egan.launch.DataControllerFactory.constructDataController(DataControllerFactory.java:151)
    at edu.ucsf.cc.icore.app.egan.launch.LaunchEganThread.run(LaunchEganThread.java:119)
    at java.lang.Thread.run(Unknown Source)

If I use "set all", the experiment seems to load into the software
and I do get the appropriate statistic, sign, and p-value columns in
the Entrez Gene table. However, I do not get any numbers or
directionalities in the table: the columns read NAN, Null, NAN for all
genes. Since it lists 58000 genes when I uploaded only 1000, I assume
that it is not reading the genes I am uploading. I am using an Illumina
gene set. Is there any way around this problem?

Rebecca

ucsf egan

Jun 4, 2012, 11:09:11 PM

to ucsf...@googlegroups.com

Hi Rebecca,

Thanks for posting the error message. That error is what is supposed to happen when no genes from the experiment file can be mapped. You are correct: most likely the identifiers in your experiment file do not match those in the mapping file. Which mapping file did you select in the Launch EGAN Wizard? You can also use that file for the background option (recommended).

Can you provide an example ID from your experiment file?
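
One way to test for an identifier mismatch outside EGAN is to count how many experiment IDs also occur in the mapping file. This is a rough sketch under stated assumptions: the layout of EGAN's mapping files is not described in this thread, so `read_ids` simply assumes a tab-separated file with the identifiers in a known column, and both function names and the sample mapping IDs are hypothetical.

```python
import csv


def read_ids(lines, column=0, has_header=True):
    """Collect the non-empty identifiers found in one column of a TSV table."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # discard the header row
    return {
        row[column].strip()
        for row in reader
        if len(row) > column and row[column].strip()
    }


def mapping_overlap(experiment_ids, mapping_ids):
    """Return the matched IDs and the fraction of experiment IDs matched."""
    matched = experiment_ids & mapping_ids
    fraction = len(matched) / len(experiment_ids) if experiment_ids else 0.0
    return matched, fraction


if __name__ == "__main__":
    experiment = read_ids(["GeneID\tstat", "ID_A\t1.5", "ID_B\t-0.2"])
    mapping = {"ID_A", "ID_C"}  # placeholder IDs, not real mapping entries
    matched, fraction = mapping_overlap(experiment, mapping)
    print(f"{len(matched)} of {len(experiment)} IDs matched ({fraction:.0%})")
```

A fraction at or near zero suggests the two files use different identifier types, which is consistent with the "No Entrez Gene nodes in network" error.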

Best,

Jesse

Rebecca Baillie

Jun 4, 2012, 11:22:19 PM

to ucsf...@googlegroups.com

Hi Jesse,

I am using an Illumina mouse microarray. I have both target IDs and probe IDs, and I get the same result using either one.

Probe IDs look like this:

ProbeID	logFC	P.Value
870465	-0.3537	1.21E-07
3190307	0.393937	2.85E-07
1940731	0.470578	5.01E-07
3120133	-0.35577	7.20E-07
2070673	0.526229	9.98E-07
1690678	0.594751	1.16E-06

Target IDs look like this:

targetID	logFC	P.Value
0610005C13RIK	-0.3537	1.21E-07
0610005I04	0.393937	2.85E-07
0610006I08RIK	0.470578	5.01E-07
0610006I08RIK	-0.35577	7.20E-07
0610006L08RIK	0.526229	9.98E-07

I have been using the Illumina Mouse WG v1 (or v2) for the mapping file. I have tried both. I have used the same file for the background.

Rebecca

ucsf egan

Jun 5, 2012, 11:55:48 AM

to ucsf...@googlegroups.com

Hi Rebecca,

Good troubleshooting. The Illumina mapping files contain identifiers that look like this:

A_51_P357980

If you can't find those, I suggest gene symbols. But if you use gene symbols, make sure to use the Illumina mapping file for the background definition.

Hope that works!

Jesse

Rebecca Baillie

Jun 5, 2012, 5:33:33 PM

to ucsf...@googlegroups.com

Hi Jesse,

It seems that Illumina changes naming conventions periodically. However, Entrez GeneIDs do work. Thank you very much.
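
For readers facing the same problem, the switch to Entrez Gene IDs can be scripted. The sketch below assumes you already have a lookup from your array's identifiers to Entrez Gene IDs (for instance, built from the vendor's annotation file); the function names, the `EntrezGeneID` header, and the ID `11111` in the demo are placeholders, not real annotations.

```python
import csv
import io


def translate_ids(lines, id_to_entrez):
    """Replace column-one IDs with Entrez Gene IDs from a lookup dict.

    Returns (rows, unmapped): the translated table including a header row,
    and the original IDs that had no entry in the lookup.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return [], []

    rows = [["EntrezGeneID"] + header[1:]]
    unmapped = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        entrez = id_to_entrez.get(row[0].strip())
        if entrez is None:
            unmapped.append(row[0])  # drop rows that cannot be translated
            continue
        rows.append([entrez] + row[1:])
    return rows, unmapped


def write_tsv(rows, handle):
    """Write rows as tab-separated text with Unix line endings."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


if __name__ == "__main__":
    table = ["ProbeID\tlogFC\tP.Value", "870465\t-0.3537\t1.21E-07"]
    lookup = {"870465": "11111"}  # placeholder Entrez Gene ID
    rows, unmapped = translate_ids(table, lookup)
    out = io.StringIO()
    write_tsv(rows, out)
    print(out.getvalue(), end="")
```

Rows whose identifier is missing from the lookup are dropped and reported in `unmapped`, so it is easy to see how much of the array failed to translate.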

ucsf egan

Jun 5, 2012, 5:40:14 PM

to ucsf...@googlegroups.com

You're welcome. Also, I forgot to mention that EGAN readily accepts custom mapping files for ID types that aren't currently mappable.
