Gold Standard naming & searchability issues

jochen rink

Aug 2, 2013, 3:05:52 AM

to plan...@googlegroups.com

Hi HongKee,

Thanks for importing the GoldStandard. I'd suggest the following changes:

1) we should change the naming convention: gs_Smed_v1 looks too much like a transcriptome (in fact, I was wondering for a while which city "gs" could refer to... ;)). I'd suggest

"Smed_GoldStandard_v1"

2) GoldStandard transcript names, e.g. "gs_Smed_v1_EU296629": I think we should drop all prefixes and refer to the transcript by its real name, i.e. EU296629, to make clear that this is a sequence imported from a public archive.

3) Tricky one: The gene names of gold standard transcripts are not searchable in "Search". E.g., EU296629 (Smed-beta-Catenin-1) should ideally be searchable under "catenin" or "beta" and the like. Question: Is it possible to import the names of the genbank records? This would be really useful, since users are always going to search for their favourite genes first (i.e., the ones that have already been published).

As always, muchas gracias!

J

HongKee Moon

Aug 2, 2013, 3:35:23 AM

to plan...@googlegroups.com

Hello, Jochen,

Thank you for the feedback.

My answers are below.

Cheers,

HongKee

On Friday, August 2, 2013 9:05:52 AM UTC+2, jochen rink wrote:

Hi HongKee,
Thanks for importing the GoldStandard. I'd suggest the following changes:

1) we should change the naming convention: gs_Smed_v1 looks too much like a transcriptome (in fact, I was wondering for a while which city "gs" could refer to... ;)). I'd suggest
"Smed_GoldStandard_v1"

This is doable, but it needs some additional work on the JBrowse integration. I currently use the first 6-character string of the name as the JBrowse database identifier, which means we would be using Smed_Go in the context of the JBrowse world.

2) GoldStandard transcript names, e.g. "gs_Smed_v1_EU296629": I think we should drop all prefixes and refer to the transcript by its real name, i.e. EU296629, to make clear that this is a sequence imported from a public archive.

This is done in PlanMine (external).

3) Tricky one: The gene names of gold standard transcripts are not searchable in "Search". E.g., EU296629 (Smed-beta-Catenin-1) should ideally be searchable under "catenin" or "beta" and the like. Question: Is it possible to import the names of the genbank records? This would be really useful, since users are always going to search for their favourite genes first (i.e., the ones that have already been published).

I tested it. In your case, use "beta*" or "*beta*" (note the asterisk, "*") to find all contigs containing "beta".
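To illustrate why the asterisks matter, here is a minimal sketch of glob-style matching. It assumes the pattern is compared against the whole gene name, case-insensitively; the real search may tokenize names differently, so this shows the wildcard idea only and is not a description of the actual backend. The `search` helper and the second record are hypothetical.

```python
from fnmatch import fnmatchcase

# Accession -> gene name. EU296629 comes from this thread; the second entry
# is a made-up placeholder used only to show a record that does not match.
RECORDS = {
    "EU296629": "Smed-beta-Catenin-1",
    "XX000000": "Smed-example-gene",  # hypothetical
}


def search(pattern, records=RECORDS):
    """Return the accessions whose full name matches a glob-style pattern.

    Matching is case-insensitive and applies to the whole name, so a bare
    word without "*" only matches a name that is exactly that word.
    """
    pat = pattern.lower()
    return sorted(
        acc for acc, name in records.items() if fnmatchcase(name.lower(), pat)
    )


print(search("beta"))       # bare word, no wildcard
print(search("*beta*"))     # wildcard on both sides
print(search("*catenin*"))
```

Under this whole-name assumption, "beta" finds nothing while "*beta*" and "*catenin*" both find EU296629.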

HongKee Moon

Aug 2, 2013, 3:46:18 AM

to plan...@googlegroups.com

Hi, Jochen,

I'd like to clarify one thing.

Do you want to use the GenBank names instead of the names in Shang-Yun's data?

For example, regarding EU296629, do you want to have "Schmidtea mediterranea beta-catenin-1 mRNA, complete cds" instead of "beta-catenin-1"?

Don't worry, we can import the GenBank names into PlanMine with a Python script if you wish.
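A rough sketch of what such a script could look like, assuming the records are available as GenBank flat-file text: it collects the DEFINITION line (which may wrap onto indented continuation lines) for each ACCESSION. The function name `parse_genbank_names` is hypothetical, and the sample record is a hand-written, abbreviated excerpt containing only the two fields used here, not the full GenBank entry.

```python
def parse_genbank_names(text):
    """Map accession -> DEFINITION text for each record in GenBank flat-file text."""
    names = {}
    definition_parts = []
    accession = None
    in_definition = False
    for line in text.splitlines():
        if line.startswith("DEFINITION"):
            # Start of the definition; keep everything after the keyword.
            definition_parts = [line[len("DEFINITION"):].strip()]
            in_definition = True
        elif in_definition and line.startswith(" "):
            # Indented continuation line of a wrapped definition.
            definition_parts.append(line.strip())
        else:
            in_definition = False
            if line.startswith("ACCESSION"):
                fields = line.split()
                accession = fields[1] if len(fields) > 1 else None
            elif line.startswith("//"):
                # "//" terminates a record: store it and reset the state.
                if accession:
                    names[accession] = " ".join(definition_parts).rstrip(".")
                definition_parts, accession = [], None
    return names


# Abbreviated, hand-written excerpt for illustration only.
SAMPLE = """\
DEFINITION  Schmidtea mediterranea beta-catenin-1 mRNA,
            complete cds.
ACCESSION   EU296629
//
"""

print(parse_genbank_names(SAMPLE))
```

The resulting accession-to-name mapping could then be loaded alongside the existing names, so that either can be searched.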

Cheers,

HongKee

jochen rink

Aug 2, 2013, 5:12:37 AM

to plan...@googlegroups.com

Hi guys,

Good points!

I'd suggest we proceed as follows:

1) we use Ian's suggestion of GoldStandard_Smed_v1 as the general identifier of the Gold Standard data set.

2) Thanks, the *xxxx* search hint solved quite a few issues. I suggest:

a) using "e.g. wnt; *catenin*; dd_Smed*" as the default search example, to emphasize the use of search operators.

b) when searching *catenin*, the "details" entry of the search form only contains EU296629 and the length. It would be useful to also display name; ID (e.g., "beta-catenin-1; GoldStandard_Smed_v1EU296629").

Just forget b) if this change would again affect the entire database structure.