Metadata proposal for use with MeV

eleano...@gmail.com

Aug 31, 2009, 2:54:37 PM
to gaggle-discuss
Hi guys,

I've been talking to Burak, Lee and Dan about adding some metadata to
Gaggle broadcasts that would let MeV better interpret the information
it's getting. There are two immediate use cases for this.

1) It would be helpful if MeV could tell what type of identifiers it's
getting when it receives a namelist broadcast. Then it would know
which of the annotation types it should try to match that namelist
against.

2) When MeV accepts a matrix broadcast it currently assumes that the
data inside it is a single-color (Affymetrix-like) array. A metadata
item indicating whether the data was single color or not would let MeV
scale the color range on its heatmaps correctly. Also, it would be
nice to allow a broadcast to indicate if the data has been logged or
not.

I'm sure it would also be helpful to others if MeV's outbound
broadcasts included some of this information as well.

I'm thinking of just going ahead and defining some key-value pairs
that MeV will look for in the metadata when receiving a broadcast. If
anyone else is already using something similar, I'd be interested in
adopting it.

How do these sound:

key : value1 | value2 | value3

For a matrix broadcast:
"array-type" : "single-color" | "two-color"
"log-status" : "unlogged" | "log2" | "log10"
"platform-name" : "affy-hg-u133a" | "affy-hg-u133a_2_plus" | etc

For a namelist broadcast:
"identifier-type" : "ENTREZ" | "GENBANK_ACC" | etc (these values are
taken from MeV's standard identifier type names)
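
To make the proposal concrete, here is a minimal sketch of how the pairs above might be checked on receipt. It assumes the metadata has already been converted to a plain Python dict, which is only a stand-in for Gaggle's own tuple type; the names CLOSED_KEYS, OPEN_KEYS and check_metadata are hypothetical and not part of MeV or Gaggle.

```python
# Hypothetical sketch: a plain dict stands in for the Gaggle metadata tuple.

# Keys whose allowed values are fully listed in the proposal.
CLOSED_KEYS = {
    "array-type": {"single-color", "two-color"},
    "log-status": {"unlogged", "log2", "log10"},
}

# Keys whose value lists are open-ended ("etc" in the proposal),
# so only the value's shape is checked here.
OPEN_KEYS = {"platform-name", "identifier-type"}


def check_metadata(metadata):
    """Return a list of problems; an empty list means the pairs look valid."""
    problems = []
    for key, value in metadata.items():
        if key in CLOSED_KEYS:
            if value not in CLOSED_KEYS[key]:
                problems.append(f"{key}: unexpected value {value!r}")
        elif key in OPEN_KEYS:
            if not isinstance(value, str) or not value:
                problems.append(f"{key}: expected a non-empty string")
        # Unknown keys are ignored: other geese may add their own.
    return problems
```

Unknown keys are skipped rather than rejected in this sketch, so a broadcast carrying pairs meant for some other goose would still pass.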

I'm throwing these out there because they're convenient for me and
make sense relative to the type of data MeV deals with. They may not
be very good general-purpose items, but is that a big deal?

Eleanor

Dan Tenenbaum

Aug 31, 2009, 3:17:17 PM
to gaggle-...@googlegroups.com
Hi Eleanor,

This sounds good to me.
When we added the option to send metadata along with Gaggle broadcasts, we figured that each goose (or pair of geese) would use the metadata in a way that makes sense for them.

Trying to define one huge standard that could encompass every kind of metadata one might encounter in the systems biology world seems like a futile effort, and one that would undermine the simplicity that Gaggle strives for. Therefore I think doing things because they are convenient for you, and make sense relative to MeV, is a great idea: the motivation should be to solve your immediate problems.

I would suggest that MeV be able to handle a broadcast that arrives with no metadata, and either use sensible defaults (user-configurable?) or prompt the user for the required information.

Since key-value pairs can be nested, I might also suggest that all these key-value pairs appear in a tuple called "MeV-metadata" or some such. This would accomplish two goals:
  • The MeV goose would simply have to look for that one tuple, and if it finds it, it knows it can expect metadata in the format it wants.
  • You would avoid potential namespace conflicts. If in the future some other goose wants to have something called "identifier-type" or "array-type" but means something different by it, you would be sure not to read that data by mistake. (Of course, that other goose should do some sort of namespacing as well.)
On the other hand, identifier-type is something that would be potentially useful to many other geese besides MeV. It's just a question of coming up with a list of identifier types that we can all agree on.
So perhaps things that are likely to be specific to MeV can live in their own MeV-specific tuple, while things that could be more generally useful can sit at the top level.
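
The namespacing idea can be sketched as follows, with nested Python dicts standing in for nested Gaggle tuples. The function name mev_settings and the sample values are hypothetical; only the key names and the "MeV-metadata" tuple name come from this thread.

```python
# Hypothetical sketch: nested dicts stand in for nested Gaggle tuples.

def mev_settings(metadata):
    """Return MeV's own pairs, or {} if the broadcast has no MeV tuple."""
    nested = metadata.get("MeV-metadata")
    return dict(nested) if isinstance(nested, dict) else {}


broadcast_metadata = {
    # Generally useful item, kept at the top level.
    "identifier-type": "ENTREZ",
    # A same-named key written by some other goose with a different meaning.
    "array-type": "value meant for another goose",
    # MeV-specific items live in their own tuple.
    "MeV-metadata": {"array-type": "two-color", "log-status": "log2"},
}

settings = mev_settings(broadcast_metadata)
print(settings["array-type"])  # the MeV value, not the top-level one
```

Because MeV reads "array-type" only from inside its own tuple, the top-level key of the same name is never picked up by mistake.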

Let us know how we can help by adding support for this to various geese that might interact with MeV.

Thanks
Dan

Lee Pang

Aug 31, 2009, 7:03:57 PM
to gaggle-...@googlegroups.com
Hi Eleanor and Dan,

I think putting goose-specific properties in the metadata the way you've described sounds reasonable. Let me know which keys will be supported so I can add the appropriate methods to the MatGoose.

Lee

Eleanor Howe

Sep 1, 2009, 11:39:25 AM
to gaggle-...@googlegroups.com
Hi Dan,

I really like the namespacing idea. I'll plan on that. I'll also plan on using the identifier type as a 'general' item in the metadata, and nest the others under an MeV-metadata tag. If no metadata is present, MeV will use the same defaults it uses now.
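
A rough sketch of that plan, again using plain Python dicts in place of Gaggle tuples. The only default taken from this thread is the single-color assumption for matrix broadcasts; the None placeholders mean "no metadata given, keep the existing behaviour" and are an assumption, as are the names DEFAULTS and resolve.

```python
# Hypothetical sketch of the planned layout with fallback defaults.

DEFAULTS = {
    "array-type": "single-color",  # MeV's current assumption, per this thread
    "log-status": None,            # placeholder: current default not stated here
    "platform-name": None,         # placeholder: current default not stated here
}


def resolve(metadata):
    """Merge broadcast metadata over the defaults; metadata may be None."""
    metadata = metadata or {}
    nested = metadata.get("MeV-metadata")
    if not isinstance(nested, dict):
        nested = {}
    resolved = dict(DEFAULTS)
    # Only keys MeV knows about are copied from its own tuple.
    resolved.update({k: v for k, v in nested.items() if k in DEFAULTS})
    # The identifier type is read from the top level as a general item.
    resolved["identifier-type"] = metadata.get("identifier-type")
    return resolved
```

A broadcast with no metadata at all then resolves to the defaults unchanged, which matches the stated fallback.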

Hi Lee,

I'll plan to have this ready for our v4.5 release in November. Hopefully I'll get you a test copy well before that so you can try it out.

Eleanor

mark gibson

Sep 1, 2009, 12:13:34 PM
to gaggle-...@googlegroups.com
I'm sure I don't need to say as much, but the general and goose-specific
metadata should all be well documented in an obvious place (the Gaggle
website) so other developers can easily figure it out, take advantage
of it, and reuse it.

Mark

