Sarah Killcoyne
Sep 23, 2008, 2:19:54 PM
to systemsbiology-visualizations
Subgroup: Sarah, Victor, Dan
Notes: Dan
Basically we decided that three Google data tables should be used to
describe node and edge attributes.
None of the table or column names that follow are set in stone:
Table 1: node_attrs
id | namespace | name | value
Table 2: edge_attrs
(same structure as node_attrs)
Table 3: namespaces_lookup
shortcut | uri
Hopefully most of this is self-explanatory. We are assuming each node
and edge has a unique ID (how that ID is generated and guaranteed to
be unique is something we will leave for the network subgroup to work
out ;).
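
To make the layout concrete, here is a rough sketch of the three tables as plain Python rows. This is only an illustration: the real tables would be Google data tables, and the IDs, values, and example.org URIs below are made up.

```python
from typing import NamedTuple


class AttrRow(NamedTuple):
    """One row of node_attrs or edge_attrs (both share this structure)."""
    id: str
    namespace: str  # a shortcut defined in namespaces_lookup
    name: str
    value: str


class NamespaceRow(NamedTuple):
    """One row of namespaces_lookup."""
    shortcut: str
    uri: str


# Hypothetical sample data; none of these IDs or URIs come from a real dataset.
node_attrs = [
    AttrRow("n1", "KEGG", "foo", "bar"),
    AttrRow("n1", "GO", "foo", "baz"),
    AttrRow("n2", "KEGG", "foo", "qux"),
]
edge_attrs = [
    AttrRow("e1", "KEGG", "foo", "0.8"),
]
namespaces_lookup = [
    NamespaceRow("KEGG", "http://example.org/ns/kegg"),
    NamespaceRow("GO", "http://example.org/ns/go"),
]


def attrs_for(table, element_id):
    """Return every attribute row belonging to one node or edge ID."""
    return [row for row in table if row.id == element_id]
```

Calling attrs_for(node_attrs, "n1") returns the two rows for node n1, one per namespace.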
We thought it best to have nodes and edges in separate tables, since
most operations will affect only nodes or edges, not both, so you
don't want to have to be filtering all the time. Also, this allows you
to have node attributes and edge attributes with the same name, but
different purposes.
Attributes should be namespaced so that (for example) "KEGG.foo" is
different from "GO.foo".
Namespaces should be handled as in XML, where a namespace is defined
by a URI, but that URI can be referred to by a shortcut within a given
dataset (to avoid having a potentially long URL in each row). So Table
3 is simply a lookup table mapping a shortcut to a URI and vice versa.
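
A minimal sketch of the shortcut/URI lookup, assuming the table is loaded into two dictionaries (one per direction). The URIs are placeholders and the function name is just for illustration.

```python
# Hypothetical contents of namespaces_lookup as (shortcut, uri) pairs.
namespaces_lookup = [
    ("KEGG", "http://example.org/ns/kegg"),
    ("GO", "http://example.org/ns/go"),
]

# Build both directions of the mapping once.
shortcut_to_uri = dict(namespaces_lookup)
uri_to_shortcut = {uri: shortcut for shortcut, uri in namespaces_lookup}


def qualified_name(shortcut, name):
    """Expand a (shortcut, name) pair into a (uri, name) pair.

    Raises KeyError if the shortcut is not defined for this dataset.
    """
    return (shortcut_to_uri[shortcut], name)
```

With this, qualified_name("KEGG", "foo") and qualified_name("GO", "foo") are distinct even though the attribute name is the same.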
Misc notes:
The visualizer should have its own (reserved) namespace, so that
attributes that are directly concerned with rendering the network can
be stored and not stepped on by other attributes.
This structure allows you to have a node (or edge) where the same
attribute can occur multiple times with different values. This may not
be what is desired in most cases (for example, if the visualizer is
applying a visual style, it should only have one style name to choose
from) but we saw no reason to pre-emptively forbid this. The code that
populates these tables can use a hashtable internally to detect
duplicates.
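
The duplicate check could look something like this. It is a sketch only, assuming rows arrive as (id, namespace, name, value) tuples; it reports duplicates and leaves it to the caller to decide what to do about them.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group values by (id, namespace, name) and return keys seen more than once.

    rows is an iterable of (id, namespace, name, value) tuples.
    """
    seen = defaultdict(list)
    for element_id, namespace, name, value in rows:
        seen[(element_id, namespace, name)].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}
```

For example, two rows giving node n1 a "style" attribute in the same namespace would come back as one key with both values, while an attribute that appears once is not reported.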
In the interest of simplicity, we are currently avoiding the issue of
data types for attribute values. There will be cases where visualizer
or plugin code will expect attribute values to be of a certain type
(String, int, float); for example, the value for node size should be
parsable as a number.
We don't want to do what Cytoscape does and declare a separate column
for each type, e.g.: int_value, float_value, string_value.
The "value" column will be assigned the type "string" so that anything
can fit in it.
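
Since everything is stored as a string, code that needs a number would have to parse it on read. One way to do that is sketched below; this is just an illustration and the helper name is made up.

```python
def parse_number(value, default=None):
    """Try to read a string attribute value as a float.

    Returns default when the string is not parsable as a number.
    """
    try:
        return float(value)
    except ValueError:
        return default
```

A visualizer could then call parse_number(row_value, default=10.0) for node size and fall back to its own default when the stored value is not numeric.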
We discussed a couple of different ways to handle this type issue
(should it arise), none of which would affect the table structure
described above. So we decided that this issue was not important now,
and a solution can be added later if necessary.
It would be nice to be able to set metadata on a Google data table,
such as a table name. It doesn't appear that you can do that.
Therefore it is assumed that there will be some sort of overall object
that will encapsulate all the tables used to describe a network (aka a
dataset), and that it will know how to stitch everything together. So,
inside that object (and not in any particular data table) will be a
name (perhaps a URI) that will bind the attribute tables to a dataset.
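
A rough sketch of what such an encapsulating object might look like. The class name, field names, and the example URI are all assumptions for illustration; nothing here is a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkDataset:
    """Binds the attribute tables to one dataset name (perhaps a URI)."""
    name: str
    node_attrs: list = field(default_factory=list)   # (id, namespace, name, value)
    edge_attrs: list = field(default_factory=list)   # same structure as node_attrs
    namespaces_lookup: dict = field(default_factory=dict)  # shortcut -> uri

    def node_values(self, node_id, shortcut, attr_name):
        """Return all values of one namespaced attribute on one node."""
        return [
            value
            for element_id, namespace, name, value in self.node_attrs
            if (element_id, namespace, name) == (node_id, shortcut, attr_name)
        ]


# Hypothetical usage with made-up data.
dataset = NetworkDataset(
    name="http://example.org/datasets/demo",
    node_attrs=[("n1", "KEGG", "foo", "bar"), ("n1", "GO", "foo", "baz")],
    namespaces_lookup={
        "KEGG": "http://example.org/ns/kegg",
        "GO": "http://example.org/ns/go",
    },
)
```

The dataset name lives on the object rather than in any one table, so the tables themselves stay exactly as described above.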