Terms to discuss on Tuesday, April 28th


Chris Stoeckert

Apr 24, 2009, 5:08:03 PM
to obi-denr...@googlegroups.com
Dear All,
Here are some suggestions for terms to be discussed at the next DENRIE call. 

term: data transformation parameter specification
definition: a data transformation parameter specification is an information entity about a realizable that is used in a data transformation to refer to specific kinds of values.
definition source: BP, JM, DENRIE
examples: the integer k in 'k-means clustering'; the window size in a 'moving average'; the values for p, T, w, m in a 's transformation'
restrictions:
    is_a 'information entity about a realizable'
    is_concretized_as (is_realized_by only data transformation)
    is_about some (information content entity participates_in some data transformation)
editor note: There are other meanings of parameter such as population characteristic that may still need to be addressed.
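To illustrate the distinction the examples draw between a parameter and the data being transformed, here is a minimal Python sketch of a trailing moving average. It is only an illustration: the function name and the input values are hypothetical. The window size plays the role of the value referred to by a parameter specification, while the list of measurements is the input data.

```python
from typing import List, Sequence


def moving_average(data: Sequence[float], window: int) -> List[float]:
    """Trailing moving average.

    `data` is the input to the data transformation; `window` is the
    parameter whose value a parameter specification would fix.
    """
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    # Sum the first window, then slide it forward one position at a time.
    total = sum(data[:window])
    averages = [total / window]
    for i in range(window, len(data)):
        total += data[i] - data[i - window]
        averages.append(total / window)
    return averages


# The same input data transformed under two different parameter values.
measurements = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(moving_average(measurements, window=2))  # [1.5, 2.5, 3.5, 4.5, 5.5]
print(moving_average(measurements, window=3))  # [2.0, 3.0, 4.0, 5.0]
```

Changing only the window value changes the output, even though the input data set is identical; this is the sense in which the parameter is about the transformation rather than about the data.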

term: genome sequence version
definition: genome sequence version is a label that is used to specify the representation of the assembled genome sequence contained in a file or used in an analysis.
definition source: CS, DENRIE
examples: mm8; the March 2006 human reference sequence (NCBI Build 36.1)
restrictions:
is_about some (genome sequence <output of some data transformation of sequence data into a genome sequence>)
editor note: need to create 'genome sequence' and/or 'data transformation of sequence data into a genome sequence' or something like it.
original request from Nicole Washington: 
For a data analysis protocol where an entire genomic sequence might be a
specified parameter, it would be useful to be able to specify the genomic
version.

For example, I have an algorithm that takes a genomic sequence as an input,
say a gene-model prediction algorithm, and outputs some transformation
of the data. The results of the algorithm would differ depending on
which genomic sequence was in the input parameter.

term: tree model 
definition: a tree model is a data representational model in which there are one or more layers of leaf nodes attached in a hierarchical manner and there may be a top or root node. 
definition source: CS, DENRIE
examples: tree models are used in phylogenetic trees and in gene clusters based on microarray data
restrictions: ?? 
editor note: not sure how to logically define hierarchical structure which is what distinguishes this from other models.
original request from James:
Tree as a set of linked nodes (such as here:
http://en.wikipedia.org/wiki/Tree_data_structure). Presumably a DENRIE
branch concept.
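One possible reading of "a set of linked nodes", sketched in Python under the assumption that a tree means a rooted tree in which every node except the root has exactly one parent. That single-parent link is the hierarchical property the editor note says is hard to define logically. The class, function, and labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """A node in a rooted tree; only the root has no parent."""
    label: str
    children: List["TreeNode"] = field(default_factory=list)
    # Excluded from repr/eq so that printing or comparing nodes does not
    # recurse back up through the parent link.
    parent: Optional["TreeNode"] = field(default=None, repr=False, compare=False)

    def add_child(self, label: str) -> "TreeNode":
        """Attach a new child node and return it."""
        child = TreeNode(label, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def depth(self) -> int:
        """Number of links between this node and the root."""
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d


def leaf_labels(node: TreeNode) -> List[str]:
    """Collect the labels of all leaves at or below `node`, left to right."""
    if node.is_leaf():
        return [node.label]
    labels: List[str] = []
    for child in node.children:
        labels.extend(leaf_labels(child))
    return labels


# Hypothetical gene-cluster tree: two genes grouped in a cluster, one on its own.
root = TreeNode("root")
cluster = root.add_child("cluster_1")
gene_x = cluster.add_child("gene_x")
cluster.add_child("gene_y")
root.add_child("gene_z")
print(leaf_labels(root))  # ['gene_x', 'gene_y', 'gene_z']
print(gene_x.depth())     # 2
```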

term: time series collection
definition: a time series collection is a data collection that is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals. 
definition source: Wikipedia
examples: gene expression measurements of cells taken from a culture over a series of days.  
restrictions: 
is_output_of some measurement
is_input_to some data transformation
original request from James:
DT requires the concept of 'time series', which would serve as input to some
of the DTs that deal with this. As a starting point for time series, here
is the Wikipedia definition: "A time series is a sequence of data points, measured
typically at successive times, spaced at (often uniform) time intervals".
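A minimal Python sketch of the quoted definition, assuming a time series is represented as an ordered list of (time, value) pairs. Ordering in time is checked as a requirement, while uniform spacing is checked separately because the definition only says "often uniform". The function names and the numbers are placeholders, not real measurements.

```python
from typing import List, Tuple

# A time series as an ordered list of (time, value) pairs.
# Time units are left to the caller (e.g., days since a culture was started).
TimeSeries = List[Tuple[float, float]]


def is_time_ordered(series: TimeSeries) -> bool:
    """True if every data point is measured strictly after the previous one."""
    return all(t1 < t2 for (t1, _), (t2, _) in zip(series, series[1:]))


def is_uniformly_spaced(series: TimeSeries, tol: float = 1e-9) -> bool:
    """True if consecutive time points are separated by the same interval."""
    if len(series) < 2:
        return True
    times = [t for t, _ in series]
    step = times[1] - times[0]
    return all(abs((b - a) - step) <= tol for a, b in zip(times, times[1:]))


# Placeholder values for one gene measured once a day over four days.
daily = [(0.0, 5.1), (1.0, 5.4), (2.0, 6.0), (3.0, 6.2)]
irregular = [(0.0, 5.1), (1.0, 5.4), (3.0, 6.2)]
print(is_time_ordered(daily), is_uniformly_spaced(daily))          # True True
print(is_time_ordered(irregular), is_uniformly_spaced(irregular))  # True False
```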

term: heatmap
definition: a heatmap is a report element which is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors.
definition source: Wikipedia
examples: representation of microarray data for expression values of many genes across multiple samples or conditions. 
original request from James:

term: survival curve
definition: a survival curve is a report element which plots percent survival as a function of time.
definition source: Graphpad.com
original request from James:

term: venn diagram
definition: a venn diagram is a report element which is constructed with a collection of simple closed curves drawn in the plane.
definition source: Wikipedia
original request from James:

term: graph diagram
definition: a graph diagram is a report element which is a collection of points and lines connecting some (possibly empty) subset of them. 
original request from James:
Graph (in the sense of V = vertices and E = edges, not in the sense of the graph of a function). See the second definition here:
http://mathworld.wolfram.com/Graph.html
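To make the V/E sense of "graph" concrete, here is a small Python sketch that stores an undirected graph G = (V, E) as an adjacency map. It assumes undirected edges and hashable vertex labels; the function name and labels are hypothetical. Every vertex gets an entry even when it has no edges, and E may be empty, which matches the "some (possibly empty) subset" wording of the definition.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_graph(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected graph G = (V, E) as an adjacency map.

    Every vertex in V gets an entry, so vertices with no edges are kept.
    E may be empty.
    """
    adjacency: Dict[Hashable, Set[Hashable]] = {v: set() for v in vertices}
    for u, w in edges:
        if u not in adjacency or w not in adjacency:
            raise ValueError(f"edge ({u!r}, {w!r}) uses a vertex not in V")
        # Undirected: record the connection in both directions.
        adjacency[u].add(w)
        adjacency[w].add(u)
    return adjacency


V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c")]
graph = build_graph(V, E)
print(sorted(graph["b"]))  # ['a', 'c']
print(graph["d"])          # set()  (an isolated vertex)
```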

Chris Stoeckert

Apr 28, 2009, 10:39:10 AM
to obi-denr...@googlegroups.com

Bjoern Peters

Apr 28, 2009, 12:44:10 PM
to obi-denr...@googlegroups.com
I won't be able to make the call, as I got a new deadline this morning for a grant proposal due at noon. That is as much fun as it sounds. I hate to miss this call.


Chris Stoeckert

Apr 28, 2009, 2:02:36 PM
to obi-denr...@googlegroups.com
Having trouble connecting; hope to be on soon!
Chris

James Malone

Apr 29, 2009, 7:18:27 AM
to obi-denr...@googlegroups.com
Hi All,

I have fed back some of the discussion on parameter to the DT branch
mailing list to see if anyone has any thoughts. I will cross-post if
anything important comes up. Thanks for the discussion last night; it was
useful.

James


Chris Stoeckert

Apr 29, 2009, 11:57:41 AM
to obi-denr...@googlegroups.com
Hi James,
I asked Elisabetta (a former DTer and a mathematician by training) what
she thought a parameter was, without giving her any background first.
Her first example was the base of a logarithm, such as the 2 in
log base 2. Another example was specifying the variables in a function,
such as x in f(x).
I then filled her in on the discussion, and she thought both points of
view had merit.
Her two use cases are:
1. Setting the values used when running software (e.g., processing
microarray data for normalization and analysis) so that the software
can be run the same way on different data sets.
2. When microarray data processing is reported, the individual steps
are often clumped together: a bunch of things happen to the data, and
you need to list the settings (parameters) used along the way, but
their relation to the original input data set is unclear (or not
important). What you want to track is the input data set and the
resulting output data set. So these data sets seem different from
settings (parameters).
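Both use cases can be sketched in a few lines of Python, assuming a log transform whose base is the only setting (taken from Elisabetta's log base 2 example). The class, the functions, the dataset names, and the output-naming scheme are all hypothetical. The same frozen settings object is reused across data sets (use case 1), and each run returns a record that keeps the input data set, the output data set, and the settings as separate entries (use case 2).

```python
import math
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProcessingSettings:
    """Hypothetical settings (parameters) for one processing run."""
    log_base: float = 2.0


def log_transform(data: List[float], settings: ProcessingSettings) -> List[float]:
    # Every value must be positive for the logarithm to be defined.
    return [math.log(x, settings.log_base) for x in data]


def run(dataset_id: str, data: List[float], settings: ProcessingSettings) -> Dict:
    """Return a record that keeps input, output and settings apart."""
    output = log_transform(data, settings)
    return {
        "input_dataset": dataset_id,
        "output_dataset": dataset_id + "_log" + format(settings.log_base, "g"),
        "settings": asdict(settings),
        "output_values": output,
    }


# Use case 1: the same settings applied unchanged to different data sets.
settings = ProcessingSettings(log_base=2.0)
for name, values in [("dataset_A", [1.0, 2.0, 4.0]), ("dataset_B", [8.0, 16.0])]:
    record = run(name, values, settings)
    print(record["input_dataset"], record["output_dataset"], record["settings"])
```

Freezing the settings object is one way to guarantee that every data set really is processed "the same way"; the record layout is just one possible way to report settings alongside, rather than mixed into, the data sets.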

Cheers,
Chris
