JM: data transformation parameter


Bjoern Peters

Mar 25, 2009, 1:31:57 PM
to informatio...@googlegroups.com, obi-denr...@googlegroups.com
I am starting a new thread, to keep this discussion separate from the
'population characteristic' meaning of 'parameter', and propose to name
this one 'data transformation parameter'.

I believe examples for your 'data transformation parameters' are
- The integer k in 'k-means clustering'
- The window size in a 'moving average'
- The values for p, T, w, m in a 's transformation'
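
To make the moving average example concrete, here is a minimal Python sketch. The function name and the sample numbers are hypothetical and do not come from OBI or any library. It only illustrates that the window size is supplied separately from the data being transformed, and that a different window size still gives an instance of the same transformation.

```python
from typing import List


def moving_average(data: List[float], window: int) -> List[float]:
    """Unweighted moving average; `window` plays the role of the parameter."""
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    # One output value per full window position.
    return [
        sum(data[i:i + window]) / window
        for i in range(len(data) - window + 1)
    ]


# Hypothetical measured values (the data input).
measurements = [1.0, 2.0, 3.0, 4.0, 5.0]

# Same transformation, same data, two different parameter values.
smoothed_2 = moving_average(measurements, window=2)  # [1.5, 2.5, 3.5, 4.5]
smoothed_3 = moving_average(measurements, window=3)  # [2.0, 3.0, 4.0]
```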

I believe the following is true: In every instance of a data
transformation process, the data transformation parameter is replaced by
a value (typically a number). That number is referred to as a data
transformation parameter, as a different number could have been used,
and it would have still been an instance of the same data
transformation. This means being a parameter is a 'role like' entity
attached to a number used in a process. As we are not allowed to use
roles for data (we need to follow up with Bary on that, dropped the ball
...), we should follow the 'specification' route for now.

Something like:

data transformation parameter specification
   is_a 'information entity about a realizable'
   is_concretized_as (is_realized by only data transformation)
   is_about some (information_content entity participates_in some data transformation)

(As always, I would have preferred a 'specifies' relation instead of the
is_about, but every time I raise that relation my emails get ignored :) )

- Bjoern


James Malone wrote:
> Hi Bjoern,
>
> Yes there are at least two uses of the word. The computer function
> parameter is quite close to the mathematical use. I can find several
> definitions on the net some better than others, the definitions
> further down this wiki page http://en.wikipedia.org/wiki/Parameter are
> ok for defining where they lie within a function but little more. The
> statistical one is more complicated as you've spelled out but we need
> both. So to push on, considering the mathematical/computer function
> use, one thing that strikes me is that, for a data
> transformation, a parameter is a part of the process. If we say
> that a parameter mu is part of some equation involved which defines a
> data transformation, there are two things we need to say: 1. mu is part
> of the equation (i.e. the DT) and 2. mu is a parameter. Does
> this help Alan?
>
> James
>

--
Bjoern Peters
Assistant Member
La Jolla Institute for Allergy and Immunology
9420 Athena Circle
La Jolla, CA 92037, USA
Tel: 858/752-6914
Fax: 858/752-6987
http://www.liai.org/pages/faculty-peters

Alan Ruttenberg

Mar 26, 2009, 12:21:50 AM
to informatio...@googlegroups.com, obi-denr...@googlegroups.com
On Wed, Mar 25, 2009 at 1:31 PM, Bjoern Peters <bpe...@liai.org> wrote:
>
> I am starting a new thread, to keep this discussion separate from the
> 'population characteristic' meaning of 'parameter', and propose to name
> this one 'data transformation parameter'.
>
> I believe examples for your 'data transformation parameters' are
> - The integer k in 'k-means clustering'
> - The window size in a 'moving average'
> - The values for p, T, w, m in a 's transformation'

That sounds like a coherent set of things.

> I believe the following is true: In every instance of a data
> transformation process, the data transformation parameter is replaced by
> a value (typically a number).

The parameter *is* the value? or is *replaced* by the value?

> That number is referred to as a data
> transformation parameter, as a different number could have been used,
> and it would have still been an instance of the same data
> transformation.

As it would be if some other specified input was changed.

> This means being a parameter is a 'role like' entity
> attached to a number used in a process. As we are not allowed to use
> roles for data (we need to follow up with Bary on that, dropped the ball
> ...), we should follow the 'specification' route for now.

> Something like:
>
> data transformation parameter specification
>    is_a 'information entity about a realizable'
>    is_concretized_as (is_realized by only data transformation)
>    is_about some (information_content entity participates_in some data transformation)

Not sure about this all. Mostly because I can't put my finger on how a
parameter like this functions differently than other data that is
input.

-Alan

Bjoern Peters

Mar 26, 2009, 9:07:45 AM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
Alan Ruttenberg wrote:
> On Wed, Mar 25, 2009 at 1:31 PM, Bjoern Peters <bpe...@liai.org> wrote:
>
>> I am starting a new thread, to keep this discussion separate from the
>> 'population characteristic' meaning of 'parameter', and propose to name
>> this one 'data transformation parameter'.
>>
>> I believe examples for your 'data transformation parameters' are
>> - The integer k in 'k-means clustering'
>> - The window size in a 'moving average'
>> - The values for p, T, w, m in a 's transformation'
>>
>
> That sounds like a coherent set of things.
>
>
>> I believe the following is true: In every instance of a data
>> transformation process, the data transformation parameter is replaced by
>> a value (typically a number).
>>
>
> The parameter *is* the value? or is *replaced* by the value?
>
>
I meant the parameter 'is' the value. 'Replaced' was in relation to a
formula describing an algorithm in which e.g. 'k' is used, while in

>> That number is referred to as a data
>> transformation parameter, as a different number could have been used,
>> and it would have still been an instance of the same data
>> transformation.
>>
>
> As it would be if some other specified input was changed.
>
>
>> This means being a parameter is a 'role like' entity
>> attached to a number used in a process. As we are not allowed to use
>> roles for data (we need to follow up with Bary on that, dropped the ball
>> ...), we should follow the 'specification' route for now.
>>
>
>
>> Something like:
>>
>> data transformation parameter specification
>> is_a 'information entity about a realizable'
>> is_concretized_as (is_realized by only data transformation)
>> is_about some (information_content entity participates_in some data transformation)
>>
>
> Not sure about this all. Mostly because I can't put my finger on how a
> parameter like this functions differently than other data that is
> input.
>
>
With the narrow definition of data (directly measured data or data
transformations thereof), the difference from a parameter is that data
retains an 'is about' relation to what was being measured. Parameters
don't.

- Bjoern

James Malone

Mar 31, 2009, 5:50:23 AM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
Hi,

Bjoern's examples are good and represent some of the things we would
represent with this. So in order to push on with this, do you want me
to construct a definition and propose it? Send on some more examples?
Submit it to the tracker? I'd like to push on with this and not let
it die...

Thanks,

James

Alan Ruttenberg

Mar 31, 2009, 9:02:11 AM
to informatio...@googlegroups.com, obi-denr...@googlegroups.com
On Tue, Mar 31, 2009 at 5:50 AM, James Malone <james....@gmail.com> wrote:
>
> Hi,
>
> Bjoern's examples are good and represent some of the things we would
> represent with this.  So in order to push on with this, do you want me
> to construct a definition and propose it?  Send on some more examples?
>  Submit it to the tracker?  I'd like to push on with this and not let
> it die...

Worth a go. I'm not sure Bjoern's distinction completely works,
though. For example consider the choice of window size in a moving
average, or other parameters related to smoothing. They often have
something to do with the domain and the noise characteristics of the
signal.

Even in k-means, often the choice of k is influenced by knowledge of
the domain or preprocessing of the data (and therefore retaining some
aboutness).

-Alan

Chris Stoeckert

Mar 31, 2009, 12:33:44 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
Hi James,
Would be great to see a definition - one that we can work into a set
of restrictions.
To start with would be great to establish whether dt parameter is a
type of data item, a type of specification, or a type of role. Wasn't
quite clear on that based on the email thread. My sense is that it is
a type of specification.

Thanks,
Chris

Alan Ruttenberg

Mar 31, 2009, 1:32:18 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
On Tue, Mar 31, 2009 at 12:33 PM, Chris Stoeckert
<stoe...@pcbi.upenn.edu> wrote:
>
> Hi James,
> Would be great to see a definition - one that we can work into a set of
> restrictions.
> To start with would be great to establish whether dt parameter is a type of
> data item, a type of specification, or a type of role. Wasn't quite clear on
> that based on the email thread. My sense is that it is a type of
> specification.

That's somewhat in the direction I was thinking as well, though more
on the objective side. Specifically I wonder if we wouldn't make
progress by having sub-objectives of data-transformation-related
objectives that provision of the parameter helps satisfy.

For k in k-means it does indeed seem like a specification.
For a window for averaging, it can help achieve the objective of reducing noise.

Consider the case of RMA normalization. We've used a standard set of
arrays as a background to add a new array and have it normalized. Is
the standard set of arrays a parameter? (because it isn't about the
measurement at hand?)

-Alan

James Malone

Mar 31, 2009, 1:51:27 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
This sounds like it will get very messy very quickly and I think it's
over-modelling. I would prefer to leave the objective to the data
transformation process it is attached to, after all the objective of k
= 2 is to split the data in two only as part of the k-means
clustering. I agree with Chris, I'm not sure where it sits presently
but I would be inclined to leave this as either a role or a
specification. The only issue is that a parameter is only a parameter
in the context of the process it participates in, but perhaps that's
still ok for it to live under ICE.

Alan Ruttenberg

Mar 31, 2009, 2:24:33 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
On Tue, Mar 31, 2009 at 1:51 PM, James Malone <james....@gmail.com> wrote:
>
> This sounds like it will get very messy very quickly and I think it's
> over-modelling.  I would prefer to leave the objective to the data
> transformation process it is attached to, after all the objective of k
> = 2 is to split the data in two only as part of the k-means
> clustering.

The problem is the same one we face with all information content
entities: how to prevent a slide into disconnection of the information
from the thing it represents. This isn't a matter of overmodeling,
it's a matter of keeping to a principle of clean hygiene in
representing information.

> I agree with Chris, I'm not sure where it sits presently
> but I would be inclined to leave this as either a role or a
> specification.  The only issue is that a parameter is only a parameter
> in the context of the process it participates in, but perhaps that's
> still ok for it to live under ICE.

Explain this last bit, if you would?

Thanks,
Alan

James Malone

Mar 31, 2009, 3:04:22 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
The k is only a parameter in the context of k-means, it's just a
representation of a specified number of k clusters. It only means
anything given the context of the process, I would say.

Alan Ruttenberg

Mar 31, 2009, 4:05:21 PM
to informatio...@googlegroups.com, obi-denr...@googlegroups.com
On Tue, Mar 31, 2009 at 3:04 PM, James Malone <james....@gmail.com> wrote:
>
> The k is only a parameter in the context of k-means, it's just a
> representation of a specified number of k clusters.  It only means
> anything given the context of the process, I would say.

OK, I understand what you mean now.
Although if there are other clustering algorithms that take as input a
number of clusters then their parameter that sets this is related, no?

-Alan

Bjoern Peters

Mar 31, 2009, 10:24:46 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
I have been writing too long on this, so I will send it out as food for thought even though I don't think I am done.

I am extending here what I wrote in my original email to James. One thing to keep in mind is that there is no such thing as 'specification' in the OBI, but there is 'information entity about a realizable' which is the parent of 'plan specification' etc., so we have informally dubbed it 'specification'.

I agree with Alan that it can be hard to keep 'parameters' and specified data input and output apart. All are specified in the plan that is executed in the data transformation process. For all of them, we can identify some 'information content entity' that participates in the actual data transformation process. So I believe we can define:

data transformation participant specification
   is_a 'information entity about a realizable'
   part_of some (plan specification and has_part data transformation objective)
   is_about some (information_content entity and participates_in some data transformation)

data transformation participant specification has at least three children:
1) data transformation input specification
2) data transformation output specification
3) data transformation parameter specification

For 1 and 2, we have been using the 'has_specified_information_input / output' relations, which point to 'data' items. We are not actually using 1 + 2. We could do the same for 3, and use a third relation 'has_specified_information_parameter', which would point to 'data transformation parameter' items. This would move the fact that these are specifications into the relation.

E.g:

information content entity
   data transformation parameter
      window averaging length
      number of clusters k

data transformation parameter would be a defined class:
information content entity and is_specified_information_parameter_in some data transformation
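
As a rough illustration of the defined class above, here is a minimal Python sketch. The class and field names are hypothetical stand-ins for the proposed relations, not OBI terms, and the 'expression data set' and 'cluster assignments' items are made-up examples. It shows an information content entity counting as a parameter only because some data transformation lists it as a specified parameter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InformationContentEntity:
    label: str


@dataclass
class DataTransformation:
    label: str
    # Hypothetical stand-ins for has_specified_information_input / output
    # and the proposed has_specified_information_parameter relation.
    specified_inputs: List[InformationContentEntity] = field(default_factory=list)
    specified_outputs: List[InformationContentEntity] = field(default_factory=list)
    specified_parameters: List[InformationContentEntity] = field(default_factory=list)


def is_data_transformation_parameter(
    ice: InformationContentEntity,
    transformations: List[DataTransformation],
) -> bool:
    """Defined-class check: the entity is a specified parameter in some transformation."""
    return any(ice in t.specified_parameters for t in transformations)


# Made-up example items.
k = InformationContentEntity("number of clusters k")
expression_data = InformationContentEntity("expression data set")
clusters = InformationContentEntity("cluster assignments")

k_means = DataTransformation(
    label="k-means clustering",
    specified_inputs=[expression_data],
    specified_outputs=[clusters],
    specified_parameters=[k],
)
```

With these definitions, `is_data_transformation_parameter(k, [k_means])` holds, while the same check on `expression_data` does not, since it appears only as a specified input.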

I have to stop here...

- Bjoern

James Malone

Apr 1, 2009, 3:10:49 AM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
This all makes perfect sense to me.

James

Chris Stoeckert

Apr 1, 2009, 9:12:50 AM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
Agreed. Thanks Bjoern!
Chris

James Malone

Apr 2, 2009, 6:43:56 AM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
....continuing to push on this issue, can we consider Bjoern's as a
logical definition for parameter? I did not see anyone disagreeing.
If we can agree on this and agree to have it added I will speak with
the DT branch guys and try and get an agreeable English definition and
get back to you.

Cheers,

James

Bjoern Peters

Apr 2, 2009, 6:21:26 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
Obviously I am all for going forward. One minor edit to what I wrote
before: I would suggest using 'data transformation parameter value' as
the parent, to separate it clearly from the 'specification'. That would
mean:

information content entity
   data transformation parameter value
      window averaging length
      number of clusters k


It would be used like:

'moving average' has_specified_information_parameter 'window averaging
length'

and we should have a similar approach to identify the value of the
'window averaging length' as we are using for 'measurement datum', i.e.
a 'has_value' relation, possibly also a 'has unit' relation.


Draft definition:
A data transformation parameter value is an information content entity
which is a specified participant of a data transformation. It is
distinct from the data input and the data output of the data
transformation, in that the latter refer to the data items which are
being transformed and retain an aboutness relation to a measured
entity.
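
A minimal Python sketch of how such a parameter value might carry a value and an optional unit follows. The class name, the field names mirroring 'has_value' and 'has unit', the value 5, and the unit string are all hypothetical illustrations, not OBI definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataTransformationParameterValue:
    label: str
    has_value: float                # mirrors the suggested 'has_value' relation
    has_unit: Optional[str] = None  # mirrors the possible 'has unit' relation


# Hypothetical values chosen only for illustration.
window_length = DataTransformationParameterValue(
    label="window averaging length", has_value=5, has_unit="data points"
)
number_of_clusters = DataTransformationParameterValue(
    label="number of clusters k", has_value=2
)

# 'moving average' has_specified_information_parameter 'window averaging length'
has_specified_information_parameter = {
    "moving average": [window_length],
    "k-means clustering": [number_of_clusters],
}
```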

Melanie Courtot

Apr 2, 2009, 6:57:29 PM
to obi-denr...@googlegroups.com, informatio...@googlegroups.com
I am trying to understand why we need the extra relation.
Would it be different to say that the number of clusters k is a data
transformation parameter specification, whose concretization is
realized by the data transformation k-means?

Otherwise would we need to have extra relations for our variable
(dependent, independent, controlled) specifications too?

Melanie

---
Mélanie Courtot
TFL- BCCRC
675 West 10th Avenue
Vancouver, BC
V5Z 1L3, Canada

