known unknowns


Bjoern Peters

Dec 3, 2009, 9:48:59 AM
to Chris Mungall, Alan Ruttenberg, information-ontology
I would like to make a proposal on dealing with 'known unknowns' in IAO. This came up when dealing with the term 'unknown sex', which needed to be included in OBI for MO. In general, these come up a lot during data curation, e.g. when a web form asks for the type of cell used in an experiment and also allows the choice 'unknown'. I think it is straightforward to deal with these as information content entities, and I have put a proposal in the tracker:
http://code.google.com/p/information-artifact-ontology/issues/detail?id=72

Getting this right as a design pattern would be of high value for all OBO Foundry ontologies.

- Bjoern


--
Bjoern Peters
Assistant Member
La Jolla Institute for Allergy and Immunology
9420 Athena Circle
La Jolla, CA 92037, USA
Tel: 858/752-6914
Fax: 858/752-6987
http://www.liai.org/pages/faculty-peters

Michel Dumontier

Dec 3, 2009, 10:43:39 AM
to Bjoern Peters, Chris Mungall, Alan Ruttenberg, information-ontology
Bjoern,
  Would it be sufficient to represent this undetermined attribute with an existential statement e.g. that there is a gender, but we are unable to commit to a more specific type?

-=Michel=-






--
Michel Dumontier
Associate Professor of Bioinformatics
Carleton University
http://dumontierlab.com

Alan Ruttenberg

Dec 3, 2009, 10:45:02 AM
to Michel Dumontier, Bjoern Peters, Chris Mungall, information-ontology
There are a few issues here. First, I would say that "unknown sex" is
a specified output of some assay in which sex is attempted to be
determined. So it doesn't stand on its own.

Second, "unknown sex" isn't a type of sex. It isn't a measurement datum.
In any treatment of the matter we need to make sure we don't confuse
what the status of such entities is.

Third, there is a question of what "unknown sex" is about. It is more
about the process than about the target (there is no quality of a
person that is "unknown").

I think that "unknown sex" is something one fills in a *form*. We
don't yet have a treatment of forms at the moment and so that's an
area I think we would need to develop in order to handle this
correctly.

Finally, I think "unknown sex" might be several things. It *might*
represent that there was the assay above and the results were lost. It
might mean that the sex doesn't correspond to any of the existing
categories. It might be added post-hoc for a data system which
requires the field to be filled, but in which there was no attempt to
actually determine the sex. In some of these cases the correct mapping
to OWL, given the open world assumption, is to not specify anything -
as OWL specifies that anything that is unsaid is unknown.
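
To make the open-world point concrete, here is a minimal Python sketch (plain Python, not OWL). The toy triple store and the `holds` function are hypothetical names introduced only for illustration. It shows that, under an open-world reading, a statement that was never asserted comes back as unknown rather than false.

```python
from typing import Optional, Set, Tuple

# Hypothetical toy fact store: (subject, predicate, object) triples.
Triple = Tuple[str, str, str]


def holds(facts: Set[Triple], query: Triple, open_world: bool = True) -> Optional[bool]:
    """Return True if the query is asserted.

    If it is not asserted, return None (unknown) under an open-world
    reading, or False under a closed-world reading.
    """
    if query in facts:
        return True
    return None if open_world else False


# Only patient1's sex is asserted; nothing at all is said about patient2.
facts = {("patient1", "has_sex", "female")}

known = holds(facts, ("patient1", "has_sex", "female"))
unsaid_open = holds(facts, ("patient2", "has_sex", "male"))
unsaid_closed = holds(facts, ("patient2", "has_sex", "male"), open_world=False)
```

In this sketch, leaving patient2 out of the fact set already expresses "unknown" in the open-world reading, so no extra "unknown sex" value is needed for that case.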

OK. end of quick thoughts.

-Alan

Bjoern Peters

Dec 3, 2009, 11:02:09 AM
to Michel Dumontier, Chris Mungall, Alan Ruttenberg, information-ontology
Michel,

To clarify the example for this discussion: I am assuming there is a web database in which people can upload microarray data for tissue samples from patients, and are asked to describe the patient sex with a single drop down list including the choices male/female/unknown.

If I understand correctly you are referring to a statement like
'patient has_quality some biological sex'
which accurately represents what we know about an experiment in which a patient of unknown sex was used.

My point was that there is a different piece of information, namely that the person who entered the data clarifies that he does not have any more information. That makes it a 'known unknown', which is different from e.g. the weight of that same patient which may not be recorded on the data entry form at all and may or may not be known to the experimenter.

I would suggest that selecting anything on the drop down menu creates an ICE (information content entity) that is about some biological sex and quality of some patient. If male/female is selected, the ICE is a 'data item and is_about some male/female sex and quality of some patient'. If 'unknown' is selected, it is not a 'data item', but a 'known unknown'.
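
A minimal Python sketch of this suggestion follows. It is only an illustration under stated assumptions: the class names, field names, and the `from_dropdown` function are hypothetical and are not IAO or OBI terms. Every drop-down selection yields an information content entity about the patient's biological sex, but only male/female yields a data item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationContentEntity:
    """Something recorded about a quality of some bearer (illustrative names)."""
    about_quality: str  # e.g. "biological sex"
    quality_of: str     # e.g. an identifier for the patient


@dataclass(frozen=True)
class DataItem(InformationContentEntity):
    """Commits to a specific type of the quality."""
    value: str          # e.g. "male" or "female"


@dataclass(frozen=True)
class KnownUnknown(InformationContentEntity):
    """Records that the person entering the data had no further information."""


def from_dropdown(choice: str, patient_id: str) -> InformationContentEntity:
    """Map a drop-down choice (male/female/unknown) to an entity."""
    if choice in ("male", "female"):
        return DataItem("biological sex", patient_id, value=choice)
    if choice == "unknown":
        return KnownUnknown("biological sex", patient_id)
    raise ValueError(f"unsupported choice: {choice!r}")


male_entry = from_dropdown("male", "patient-1")
unknown_entry = from_dropdown("unknown", "patient-2")
```

A quality that the form never asks about, such as the patient's weight, produces no entity at all in this sketch, which is what separates it from a 'known unknown'.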

- Bjoern

Bjoern Peters

Dec 3, 2009, 11:29:21 AM
to Alan Ruttenberg, Chris Mungall, information-ontology, Michel Dumontier
Apparently I did not communicate well what I intended. Hopefully the email to Michel clarifies some of it. Very brief responses inline.

----- "Alan Ruttenberg" <alanrut...@gmail.com> wrote:
> There are a few issues here. First, I would say that "unknown sex" is
> a specified output of some assay in which sex is attempted to be
> determined. So it doesn't stand on its own.
>
That is one reason why something can be unknown. There are others. That was discussed in the tracker item.

> Second, "unknown sex" isn't a type of sex. It isn't a measurement datum.
> In any treatment of the matter we need to make sure we don't confuse
> what the status of such entities is.
>
I never said that it was.

> Third, there is a question of what "unknown sex" is about. It is more
> about the process than about the target (there is no quality of a
> person that is "unknown").

As I thought I had written, it is about two things: the knowledge of somebody writing things down, and the specific thing his knowledge is about (here, the biological sex of a patient).

 
> I think that "unknown sex" is something one fills in a *form*.
> We don't yet have a treatment of forms at the moment and so that's an
> area I think we would need to develop in order to handle this
> correctly.

That was the intent of my proposal.

> Finally, I think "unknown sex" might be several things. It *might*
> represent that there was the assay above and the results were lost. It
> might mean that the sex doesn't correspond to any of the existing
> categories. It might be added post-hoc for a data system which
> requires the field to be filled, but in which there was no attempt to
> actually determine the sex. In some of these cases the correct mapping
> to OWL, given the open world assumption, is to not specify anything -
> as OWL specifies that anything that is unsaid is unknown.

I listed two of those as examples for unknown (attempts to determine something failed, and unclear if attempts were even made to determine something).

The category mismatch is not a kind of unknown, as the person entering the data knows something about the biological sex. The correct choice in a list field would typically be 'other'. That could be dealt with similarly to 'unknown', and should be taken on as well, but is a different thing. The third kind of entry like this is 'not applicable', which again is different from unknown. It would apply, for example, if a sample from something other than an organism is run, and it indicates that there is no 'biological sex' quality of the source of the sample.
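
The three kinds of entry can be told apart by two questions, sketched below in Python. The enum and both function names are hypothetical, and the yes/no answers are one reading of the distinctions described above rather than an agreed definition.

```python
from enum import Enum


class NonValueEntry(Enum):
    """Form choices that do not name one of the listed categories."""
    UNKNOWN = "unknown"
    OTHER = "other"
    NOT_APPLICABLE = "not applicable"


def bearer_has_quality(entry: NonValueEntry) -> bool:
    """Does the source of the sample have the quality at all?"""
    # 'not applicable' says there is no such quality (e.g. a non-organism sample).
    return entry is not NonValueEntry.NOT_APPLICABLE


def curator_knows_value(entry: NonValueEntry) -> bool:
    """Does the person entering the data know something about the value?"""
    # Only 'other' implies knowledge: the value is known but fits no listed category.
    return entry is NonValueEntry.OTHER


# Summarise each entry as (bearer_has_quality, curator_knows_value).
summary = {e.value: (bearer_has_quality(e), curator_knows_value(e)) for e in NonValueEntry}
```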

- Bjoern