list of core terms


Chris Stoeckert

Dec 17, 2008, 2:37:40 PM
to obi-denr...@googlegroups.com
Thanks everyone for emails and the discussion. Here is the list we
ended up with.
1. datum/ data; dataset; data structure
2. narrative object - textual
3. narrative object - figure
4. software
5. variable - could be plan
6. hypothesis
7. result
8. conclusion
9. objective specification - research objective
10. study design - technical approach
11. value/ unit

Next steps are to post these at the Wiki page for branch core terms
and to start an email discussion on each with regard to status,
issues, and proposals.

Cheers,
Chris

Chris Stoeckert

Jan 6, 2009, 4:12:29 PM
to obi-denr...@googlegroups.com
Hi,
As mentioned previously, I'd like to get discussions going on the other items as well. Let's move on to narrative object - textual and figure.

Currently narrative object is an information content entity that is a set of propositions. It covers both textual narrative objects (e.g., report of results) and non-textual (i.e., figures and tables) narrative objects (e.g., report display element). The current structure was found to be confusing at our call last month with the recommendation to split these out. Perhaps something like:

Narrative text object would then refer to a set of propositions (such as a description of events) that can be read or spoken. 
Figures are constructions of images for the purpose of describing or illustrating something. Figures may include text but cannot typically be read or spoken as a whole.

An issue with this dichotomy is that it doesn't allow for journal articles, which are a combination of both text and figures (and tables). So we need a way to say both that something is a report of some kind (journal article, diagnosis, patent application, etc.) and what the representational parts of the report are (text, figures, tables). Maybe these need to be siblings:

Report is a combination of text and figures to describe a set of events.
Text is a physical representation of some spoken language.
Figure uses an image or images for the purpose of describing or illustrating something.

Cheers,
Chris


Melanie Courtot

Jan 6, 2009, 7:57:20 PM
to obi-denr...@googlegroups.com
Hi Chris,

Following the data item discussion, I've been thinking about separating the information type and its representation. In this case we would have things like narrative object (report, conclusion, diagnosis, grant application, patent...) in one hierarchy, and "encoding/structure" in another (things like text, image, table, data format specification, etc.), bridging the two hierarchies with a relation like "encoded_in/represented_by".

I am also thinking that this may help for the digital quality hierarchy, we could have a digital encoding, and for example an eMedical record (defined class) would be a record (hierarchy 1) encoded_in digital form (hierarchy 2)

As you can see from the description it's still pretty vague, but I'm wondering if you think it might be applicable?

Cheers,
Melanie

---
Mélanie Courtot
TFL- BCCRC
675 West 10th Avenue
Vancouver, BC
V5Z 1L3, Canada




Allyson Lister

Jan 7, 2009, 4:53:35 AM
to obi-denr...@googlegroups.com
Hi Melanie, Chris,

If we separate out the type and its representation, then would we even need to say "narrative object" as something describing something that can be read or spoken? Wouldn't the link between the type and its representation allow an inference of whether or not something is a narrative object or a figure, for example? It's hard to explain what I mean without an example, so here goes, and apologies if I am unclear (and I'm just making up class names etc)!

myReport encoded_in textFormat
myFigure encoded_in tabularFormat

then you could have narrative object and figure object (or whatever they'll be called) as defined classes with appropriate rules saying which instances/classes would be inferred to be either narrative or figures depending on what format they're encoded in. Does this make sense?
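
A minimal sketch of the idea above, assuming "narrative object" and "figure object" are treated as defined classes whose membership is computed from an encoded_in assertion. The names ENCODED_IN, DEFINED_CLASSES, infer_classes and imageFormat are invented for illustration, just like the class names in the example. A real ontology would more likely state this with OWL equivalent-class axioms and a reasoner than with Python.

```python
# Asserted facts: instance -> format it is encoded_in (made-up names).
ENCODED_IN = {
    "myReport": "textFormat",
    "myFigure": "tabularFormat",
}

# Defined classes: membership is a rule over encoded_in, not an assertion.
# "imageFormat" is a hypothetical extra format added for illustration.
DEFINED_CLASSES = {
    "narrative object": {"textFormat"},
    "figure object": {"tabularFormat", "imageFormat"},
}


def infer_classes(instance):
    """Return the defined classes an instance falls under, sorted by name."""
    fmt = ENCODED_IN.get(instance)
    return sorted(
        name for name, formats in DEFINED_CLASSES.items() if fmt in formats
    )


if __name__ == "__main__":
    for inst in ENCODED_IN:
        print(inst, "->", infer_classes(inst))
```

Nothing is asserted to be a narrative object here; the classification falls out of the encoding alone, which seems to be the point of the suggestion.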

:)

2009/1/7 Melanie Courtot <mcou...@gmail.com>



--
Thanks,
Allyson :)

Allyson Lister
Research Associate
Centre for Integrated Systems Biology for Ageing and Nutrition
Newcastle University
http://www.cisban.ac.uk
School of Computing Science
Newcastle University
Newcastle upon Tyne, NE1 7RU

Frank Gibson

Jan 7, 2009, 5:49:02 AM
to obi-denr...@googlegroups.com
Hi,

Just to chip in :) I think I agree with what Melanie and Allyson are
suggesting. I believe it makes sense to separate the information and
its encoding, as the same _information_ could be encoded by multiple
encodings. It would also appear to make sense at this stage to think
of narrative object as a defined class as Ally suggests.
To continue Ally's example, you could take it further and define a
particular schema for the report, such as:

MyReport encoded_in (textFormat and has_schema BBSRC_project_report_schema)


Here the BBSRC_project_report_schema can define the text formatting,
such as chapter headings, reference styles, etc. This is just a quick
example, and I have not thought through distinctions such as that
between the schema for the XML specification and an XML_schema for a
particular XML_document (i.e. MAGE), or whether there is a distinction
at all.

Frank

--
Frank Gibson, PhD
http://peanutbutter.wordpress.com/

Chris Stoeckert

Jan 7, 2009, 11:01:37 AM
to obi-denr...@googlegroups.com
I agree too.
However, what would MyReport be an instance of?
Other core terms include conclusion, result, hypothesis which could be
considered types of information content. But these seem different from
a report or journal article.
Here are things that seem like each other:
- report, journal article, patent
- hypothesis, conclusion, result
- figure, text, table
- pdf, html page

Also, where does software go? It seems like it would fit more with report
than the others.

Chris

Chris Stoeckert

Jan 13, 2009, 4:40:09 PM
to obi-denr...@googlegroups.com
More thoughts on these to try and make some progress on these core
terms.

MyReport is an instance of report.
A report is an information content entity that is a statement of the
results of an investigation, or of any matter on which definite
information is required, made by some person or body instructed or
required to do so.
Types of reports include patent content.
A patent content is a report for the purpose of obtaining a license
from a government conferring for a set period the sole right to make,
use, or sell some process or invention.

Reports have parts that are statements about intent and
interpretations of investigations.
Conclusion is a part of a report that is a statement about the outcome
of an investigation.
Then narrative objects could be reports and parts of reports.

Report parts can be encoded in different formats: text format, figure
format, table format

So adjusting the current hierarchy, this would result in:
information content entity
    information entity about a realizable
        data format specification
            report format specification
                text format
    narrative object
        report
            patent content
        part of a report
            conclusion

Cheers,
Chris

Melanie Courtot

Jan 13, 2009, 4:52:55 PM
to obi-denr...@googlegroups.com

On 13-Jan-09, at 1:40 PM, Chris Stoeckert wrote:

>
> More thoughts on these to try and make some progress on these core
> terms.

Thanks Chris :)

>
>
> MyReport is an instance of report.
> A report is an information content entity that is a statement of the
> results of an investigation, or of any matter on which definite
> information is required, made by some person or body instructed or
> required to do so.
> Types of reports include patent content.
> A patent content is a report for the purpose of obtaining a license
> from a government conferring for a set period the sole right to
> make, use, or sell some process or invention.
>
> Reports have parts that are statements about intent and
> interpretations of investigations.
> Conclusion is a part of a report that is a statement about the
> outcome of an investigation.
> Then narrative objects could be reports and parts of reports.
>

I would keep your proposal, removing "part of report", saying just
that conclusion is a narrative object. I don't think we always need to
have a full report when we are making a conclusion?

> Report parts can be encoded in different formats: text format,
> figure format, table format

I would keep text format directly under data format specification, and
suggest not adding report format specification. The text format can be
used to encode my protocol for example and so on.

So that would give something like:


information content entity
    information entity about a realizable
        data format specification
            text format
    narrative object
        report
            patent content
        conclusion

Actually, the current data formats are instances of data format
specification, so text would probably deserve the same treatment. I
remember there was a discussion about instances vs subclasses, but I'm
afraid I forgot the resolution. Could somebody refresh my memory?

Thanks,
Melanie

Bjoern Peters

Jan 16, 2009, 4:44:48 PM
to obi-denr...@googlegroups.com
I like this so far. Here is my input, which definitely can be further
improved.

I really liked this part of Chris's email:

Here are things that seem like each other:

1) report, journal article, patent
2) hypothesis, conclusion, result
3) figure, text, table
4) pdf, html page

I would translate these into categories as follows:

1 = purpose of information for the author and audience
2 = relation of information to reality, according to the author
3 = format of information, as seen by humans
4 = format of information, as seen by computers


With that I would draw the hierarchy as follows (slightly different from
the other proposals)

information content entity
    report (1)
        journal article
        patent application
        grant progress report
        patient record
    proposition (2)
        conclusion
        hypothesis
        result
    information entity about a realizable
        human format specification (3)
            text format
            figure format
            audio format
            table format
        digital format specification (4)
            html spec
            pdf spec

Note that these would be combined. There can be a patent application encoded
according to the pdf spec, which has a part describing a result
formatted as a Figure.
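
As a rough sketch of how these categories combine, the hierarchy can be written as child-to-parent links and the patent application example checked against it. The nesting below is one reading of the list above, and the dictionary keys (type, encoded_according_to, formatted_as, parts) are illustrative assumptions rather than agreed relation names.

```python
# Child -> parent links, following one reading of the proposed hierarchy.
PARENT = {
    "report": "information content entity",
    "journal article": "report",
    "patent application": "report",
    "grant progress report": "report",
    "patient record": "report",
    "proposition": "information content entity",
    "conclusion": "proposition",
    "hypothesis": "proposition",
    "result": "proposition",
    "information entity about a realizable": "information content entity",
    "human format specification": "information entity about a realizable",
    "text format": "human format specification",
    "figure format": "human format specification",
    "audio format": "human format specification",
    "table format": "human format specification",
    "digital format specification": "information entity about a realizable",
    "html spec": "digital format specification",
    "pdf spec": "digital format specification",
}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or sits anywhere below it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


# The combined example: one report, one digital encoding, and one part
# that is a result presented as a figure. Keys are hypothetical.
my_patent_application = {
    "type": "patent application",
    "encoded_according_to": "pdf spec",
    "parts": [{"type": "result", "formatted_as": "figure format"}],
}
```

Each of the four categories answers a different question about the same document, so none of them needs to be a subclass of another.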

- Bjoern


--
Bjoern Peters
Assistant Member
La Jolla Institute for Allergy and Immunology
9420 Athena Circle
La Jolla, CA 92037, USA
Tel: 858/752-6914
Fax: 858/752-6987
http://www.liai.org/pages/faculty-peters

Melanie Courtot

Jan 17, 2009, 10:49:58 PM
to obi-denr...@googlegroups.com

On 16-Jan-09, at 1:44 PM, Bjoern Peters wrote:

>
> I like this so far. Here is my input, which definitely can be
> further improved.
>
> I really liked this part of Chris email:
>
> Here are things that seem like each other:
> 1) report, journal article, patent
> 2) hypothesis, conclusion, result
> 3) figure, text, table
> 4) pdf, html page
>
> I would translate these into categories as follows:
>
> 1 = purpose of information for the author and audience
> 2 = relation of information to reality, according to the author
> 3 = format of information, as seen by humans
> 4 = format of information, as seen by computers
>

I like the report (1) and proposition (2) part.

>
> With that I would draw the hierarchy as follows (slightly different
> from the other proposals)
>
> information content entity
> report (1)
> journal article
> patent application
> grant progress report
> patient record
> proposition (2)
> conclusion
> hypothesis
> result

I think we all agree on the above proposal? Maybe we should start
working on their definitions, starting with report and proposition?

In order to get us started ( and knowing they probably need more
work ;) ):

1. report (would replace current narrative object)

A report is an ICE describing findings of some individual or group. It
usually is made up of a set of propositions, and can be written,
spoken or encoded digitally. Its content is reflective of an
investigation, and its purpose is to inform an audience.

(from http://www.google.ca/search?q=define%3Areport, Bjoern, and
current definition of narrative object)

2. proposition

A proposition is an ICE that affirms or denies something and is either
true or false. A proposition represents the relation of the
information to reality according to the author.
alternative term: statement

(from http://www.google.ca/search?q=define%3Aproposition and Bjoern)

(side note: those 2 terms would probably go to IAO, and OBI would deal
with specific subclasses.)

> information entity about a realizable
> human format specification (3)
> text format
> figure format
> audio format
> table format
> digital format specification (4)
> html spec
> pdf spec

>
> Note that would be combined. There can be a patent application
> encoded according to the pdf spec, which has a part describing a
> result formatted as a Figure.

I'm not sure we need to split 3 and 4.
The advantage of splitting them is that if we were to have the 2
relations (labels to be discussed of course) encoded_according_to and
formatted_as, we could have clear ranges (report/proposition
encoded_according_to digital format specification and report/proposition
formatted_as human format specification). I am not sure we need both
relations.
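
To make the "clear ranges" point concrete, here is a small sketch assuming the two relation labels mentioned above and the format split from Bjoern's proposal. RELATION_RANGE, FORMAT_CATEGORY and in_range are hypothetical names used only for illustration.

```python
# Range of each relation: the format category its object must belong to.
RELATION_RANGE = {
    "encoded_according_to": "digital format specification",
    "formatted_as": "human format specification",
}

# Format -> category, following the split between (3) and (4).
FORMAT_CATEGORY = {
    "html spec": "digital format specification",
    "pdf spec": "digital format specification",
    "text format": "human format specification",
    "figure format": "human format specification",
    "audio format": "human format specification",
    "table format": "human format specification",
}


def in_range(relation, fmt):
    """True if `fmt` is an allowed object for `relation`."""
    return FORMAT_CATEGORY.get(fmt) == RELATION_RANGE[relation]


if __name__ == "__main__":
    print(in_range("encoded_according_to", "pdf spec"))
    print(in_range("formatted_as", "pdf spec"))
```

With a single merged format hierarchy and a single relation, this kind of check would no longer distinguish the two cases, which is the trade-off being weighed.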

I don't have a strong opinion about that and I don't think it should
hold us back.

Other thing which I think is related and we may want to address at the
same time:
I am also wondering about representing the structure of the files: we had
a discussion with Richard about how to say what columns in a file would
represent (similar to what Pierre asked at http://groups.google.com/group/information-ontology/browse_thread/thread/ffcc3981960dc04b).
Any suggestion how we would deal with "column", for example? Would we
say a table format has columns but text does not? I guess it depends on the
definitions we choose for text format (I assume we don't mean .txt
files here, but "unstructured written document").

Thanks,
Melanie

Chris Stoeckert

Jan 18, 2009, 11:57:46 AM
to obi-denr...@googlegroups.com
Hi Melanie,

> I think we all agree on the above proposal? Maybe we should start
> working on their definitions, starting with report and proposition?
Woo hoo!

I'm fine with your definitions for report and proposition.
I do favor splitting the types of format specifications as Bjoern
suggests.

Thanks,
Chris

Frank Gibson

Jan 19, 2009, 4:39:58 PM
to obi-denr...@googlegroups.com
I would agree with Melanie that I don't think splitting 3 and 4 is
necessary, as it conflates how the format is used with what the
format is. I can read html just as easily as I can read text; in fact
html is text. There is going to be a whole lot of pain and multiple
inheritance if you insist on splitting these.

Frank

Bjoern Peters

Jan 20, 2009, 11:21:11 AM
to obi-denr...@googlegroups.com
I wanted to differentiate between the format in which a human 'takes up'
the information and the format in which the information is stored for
the computer. Human readable text can be stored as a pdf, jpg or html
file. Tables can be encoded in .pdf, .html. Figures can be encoded in
word, jpg, pdf.

The html source code may be text, but an html document viewed by a
browser is not (e.g. only setting background color = pink isn't much of
a read).

- Bjoern

Frank Gibson

Jan 20, 2009, 11:26:35 AM
to obi-denr...@googlegroups.com
I would not recommend this approach. Concentrate on the is_a rather
than what_it_could_be_used_for_under_certain_conditions.

Frank

Bjoern Peters

Jan 20, 2009, 7:00:28 PM
to obi-denr...@googlegroups.com
What is_a Figure then? I need to refer to Figure X in Journal article Y,
and don't care if that is in the printout on my table, the webpage
displayed on the journal website or the pdf file.
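
A minimal sketch of this requirement, assuming a figure is identified by its article and label while its carriers are kept as separate, changeable data. The Figure class and its fields are hypothetical and are not proposed ontology terms.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    """A figure identified by its article and label, not by any file."""

    article: str
    label: str
    # Carriers the figure happens to appear in, e.g. "pdf" or "html".
    encodings: set = field(default_factory=set)

    def key(self):
        """Identity used when citing the figure."""
        return (self.article, self.label)


fig = Figure("Journal article Y", "Figure X")
fig.encodings.update({"printout", "html", "pdf"})

# Adding or dropping a carrier never changes what the reference points to.
reference = fig.key()
fig.encodings.discard("printout")
assert reference == fig.key()
```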

Frank Gibson

Jan 21, 2009, 5:20:18 AM
to obi-denr...@googlegroups.com
On Wed, Jan 21, 2009 at 12:00 AM, Bjoern Peters <bpe...@liai.org> wrote:
>
> What is_a Figure then? I need to refer to Figure X in Journal article Y, and
> don't care if that is in the printout on my table, the webpage displayed on
> the journal website or the pdf file.

If you are talking about a figure in publishing terms, then:

A floating block, also called a figure, in writing and publishing is
any graphic, text, table or other representation that is unaligned
from the main flow of text. Use of floating blocks to present pictures
and tables is a typical feature of academic writing, including
scientific articles and books. Floating blocks are normally
labeled with a caption or title that describes their contents and a
number that is used to refer to the figure from the main text. A
common system divides floating blocks into two separately numbered
series, labeled figure (for pictures, diagrams, plots, etc.) and
table.

Frank

Bjoern Peters

Jan 21, 2009, 12:29:03 PM
to obi-denr...@googlegroups.com
So do you now agree that there is a difference between what an
information entity is as you say 'in publishing terms' (figure, table,
diagram, plot, picture, free text), which is what I meant by 'as seen by
humans', and how an information entity is encoded for the computer
(html, pdf, word doc, ascii file)?

Melanie Courtot

Jan 21, 2009, 2:37:37 PM
to obi-denr...@googlegroups.com
Hi,

I see the distinction - for my part I was confused by the "as seen by
human" and "as seen by computer".

I think "as seen by computer" is what we are currently calling data
format specification, i.e. the file format, encoding.

The "as seen by human" is equivalent to "a visual representation (of
an object or scene or person or abstraction) produced on a surface"
and may possibly go with other graph terms (like diagram etc.).
Currently in IAO (see http://code.google.com/p/information-artifact-ontology/issues/detail?id=8&can=1):

-report figure
--- report graph
------ Venn diagram
------ survival curve
------ heatmap
------ histogram
------ dendrogram
------ scatterplot
------ dot plot
------ contour plot
------ density plot

report figure: A report figure is a report display element that has
some aspect of illustration, but may be a composite of figures,
images, and other elements.
report graph: A report graph is a report figure that presents one or
more tuples of information by mapping those tuples into a
two-dimensional space in a non-arbitrary way.

Result would be: (with the numbers as initially assigned by Bjoern)
- information content entity
---- report (1)
------------ journal article
------------ patent application
------------ grant progress report
------------ patient record
---- proposition (2)
------------ conclusion
------------ hypothesis
------------ result
---- information entity about a realizable
------------ report figure (3)
------------------ report graph
---------------------- dot plot
---------------------- [..]
------------------ text format
------------------ figure format
------------------ audio format
------------------ table format
------------ data format specification (4)
------------------ html spec
------------------ pdf spec

(On figure format: there is an editor note to report figure: "I
prepended the 'report ' to make it clear that we mean parts of reports
here. We may want a more generic version of 'figure', in which case
this would become a defined class - figure and part_of some report")

(note: html, pdf, xml are instances of data format specification - I
still don't remember why we decided instances and not subclasses :) )

How does that look?

Melanie

Frank Gibson

Jan 22, 2009, 7:08:48 AM
to obi-denr...@googlegroups.com
I see this as a difference between what it is and what it is used for.
We are building a hierarchy of is_a, not of what it is used for. If you
really need to say this, which I do not believe you do, then
"interpreted_by_humans" is a defined class under which the information
specifications can be inferred, not part of the single asserted
hierarchy.

Frank

Bjoern Peters

Jan 22, 2009, 11:20:26 AM
to obi-denr...@googlegroups.com
I have no idea what Frank is suggesting.

Melanie: Yes, the IAO terms clearly belong there. How about 'report
element format' instead of 'report figure' as the label for the upper
class? Audio format at least does not fit well into this. For the same
reason, it cannot be limited to visual and 'on the surface'. We can

'report element format' is a format in which information is presented
to and consumed by a human being.

report element format
------ figure format
------------ Venn diagram
------------ survival curve (? too content specific? Could be x-y line graph)
------------ heatmap
------------ histogram
------------ dendrogram
------------ scatterplot
------------ dot plot
------------ contour plot
------------ density plot
------ table format
------ text format
------ audio format
------ movie format

Needs some more work (keep 'format' or not, clarify relations).

- Bjoern.

Frank Gibson

Jan 23, 2009, 9:50:10 AM
to obi-denr...@googlegroups.com
On Thu, Jan 22, 2009 at 4:20 PM, Bjoern Peters <bpe...@liai.org> wrote:
>
> I have no idea what Frank is suggesting.

Do not take the consumed by human approach - only deal with the
specification of what it is, not what it's used for.

Frank
