Thanks for your interest in the IAO. You raise reasonable questions,
and, since I am the author of most of the terms you question, I offer
my thoughts on them below.
On Oct 20, 2009, at 1:21 AM, P. Def wrote:
> Firstly, according to the definition of the textual_entity ("a
> document as a whole is not typically a textual entity, because it
> has pictures in it - rather there are parts of it that are textual
> entities"), it seems to me as if the document as a whole could be
> regarded as a superclass which may contain a series of
> document_parts, which in turn may or may not contain a series of
> textual_entities according to their type. So I was wondering what is
> the reason it has been decided to put these 3 entities on the same
> hierarchical level, even though a textual entity can always be
> regarded as part of a document (ie. a document_part), which
> according to the principle of granularity is in itself ultimately a
> document, so it would seem that the is_a relationship would stand ?
Not all textual entities are part of a document. A fragmentary note,
say, is a textual entity that is not part of any larger whole.
The hierarchical level of an entity in an OBO ontology is not, in
itself, meaningful. The subsumption hierarchy may well be deeper (or
more deeply specified) for some branches than others. There is no
implication that the relationship "at the same depth in the tree" is
meaningful in any way.
> Secondly, I was wondering why are entities such as symbol, data
> item, label, and directive_information_entity on the same
> hierarchical level as the textual_entity as opposed to being a
> subclass thereof?
Textual entity and friends are relatively new additions to the IAO
compared to the terms you list, and there may well be errors in the
subsumption hierarchy with respect to them.
Symbol is a problematic term that will probably have to remain
primitive. Intuitively, there are symbols that are not
textual_entities (such as a standardized image, perhaps the symbol for
a resistor in a circuit diagram). Data items, which could include
images, are not solely textual_entities. Label strikes me as a possible
error: I think all labels have to be textual. Can anyone think of a
counterexample? Directive_information_entity could be a diagram (such
as the ones you frequently get to explain the installation of a
technological device) and so is probably not a subclass of
textual_entity.
> Thirdly, considering the sub-classes of a textual_entity, I am
> confused as to why it has been decided that it should "live at the
> FRBR manifestation level", in that this would necessarily require
> all the sub-classes to live at the FRBR manifestation level as well,
> which seems slightly inconvenient to me, since entities such as
> citation, author identification, document title, etc appear to be
> more like a content-related kind of entities (as opposed to a layout-
> oriented kind) and, to my view, they should thus rather live at the
> FRBR expression level rather than the manifestation level.
This was a pragmatic decision so that we could have a clear criterion
for determining equality (and hence counts) of textual_entities. I
think this is only a slight inconvenience for the use cases you
mention, since it would be straightforward to state, for example, that
several citations (different manifestations) are all about the same
document. It was very hard to determine in the general case when two
FRBR expressions were the same (e.g. the HTML vs. PDF versions of a
document might contain different information). It is also the case
that even two identical manifestations might not be about the same
entity (consider distinct authors with a common name, or two different
documents that have the same title). If there were a use case where
FRBR expression level statements were clearly needed (and could be
clearly defined), I think we would consider adding IAO terms that
modeled them. Can you provide such an example and definitions? Any
definition should be specific enough to determine exactly how many
expressions there are in any situation.
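To make the counting criterion concrete, here is a minimal Python sketch,
not anything taken from the IAO itself. The Citation class, the IS_ABOUT
mapping, the document identifiers and the citation strings are all
hypothetical names and data invented for illustration. It assumes that two
textual entities are equal only when their text matches exactly, and that
"is about" is recorded as a separate assertion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical manifestation-level entity: equal only if the text matches exactly."""
    text: str


# Hypothetical "is about" assertions from citations to document identifiers.
IS_ABOUT = {
    Citation("Doe J. (2009) An example paper. Ex. J. 1:1-10."): "doc:example-1",
    Citation("J. Doe, 'An example paper', Ex. J., vol. 1, 2009."): "doc:example-1",
    Citation("Doe J. (2009) Another example. Ex. J. 2:5-9."): "doc:example-2",
}


def citations_by_document(links):
    """Group citation manifestations by the document each one is about."""
    grouped = defaultdict(set)
    for citation, document in links.items():
        grouped[document].add(citation)
    return dict(grouped)


grouped = citations_by_document(IS_ABOUT)
print(len(IS_ABOUT))                  # number of distinct citation manifestations
print(len(grouped["doc:example-1"]))  # citations about the same document
```

Counting stays unambiguous because it depends only on the literal text,
while the grouping by document is carried entirely by the explicit
assertions.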
> Besides, assuming that a textual_entity cannot be considered as a
> subclass of a document or a document_part, then it would have more
> sense to me to have the textual_entity and all its subclasses exist
> at the level of the FRBR expression, while having instead the
> document entity and all of its subclasses exist at the level of the
> FRBR Manifestation, thereby allowing for a textual_entity to acquire
> a format or a layout only once it has been incorporated into a
> particular document.
Determining equality among expressions (and, hence, the ability to
count the number of expressions in some instance) is very difficult.
Often, the formatting and layout convey information (e.g. an italicized
vs. roman gene name often indicates species or gene vs. gene
product). How could you tell what expressions were the same without
knowing the layout or format?
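The gene name case can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: StyledText, the symbol "abc1"
and the helper same_ignoring_layout are hypothetical, and the convention
"italic means gene, roman means gene product" is the common one mentioned
above rather than a rule that always holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyledText:
    """Hypothetical textual entity whose identity includes its formatting."""
    text: str
    italic: bool


# Hypothetical symbol; italic is assumed to denote the gene, roman its product.
gene = StyledText("abc1", italic=True)
product = StyledText("abc1", italic=False)


def same_ignoring_layout(a, b):
    """Compare two entities on their characters alone, discarding formatting."""
    return a.text == b.text


print(gene == product)                      # comparison that keeps the formatting
print(same_ignoring_layout(gene, product))  # comparison that drops the formatting
```

A comparison that discards formatting treats the two as the same, even
though they refer to different things, which is the difficulty with
deciding equality at the expression level.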
> Finally, the concept of a narrative_object (IAO_0000006) defined as:
> "A narrative object is an information content entity that is a set
> of propositions, e.g. reports, journal articles, and patents
> submission." is very confusing to me, in that the definition seems
> to coincide with that of "a document of literary nature" and seems
> therefore to qualify as an entity/class that should theoretically
> subsists as a subclass of the document_part (and eventually as a
> superclass of the textual_entities it incorporates?)
>
> And likewise, it would seem natural to me that the report and the
> study_interpretation entities, which are fundamentally a more
> specific definition of a narrative_object, should ultimately subsist
> as a sub-class of the narrative_object, with the report_element
> entity being a subclass of the report class? but please correct me
> if I'm wrong.
Narrative_object, report and report_element are older terms that are
intended to be replaced by the document, textual_entity, etc. terms.
They have not been obsoleted yet because of dependencies in other
ontologies (e.g. in the OBI) whose update needs to be coordinated.
When that occurs, study_interpretation will become a subclass of
document.
> Again, I am just starting to learn about the IAO. Please do not
> regard these questions as any kind of criticism, but only as an
> attempt to better understand the underlying structure of the IAO.
Your questions and comments are much appreciated. IAO will meet the
needs of the community only to the extent the community participates
in its creation and maintenance. Your engagement in this process is
very welcome.
Larry