Modeling a document as a collection of paragraphs

Steve Wartik

Sep 7, 2022, 6:45:48 PM
to bfo-d...@googlegroups.com
What is the best practice for modeling a document as a collection of paragraphs?

My question is really more specific. I am creating a knowledge base that can contain classified documents. I want to express the classification level of each paragraph. I assume classification level is a realizable entity.

I can think of two approaches. The first is to model the entire document, and each paragraph within the document, as independent continuants. The paragraphs are parts of the document.

The second approach is to model the entire document as an independent continuant that is the carrier of a generically dependent continuant. Other generically dependent continuants, each corresponding to a paragraph, are parts of this generically dependent continuant.
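To make the difference between the two approaches concrete, here is a minimal sketch that writes each one as a set of subject-predicate-object triples in plain Python. The instance names (`doc1`, `content1`, `para1`, `para2`) are hypothetical, and the predicates are plain-text labels rather than real BFO or RO IRIs. It is an illustration of the two shapes, not a recommended encoding.

```python
# Each triple is (subject, predicate, object). All instance names are
# hypothetical; predicates are plain-text labels, not ontology IRIs.
Triple = tuple[str, str, str]

# Approach 1: the document and its paragraphs are all independent
# continuants, and the paragraphs are parts of the document itself.
approach_1: set[Triple] = {
    ("doc1", "instance of", "independent continuant"),
    ("para1", "instance of", "independent continuant"),
    ("para2", "instance of", "independent continuant"),
    ("para1", "part of", "doc1"),
    ("para2", "part of", "doc1"),
}

# Approach 2: the document is an independent continuant that carries a
# generically dependent continuant (its content). The paragraphs are
# generically dependent continuants that are parts of that content.
approach_2: set[Triple] = {
    ("doc1", "instance of", "independent continuant"),
    ("content1", "instance of", "generically dependent continuant"),
    ("doc1", "is carrier of", "content1"),
    ("para1", "instance of", "generically dependent continuant"),
    ("para2", "instance of", "generically dependent continuant"),
    ("para1", "part of", "content1"),
    ("para2", "part of", "content1"),
}


def paragraphs_of(document: str, triples: set[Triple]) -> list[str]:
    """Return things that are part of the document or of content it carries."""
    wholes = {document} | {
        o for s, p, o in triples if s == document and p == "is carrier of"
    }
    return sorted(s for s, p, o in triples if p == "part of" and o in wholes)
```

Under either approach `paragraphs_of("doc1", ...)` finds the same two paragraphs. What differs is what kind of entity each paragraph is, which is what matters for attaching a classification level to it.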

Conceptually, I prefer the second approach. The decomposition of a document into paragraphs is a human interpretation of the document’s content, not a set of physical relationships in the same way that gears are part of a clock. The parts of a paper document are pages. A digital file doesn’t necessarily have parts that are recognizable as paragraphs.

However, if I use the second approach, I do not see how to express the classification level. The documentation for the bearer-of property (RO_0000053) states that the bearer is an independent continuant. That would preclude expressing the classification level of a paragraph that’s a generically dependent continuant. (The documentation to which I refer is in the OWL version of BFO.)

So: what is the best practice for modeling a document as a collection of paragraphs?

Chris Mungall

Sep 7, 2022, 7:51:11 PM
to bfo-d...@googlegroups.com
I have no opinions on this modeling question but just wanted to clarify something

> The documentation for the bearer-of property (RO_0000053) states that the bearer is an independent continuant.


This is in RO, not BFO. It is a generalization of the BFO concept of bearer-of that allows processes and other entities to have characteristics.


Bill Duncan

Sep 8, 2022, 7:26:01 AM
to bfo-d...@googlegroups.com
IAO has a high-level 'document part' term, which is a subtype of information content entity (ICE).

This seems to me to be the best parent term for 'paragraph' (i.e., paragraph is a type of ICE).

As for defining the paragraph as being classified, I would say the classification level is a quality: it is always present, not manifested only when the paragraph participates in a process (e.g., someone reading the information). There is also an argument for the classification level being a type of role, but I think the quality approach is more straightforward.

As for relating the classification quality to the paragraph, there are multiple options:

- Follow @Chris' suggestion to use the 'has characteristic' relation. This could be between the paragraph ICE and the classification quality.

- Create appropriate subtypes of IAO 'material information bearer' (http://purl.obolibrary.org/obo/IAO_0000178) that concretize the paragraph ICE and bear the classification quality.
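The two options above can be sketched as triples in plain Python. This is only an illustration under stated assumptions: the instance names (`para1`, `bearer1`, `secret1`) and the class labels `paragraph` and `classification level` are hypothetical, and the predicates are plain-text labels taken from the wording above, not ontology IRIs.

```python
# Each triple is (subject, predicate, object). Instance names and the
# 'paragraph' and 'classification level' class labels are hypothetical.
Triple = tuple[str, str, str]

# Option 1: the paragraph ICE is related directly to the classification
# quality through 'has characteristic'.
option_1: set[Triple] = {
    ("para1", "instance of", "paragraph"),
    ("secret1", "instance of", "classification level"),
    ("para1", "has characteristic", "secret1"),
}

# Option 2: a material information bearer concretizes the paragraph ICE,
# and the bearer (not the paragraph) bears the classification quality.
option_2: set[Triple] = {
    ("para1", "instance of", "paragraph"),
    ("bearer1", "instance of", "material information bearer"),
    ("bearer1", "concretizes", "para1"),
    ("secret1", "instance of", "classification level"),
    ("bearer1", "bearer of", "secret1"),
}


def classification_of(paragraph: str, triples: set[Triple]) -> list[str]:
    """Find qualities attached to the paragraph or to anything concretizing it."""
    holders = {paragraph} | {
        s for s, p, o in triples if p == "concretizes" and o == paragraph
    }
    return sorted(
        o
        for s, p, o in triples
        if p in ("has characteristic", "bearer of") and s in holders
    )
```

With option 1 the lookup is a single hop from the paragraph. With option 2 a query has to go through the bearer, so two physical copies of the same paragraph would each need their own classification quality.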

Hope this helps,
Bill
