On Aug 6, 2025, at 8:05 AM, ontolo...@googlegroups.com wrote:
- Reasons for not using user define IRIs? - 4 Updates
Michael DeBellis <mdebe...@gmail.com>: Aug 05 10:24AM -0700
Thanks, Alex, that is exactly what I was looking for! In general, as I
review these, I think most of the arguments are examples of what Dawkins
calls The Tyranny of the Discontinuous Mind:
https://richarddawkins.com/articles/article/the-tyranny-of-the-discontinuous-mind
In this case, it is the assumption that if you use intuitive IRIs you can't use
labels, and vice versa.
Here are my replies to the Arguments for Alphanumeric Codes (OBO Foundry
Approach): https://claude.ai/share/4e077884-81dc-4866-81de-c0d6daedb5cc
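> Stability and Evolution: Alphanumeric codes provide stable identifiers that
> don't need to change when the understanding of a concept evolves or when
> terminology needs refinement. If you initially call something "cell
> division" but later scientific understanding shows it should be
> "cytokinesis," the human-readable URI would need updating, potentially
> breaking existing references.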
The unstated assumption is that alphanumeric codes are "more stable
identifiers that don't need to change when the understanding of a concept
evolves or when terminology needs refinement". How are alphanumeric codes
any more stable than user-defined IRIs? What I've heard people argue is
that with an alphanumeric code you can just change the label and not the
IRI. However, I would argue that changes where all you want is to rename
an entity and nothing else are fairly rare. In those rare cases you could
just as easily change the label, leave the IRI as it was, and add a comment
noting that for this entity the IRI doesn't map to the label. I've done that
several times. I would much rather have 90% of my IRIs map directly to their
prefLabels and 10% not, and even for that 10% where there is a mismatch, the
IRI still gives you some idea of what the entity is.
The example given ("you initially call something 'cell division' but later
scientific understanding shows it should be 'cytokinesis'") is not a strong
argument, because schema evolution is seldom this simple. And even on its own
terms the example isn't compelling, because I would think you still want to
keep "cell division" as an altLabel in this case; i.e., even in many of the
(already rare) paradigmatic examples, you can't just change the label. If
you are still doing design and the ontology hasn't been rolled out, it is
easy to change the IRI. But if you have rolled it out and you don't want to
change the IRI, you *can* simply change the label and add a comment
explaining that for this class the IRI doesn't map to the prefLabel.
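To make the 90/10 idea concrete, here is a minimal sketch of an audit that reports how many IRIs still match their prefLabel and which mismatches lack the explanatory comment described above. The namespace `http://example.org/cell#`, the `entities` dictionary, and the underscore-for-space convention are all hypothetical; a real ontology would be loaded with an RDF library rather than written as Python dicts.

```python
# Hypothetical ontology snapshot: IRI -> annotations. A real ontology would be
# loaded with an RDF library; plain dicts keep the sketch self-contained.
BASE = "http://example.org/cell#"  # hypothetical namespace

entities = {
    BASE + "Cell_Division": {
        "prefLabel": "cytokinesis",
        "altLabel": ["cell division"],
        "comment": "IRI kept for backward compatibility; it does not match the prefLabel.",
    },
    BASE + "Mitosis": {"prefLabel": "mitosis"},
    BASE + "Cell_Cycle": {"prefLabel": "cell cycle"},
}


def local_name(iri: str) -> str:
    """Return the part of the IRI after the last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def audit(entities: dict) -> tuple:
    """Return (share of IRIs whose local name matches their prefLabel,
    list of mismatched IRIs that have no explanatory comment)."""
    matched, undocumented = 0, []
    for iri, ann in entities.items():
        # Convention assumed here: underscores stand for spaces, case is ignored.
        if local_name(iri).replace("_", " ").lower() == ann["prefLabel"].lower():
            matched += 1
        elif "comment" not in ann:
            undocumented.append(iri)
    return matched / len(entities), undocumented


share, missing = audit(entities)
print(f"{share:.0%} of IRIs match their prefLabel; undocumented mismatches: {missing}")
```

The renamed class keeps its old IRI, its new prefLabel, and "cell division" as an altLabel, so it counts as a documented mismatch rather than an error.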
More importantly, most of the time you aren't just changing the name; you
are doing more complex things like inserting a new class between two
existing classes. Here's a real example: for an ontology I built recently I
used the Agent pattern, where Agent is a superclass of Organization and Person.
However, I realized I needed an intermediate class between Agent and
Organization called Group (a Group is any collection of individuals with
one or more identifying traits but no formal structure; e.g., all males in
the US is a Group, while Climate_Social_Science_Network is an Organization).
So I added a new class and changed the definitions so that Group is a
subclass of Agent and Organization is a subclass of Group. But that wasn't
everything. The definitions (domain and range) of various properties needed
to change as well; e.g., the domain of has_member was changed from
Organization to Group. In my experience most schema evolution is like this:
you typically change more than just the name of something, you also
restructure the ontology and need to change domains, ranges, and other
axioms, so you will be making changes at the level of IRIs anyway. If you
roll out a new version of your Cell ontology that still has Cell_Division as
the IRI and a different prefLabel, but you don't also change things like the
domain and range of properties, your code will break anyway. In this
example, if I assert that the Group Physic_Teacher_At_MIT has_member
Alan_Adams and don't update the domain of has_member, the reasoner will
incorrectly infer that Physic_Teacher_At_MIT, which is only a Group, is also
an Organization.
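The stale-domain problem can be reproduced with a toy forward-chaining reasoner. This is a minimal sketch that implements only two RDFS entailment rules (domain and subclass) over plain tuples; it is not how Protege's reasoners work internally, and the triples simply reuse the names from the example above.

```python
# Triples are (subject, predicate, object) tuples; names come from the example above.
RDF_TYPE, DOMAIN, SUBCLASS = "rdf:type", "rdfs:domain", "rdfs:subClassOf"


def infer(triples: set) -> set:
    """Forward-chain two RDFS rules until no new triple is derived:
    domain:   p rdfs:domain C  and  s p o            =>  s rdf:type C
    subclass: C rdfs:subClassOf D  and  s rdf:type C  =>  s rdf:type D
    """
    inferred = set(triples)
    while True:
        new = set()
        for x, rel, y in inferred:
            if rel == DOMAIN:  # x is a property whose domain is y
                new |= {(s, RDF_TYPE, y) for s, p, _ in inferred if p == x}
            elif rel == SUBCLASS:  # x is a subclass of y
                new |= {(s, RDF_TYPE, y) for s, p, c in inferred if p == RDF_TYPE and c == x}
        if new <= inferred:
            return inferred
        inferred |= new


def types_of(individual: str, triples: set) -> set:
    return {o for s, p, o in infer(triples) if s == individual and p == RDF_TYPE}


hierarchy = {("Group", SUBCLASS, "Agent"), ("Organization", SUBCLASS, "Group")}
data = {
    ("Physic_Teacher_At_MIT", RDF_TYPE, "Group"),
    ("Physic_Teacher_At_MIT", "has_member", "Alan_Adams"),
}
stale = hierarchy | data | {("has_member", DOMAIN, "Organization")}  # domain not updated
updated = hierarchy | data | {("has_member", DOMAIN, "Group")}  # domain moved to Group

print(sorted(types_of("Physic_Teacher_At_MIT", stale)))    # includes "Organization"
print(sorted(types_of("Physic_Teacher_At_MIT", updated)))  # only "Agent" and "Group"
```

Note that keeping or changing the IRI of any class makes no difference here: the unwanted inference comes entirely from the domain axiom.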
> Language Independence: Numeric codes avoid issues with natural language
> variations, translations, and cultural differences in terminology. This is
> crucial for international collaboration and multilingual applications.
How do codes "avoid issues with natural language"? You still have to define
language tags, and you still need to decide which language tag to use for
different users. Users won't see the IRI anyway. Choosing one language as
the language a team standardizes on for names in no way ties you to using
only that language for your labels. Developers have been doing this since
the earliest days of digital computing. Following this logic, if you wanted
your Python system to support multiple languages, you would choose names
like:
variable_123 = conn.createURI("http://www.w3.org/2002/07/owl#NamedIndividual")
Rather than:
owl_named_individual = conn.createURI("http://www.w3.org/2002/07/owl#NamedIndividual")
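Standardizing identifiers on English while serving labels in many languages can be sketched with a simple lookup that falls back from the user's language tag to a default. The `LABELS` table, the `label_for` helper, and the fallback order are hypothetical illustrations, not part of any ontology tool's API.

```python
# Hypothetical label table keyed by (local name, BCP 47 language tag).
# The identifiers stay English; only the labels vary by language.
LABELS = {
    ("Cell_Division", "en"): "cell division",
    ("Cell_Division", "fr"): "division cellulaire",
    ("Cell_Division", "de"): "Zellteilung",
}


def label_for(name: str, lang: str, fallback: str = "en") -> str:
    """Pick the label in the user's language, then the fallback language,
    then a label derived from the identifier itself."""
    return (LABELS.get((name, lang))
            or LABELS.get((name, fallback))
            or name.replace("_", " "))


for lang in ("en", "fr", "de", "es"):
    print(lang, label_for("Cell_Division", lang))
```

The identifier never changes across languages; adding Spanish support would mean adding one more row to the label table.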
> Uniqueness Guarantees: The OBO Foundry uses unique IDSPACE codes that
> identify each project, ensuring no conflicts between ontologies. Combined
> with systematic numbering, this prevents identifier collisions.
I would argue this is actually a reason you SHOULD use user-defined names.
E.g., if I'm creating a class in Protege with user-defined names, I will
know immediately when I'm about to reuse an IRI that already exists.
Again, this happened to me recently. I realized that I was trying to define
a class with the IRI "Communication" but I already had a property with that
IRI. I needed IRIs such as Communication_Event and
Greenwashing_Communication_Event. If I were using OBO-style codes, the new
class would simply have received a new code, and I might not find the
problem until much later downstream; and of course the later you find a
problem (development time vs. compile time vs. run time), the more expensive
it is to fix. The fact that I can have two different IRIs with the same
label is actually an argument NOT to use codes. Also, there already is a
mechanism in OWL (and every modern programming language) for resolving name
conflicts: namespaces.
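Here is a minimal sketch of the difference in when the clash surfaces. It assumes a hypothetical `Vocabulary` registry that enforces the convention described above (a local name is declared only once, whether as a class or a property). OWL itself permits some kinds of punning, so this is a modeling convention, not an OWL rule. The codes `ONT_0000001` and `ONT_0000002` are made up.

```python
class Vocabulary:
    """Hypothetical registry for one namespace. It enforces the convention that
    a local name is declared only once (class or property) and indexes labels
    so duplicate labels can be reported."""

    def __init__(self):
        self.kinds = {}   # local name -> "class" or "property"
        self.labels = {}  # lower-cased label -> set of local names

    def declare(self, name: str, kind: str, label: str = "") -> None:
        if name in self.kinds:
            raise ValueError(f"{name!r} is already declared as a {self.kinds[name]}")
        self.kinds[name] = kind
        if label:
            self.labels.setdefault(label.lower(), set()).add(name)

    def duplicate_labels(self) -> dict:
        return {lbl: names for lbl, names in self.labels.items() if len(names) > 1}


# User-defined names: the clash surfaces at the moment of declaration.
named = Vocabulary()
named.declare("Communication", "property")
try:
    named.declare("Communication", "class")
except ValueError as err:
    print("caught early:", err)

# Codes: both declarations succeed; only a separate label check finds the clash.
coded = Vocabulary()
coded.declare("ONT_0000001", "property", label="Communication")
coded.declare("ONT_0000002", "class", label="Communication")
print(coded.duplicate_labels())
```

With codes, the duplicate only appears if someone remembers to run a label check; with readable names, it fails at the moment the second entity is declared.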
> Technical Robustness: Alphanumeric codes avoid issues with special
> characters, spaces, encoding problems, and URL-unsafe characters that can
> occur with natural language terms.
Another strawman. I use user-defined names, but I am always very rigorous
about using only the basic alphabet and no spaces, colons, slashes, etc. Even
though the IRI spec supports most special characters, I've found that using
such characters causes no end of headaches, especially when moving the same
file across different tools such as Protege, Stardog, and AllegroGraph.
Again, the 90/10 rule: I would rather have 90% of my IRIs automatically sync
with the prefLabel and 10% that don't than 100% that don't.
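The "basic alphabet only" discipline can be automated when minting names. The `to_local_name` helper below is a hypothetical sketch: it strips accents by Unicode decomposition, treats any run of other characters (spaces, colons, slashes) as a word break, and joins words with underscores. Characters with no ASCII decomposition (for example "ß") are simply dropped, which is a limitation worth knowing about.

```python
import re
import unicodedata


def to_local_name(term: str, capitalize: bool = True) -> str:
    """Turn a human term into a conservative IRI local name:
    ASCII letters, digits and underscores only, words joined by '_'."""
    # Decompose accented characters and drop what has no ASCII form (é -> e).
    ascii_text = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode("ascii")
    # Any run of non-alphanumeric characters (spaces, colons, slashes...) is a word break.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", ascii_text) if w]
    if capitalize:  # class style: capitalize each word, keep acronyms like "MIT"
        words = [w[0].upper() + w[1:] for w in words]
    return "_".join(words)


print(to_local_name("Greenwashing communication event"))
print(to_local_name("café / bistro: menu"))
print(to_local_name("has member", capitalize=False))  # property style
```

Because the mapping is deterministic, labels generated from these names round-trip for the 90% of entities that were never renamed.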
> Separation of Concerns: The identifier serves purely as a stable
> reference, while human-readable labels are handled through annotation
> properties (like rdfs:label). This allows multiple labels, synonyms, and
> translations without affecting the core identifier.
Another strawman. It assumes (and I see this all the time, including, alas,
in Protege) that the choice is between user-defined IRIs and labels. There
is no reason you can't use both, which is what I and most of the developers
I know do. I use user-defined IRIs for developers and labels for pretty
names that are relevant to end users. When I'm not constrained by some
standard, I use English IRIs with underscores in place of spaces and only
basic alphabetic characters and numbers. That way, I can use a very simple
SPARQL transformation to auto-generate most of the initial labels.
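The author does the label generation in SPARQL. Below is a sketch of the same transformation: a SPARQL 1.1 Update string (an illustrative query, not the author's, assuming '#'-style IRIs and owl:Class entities only) plus an equivalent Python function that can be tested offline. The example IRIs under `http://example.org/o#` are hypothetical.

```python
# For comparison, a SPARQL 1.1 Update that does the same inside a triple store.
# This is a sketch, not the author's actual query; it assumes '#'-style IRIs.
LABEL_UPDATE = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT { ?e rdfs:label ?label }
WHERE {
  ?e a owl:Class .
  FILTER NOT EXISTS { ?e rdfs:label ?existing }
  BIND (STRLANG(REPLACE(STRAFTER(STR(?e), "#"), "_", " "), "en") AS ?label)
}
"""


def generate_labels(iris, existing: dict) -> dict:
    """Propose a label for every IRI that has none yet: take the local name
    after the last '#' or '/' and turn underscores into spaces."""
    proposed = {}
    for iri in iris:
        if iri in existing:
            continue  # keep hand-written labels (the renamed 10%)
        local = iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
        proposed[iri] = local.replace("_", " ")
    return proposed


iris = ["http://example.org/o#Communication_Event", "http://example.org/o#Group"]
print(generate_labels(iris, existing={"http://example.org/o#Group": "group"}))
```

Skipping entities that already have a label means the generator can be rerun safely after manual edits.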
This is anecdotal, of course, but in the last few years I've worked with
ontologies that use user-defined names and ontologies that use codes (which
is one reason I wanted to revisit this), and I find that codes add extra
work for the developer and are harder to debug. I've never seen an example
of schema evolution where I thought "this would be easier with codes as
IRIs", or any other use case where using codes would have simplified things.
My final argument is that the burden of proof here should be on those who
argue for codes. Clearly there is a cost to using codes or UUIDs. As in my
initial post, having to write SPARQL like:
SELECT ?p ?r
WHERE {?p a codo:Patient;
       codo:hasFamilyRelationship ?r.}
is clearly better than having to write:
SELECT ?p ?r
WHERE {?p a codo:OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a;
       codo:OWLProperty_c744v9fv_594j_3640_a9be_dge5305fe45v ?r }
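One practical consequence of the second query is that reading it requires a lookup table. The sketch below shows the kind of helper a team ends up writing just to read code-based queries: it rewrites `codo:` tokens using a hypothetical `READABLE` mapping (in practice built from rdfs:label). Names not in the table are left unchanged.

```python
import re

# Hypothetical lookup table: code local name -> readable name (e.g. from rdfs:label).
READABLE = {
    "OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a": "Patient",
    "OWLProperty_c744v9fv_594j_3640_a9be_dge5305fe45v": "hasFamilyRelationship",
}


def humanize(query: str, prefix: str = "codo") -> str:
    """Rewrite prefix:Code tokens as prefix:ReadableName so a person can read the query."""
    pattern = re.compile(rf"\b{re.escape(prefix)}:(\w+)")
    return pattern.sub(lambda m: f"{prefix}:{READABLE.get(m.group(1), m.group(1))}", query)


coded_query = (
    "SELECT ?p ?r WHERE {?p a codo:OWLClass_f861e81c_661a_4243_a9be_cb9c780cb78a; "
    "codo:OWLProperty_c744v9fv_594j_3640_a9be_dge5305fe45v ?r }"
)
print(humanize(coded_query))
```

With user-defined IRIs this extra step disappears, because the query is already readable as written.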
And in my work over the last few years, I've never seen any use case where
codes made things easier. Nor is query readability the only cost of using
codes. Codes mean your team has to depend on some central authority to give
you a range of codes that haven't been used yet. And codes make it a lot
harder to debug your software; for me this is one of the clearest and most
costly drawbacks.
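The coordination cost can be sketched as a small allocator. It assumes a hypothetical central authority that hands each contributor a block of numbers and mints IDs shaped like IDSPACE_ followed by a zero-padded number (OBO Foundry ontologies commonly use seven digits). The IDSPACE "ONT" and contributor names are made up.

```python
class IdAllocator:
    """Hypothetical central ID authority: grants each contributor a block of
    numbers and mints IDs of the form IDSPACE_ + zero-padded number."""

    def __init__(self, idspace: str, width: int = 7):
        self.idspace = idspace
        self.width = width
        self.next_start = 1
        self.ranges = {}  # contributor -> [next number, end (exclusive)]

    def grant_range(self, contributor: str, size: int) -> None:
        self.ranges[contributor] = [self.next_start, self.next_start + size]
        self.next_start += size

    def mint(self, contributor: str) -> str:
        if contributor not in self.ranges:
            raise KeyError(f"{contributor} has no range; ask the ID authority first")
        nxt, end = self.ranges[contributor]
        if nxt >= end:
            raise RuntimeError(f"{contributor}'s range is exhausted")
        self.ranges[contributor][0] = nxt + 1
        return f"{self.idspace}_{nxt:0{self.width}d}"


authority = IdAllocator("ONT")
authority.grant_range("alice", 2)
authority.grant_range("bob", 100)
print(authority.mint("alice"), authority.mint("alice"), authority.mint("bob"))
```

Every new contributor and every exhausted range means another round trip to the authority, a step that user-defined IRIs scoped by namespace do not need.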
Michael
https://www.michaeldebellis.com/blog
On Tue, Aug 5, 2025 at 1:59 AM Alex Shkotin <alex.s...@gmail.com> wrote:
Damion,
I suggest thinking about why it was necessary to encode terms at all.
On the one hand, we rewrite our knowledge in some formal language, and it turns out that our terms cannot be used as identifiers of formal objects there.
And then it turns out that if we have to write a lot of formulas while operating the system, for example ad hoc formal queries, the people writing them will need abbreviations or a hinting (autocomplete) tool.
On the other hand, it is possible that in our subject area, different people initially use different words to indicate the same referent.
Well, technologically, we have several solutions for encoding terms:
- as close as possible to the terms themselves.
- abbreviations, e.g. "cd" instead of "cell division".
- a single center for issuing codes.
Perhaps there are some other algorithms.
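The three encoding strategies listed above can be sketched side by side. The function names, the "T" prefix, and the four-digit width are hypothetical choices for illustration only.

```python
import itertools


def near_term(term: str) -> str:
    """Strategy 1: stay as close as possible to the term itself."""
    return "_".join(term.split())


def abbreviation(term: str) -> str:
    """Strategy 2: initials, e.g. "cell division" -> "cd"."""
    return "".join(word[0] for word in term.split()).lower()


class CodeCenter:
    """Strategy 3: a single center that issues a new code per distinct term."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.counter = itertools.count(1)
        self.issued = {}

    def code_for(self, term: str) -> str:
        if term not in self.issued:
            self.issued[term] = f"{self.prefix}{next(self.counter):04d}"
        return self.issued[term]


center = CodeCenter("T")
for term in ("cell division", "cell death"):
    print(near_term(term), abbreviation(term), center.code_for(term))
```

Note that naive initials collide quickly ("cell death" also becomes "cd"), while the central counter never collides but requires everyone to go through the center.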
And "term reuse" is sometimes called polymorphism. Why do you mention "homologues"? Biologically? According to Claude. 😀
Alex
--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
John,
We are mainly talking about formal ontologies here, usually for various sciences and technologies, but sometimes for everyday life, i.e., what is known as common-sense knowledge.
Scientific and technological jargon is huge. For example, how many names do we have for drugs or materials?
By the way, another way to encode compound terms is with brackets: "cell division" → (cell)division, i.e., we apply "division" to "cell", but this takes us into HOL (higher-order logic).
Alex