All URIs are beautiful, it's just that some of them cause problems in VB

Joeli Takala

Feb 20, 2026, 4:26:33 AM
to vocbench-user
Hi,

Sometimes I run into problems where valid but slightly unusual URIs are not parsed completely by various LOD tools. Today my issue has been with URN references wrapped in URLs in VB + Semantic Turkey + GraphDB. If that sounds strange: referenceable URNs were supposed to be a thing on the Internet, using a resolver layer called URC that would work a bit like DNS, but this was dropped (by accident?) when they were building the damn thing. So we work around the missing URC by using a national URN resolver that understands URIs like:

http://urn.fi/URN:NBN:fi:au:mts:m4958

This is in line with the 2005 URI specification (RFC 3986); see the examples in section 1.1.2 of https://www.rfc-editor.org/rfc/rfc3986.txt (or just paste the URI into a validator such as https://0mg.github.io/tools/uri/).
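To illustrate the point about RFC 3986, here is a minimal Python sketch (standard library only) showing that a generic URI splitter keeps the whole URN in the path, and that the colons are legal there, since the RFC's `pchar` rule explicitly includes ":". The helper name `path_is_rfc3986` is hypothetical and only checks the path component, not the full URI grammar.

```python
import re
from urllib.parse import urlsplit

# RFC 3986, section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
# path-abempty = *( "/" segment ), where segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def path_is_rfc3986(uri: str) -> bool:
    """Hypothetical helper: True if the URI's path matches RFC 3986 path-abempty."""
    return PATH_ABEMPTY.fullmatch(urlsplit(uri).path) is not None


uri = "http://urn.fi/URN:NBN:fi:au:mts:m4958"
parts = urlsplit(uri)
print(parts.scheme)          # http
print(parts.netloc)          # urn.fi
print(parts.path)            # /URN:NBN:fi:au:mts:m4958
print(path_is_rfc3986(uri))  # True
```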

But then the problems start. If I set up a vocabulary project with http://urn.fi/URN:NBN:fi:au:mts: as the base URI, I get the following:
- Base URI in project.info: http://urn.fi/URN:NBN:fi:au:mts:
- Default namespace in project.info: http://urn.fi/URN:NBN:fi:au:mts:#
- The VB GUI shows the namespace / local name in the concept view as http://urn.fi/ | URN:NBN:fi:au:mts:m4958
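The GUI split above looks like the common heuristic of cutting an IRI after its last "#", or failing that its last "/". Whether VB actually does this is an assumption; the minimal Python sketch below (function names hypothetical) only shows why such a heuristic produces exactly that split, and why preferring the declared namespace would not help while the stored default namespace carries a stray "#".

```python
def split_heuristic(iri: str) -> tuple[str, str]:
    """Split after the last '#', else after the last '/' (assumed heuristic, not VB code)."""
    for sep in ("#", "/"):
        idx = iri.rfind(sep)
        if idx != -1:
            return iri[: idx + 1], iri[idx + 1:]
    return "", iri


def split_with_namespace(iri: str, namespace: str) -> tuple[str, str]:
    """Prefer a declared namespace when the IRI starts with it; otherwise fall back."""
    if namespace and iri.startswith(namespace):
        return namespace, iri[len(namespace):]
    return split_heuristic(iri)


iri = "http://urn.fi/URN:NBN:fi:au:mts:m4958"
print(split_heuristic(iri))
# ('http://urn.fi/', 'URN:NBN:fi:au:mts:m4958')
print(split_with_namespace(iri, "http://urn.fi/URN:NBN:fi:au:mts:"))
# ('http://urn.fi/URN:NBN:fi:au:mts:', 'm4958')
print(split_with_namespace(iri, "http://urn.fi/URN:NBN:fi:au:mts:#"))
# ('http://urn.fi/', 'URN:NBN:fi:au:mts:m4958')
```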

So it's inconsistent, but at least one out of three is correct. The more worrying issue is the GraphDB Lucene plugin failing to parse these URIs. I have observed quite a few entries like this in the GraphDB error.log:
Failed executing lucene query 'http://urn.fi/URN:NBN:fi:au:mts:m4958' on index 'vocbenchLabel' org.apache.lucene.queryParser.ParseException: Cannot parse 'http://urn.fi/URN:NBN:fi:au:mts:m4958': Encountered " ":" ": "" at line 1, column 17.
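In Lucene's classic query syntax ":" separates a field name from a term, so presumably "http:" is read as a field and the next ":" (in "URN:NBN") is where the parser gives up. The usual remedy on the querying side is to backslash-escape syntax characters before handing a literal string to the parser. Below is a minimal Python sketch of that; the character set is roughly the one escaped by Lucene's `QueryParser.escape`, the helper name `escape_lucene` is hypothetical, and whether Semantic Turkey could escape terms this way before querying GraphDB is an assumption, not something established in this thread.

```python
# Roughly the characters escaped by Lucene's classic QueryParser.escape.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Hypothetical helper: backslash-escape every Lucene query-syntax character."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


print(escape_lucene("http://urn.fi/URN:NBN:fi:au:mts:m4958"))
# http\:\/\/urn.fi\/URN\:NBN\:fi\:au\:mts\:m4958
```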

Out of the three problems, the failure of the Lucene plugin to parse these URIs seems pretty severe. Is it possible for me to disable the plugin in a VB project like this (so that the plugin is still in use with other vocabulary projects in the same VB instance)? Any tips?

_______
Joeli

Okko Vainonen

May 29, 2026, 6:43:40 AM
to vocbench-user
Hi,

When Joeli wrote these questions back in February, we had problems with URN namespaces ending with a colon, because VocBench automatically appended the character # to the base URI. There was a workaround: removing the # from the project info file. Since we updated VocBench from version 14 to v15.1.0, this no longer works: although the # can be removed from the file, the namespace then shows up in the GUI view, where it fails the last two conditions of the isNamespaceValid test (the namespace must end with either # or /). See code.

So, to my knowledge, there is no longer any workaround. This makes all projects with a namespace ending in a colon that were created after updating to v15.1.0 unusable, and we have a few of them. How can this be solved? One way would be to go back to the earlier behaviour and allow the namespace to be edited in the project info file, but it might be better to relax the isNamespaceValid test slightly and show the user a warning dialog that allows namespaces ending with characters other than # or /. Could this be fixed like this?
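The proposal can be sketched as a three-way check instead of a pass/fail one. This is a minimal Python illustration, not VocBench's actual isNamespaceValid (which lives in the client and is not reproduced here); the names `NamespaceCheck` and `check_namespace` are hypothetical, and the "absolute IRI" test is deliberately crude.

```python
from enum import Enum
from urllib.parse import urlsplit


class NamespaceCheck(Enum):
    OK = "ok"            # ends with '#' or '/': accept silently
    WARN = "warn"        # absolute IRI with an unusual ending: ask the user to confirm
    INVALID = "invalid"  # empty, contains whitespace, or has no scheme


def check_namespace(ns: str) -> NamespaceCheck:
    """Hypothetical relaxed namespace check that warns instead of rejecting."""
    if not ns or any(ch.isspace() for ch in ns) or not urlsplit(ns).scheme:
        return NamespaceCheck.INVALID
    if ns.endswith(("#", "/")):
        return NamespaceCheck.OK
    return NamespaceCheck.WARN


print(check_namespace("http://urn.fi/URN:NBN:fi:au:mts:").name)  # WARN
print(check_namespace("http://example.org/ns#").name)            # OK
print(check_namespace("not a namespace").name)                   # INVALID
```

In this design a WARN result would be the point where the GUI shows the confirmation dialog, so namespaces ending in ":" stay possible without becoming the silent default.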

Kind regards,
Okko Vainonen

tiziano.lorenzetti

May 29, 2026, 10:47:42 AM
to vocbench-user
Dear all,
first of all, apologies for not replying to the previous message from Joeli. The last period was (and still is) intense and, unfortunately, some messages may have slipped through.  

Regarding the issue with namespaces ending with ":" and the automatic addition of "#", we just reviewed the client-side validation logic in both VB and ST in order to avoid blocking IRIs ending with ":" and to avoid forcibly appending "#". 
Therefore, in the upcoming release (v15.1.2, which should be available shortly), the UI should properly allow such namespaces and no longer prevent their usage.  
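For illustration only, "not forcibly appending #" could amount to something like the minimal Python sketch below: derive the default namespace from the base URI and add "#" only when the base does not already end with a namespace terminator. The function name `default_namespace` and the choice of terminators are assumptions, not the actual VB/ST change.

```python
# Assumed set of characters that already end a namespace.
NAMESPACE_TERMINATORS = ("#", "/", ":")


def default_namespace(base_uri: str) -> str:
    """Hypothetical: keep bases ending in '#', '/' or ':' as-is, otherwise append '#'."""
    if base_uri.endswith(NAMESPACE_TERMINATORS):
        return base_uri
    return base_uri + "#"


print(default_namespace("http://urn.fi/URN:NBN:fi:au:mts:"))  # http://urn.fi/URN:NBN:fi:au:mts:
print(default_namespace("http://example.org/onto"))           # http://example.org/onto#
```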

Concerning the Lucene-related issue, we are not sure we will be able to introduce a fix already in the upcoming release, since it is very close to being finalized.
Moreover, since the error seems to be related to Lucene parsing itself, at the moment we cannot say whether this is something that can be addressed on our side or whether it depends directly on Lucene behavior.
In any case, in order to reproduce it locally, could you please clarify when exactly the ParseException is triggered and which specific operation/action causes it?

Best regards,
Tiziano

Okko Vainonen

Jun 12, 2026, 8:15:02 AM
to vocbench-user
Hi again,

After writing my last message, I wondered where it had disappeared to: the thread said "message was deleted". Only today, when I came back, did I see that it had reappeared, together with Tiziano's answer. Just in case this happens again, I am saving a copy of this message, and I am using the "Reply all" option this time.

I was delighted to see that the new release should fix the issue with namespaces ending with characters other than # or /.

However, Lucene still seems to log errors for queries containing colons. For example, searching for a concept in the VocBench GUI by the URI ending in m261 succeeds, but searching for the same concept by its prefixed name "mts:m261" makes the GUI report "No results found for 'mts:m261'". Under the hood, GraphDB logs this error:
Failed executing lucene query '(mts:m261)|(*mts:m261*)' on index 'vocbenchLocalName' org.apache.lucene.queryParser.ParseException: Cannot parse '(mts:m261)|(*mts:m261*)': Encountered " ":" ": "" at line 1, column 16.
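The failing query has the shape `(X)|(*X*)` with the raw user input as X. One conceivable client-side mitigation, sketched minimally in Python below, is to strip a declared prefix before searching the local-name index and to escape whatever remains. The prefix mapping for "mts", the helper names `escape_lucene` and `local_name_query`, and the query shape (copied from the log line) are all assumptions; how Semantic Turkey actually builds this query is not known here.

```python
# Roughly the characters escaped by Lucene's classic QueryParser.escape.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Hypothetical helper: backslash-escape Lucene query-syntax characters."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def local_name_query(user_input: str, prefixes: dict[str, str]) -> str:
    """Build a '(X)|(*X*)' query on a local name, stripping a known prefix first."""
    prefix, sep, rest = user_input.partition(":")
    # 'mts:m261' -> 'm261' only if 'mts' is a declared prefix; otherwise keep the input.
    term = rest if sep and prefix in prefixes else user_input
    t = escape_lucene(term)
    return f"({t})|(*{t}*)"


PREFIXES = {"mts": "http://urn.fi/URN:NBN:fi:au:mts:"}  # assumed prefix declaration
print(local_name_query("mts:m261", PREFIXES))  # (m261)|(*m261*)
print(local_name_query("foo:bar", PREFIXES))   # (foo\:bar)|(*foo\:bar*)
```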

Querying with a URL starting with http: also causes similar errors, as Joeli reported earlier.

Our setup is "15.1.0 in combo with GraphDB 10.x:", and, as the instructions say, we have deployed the "sail (from VB3 14.0) instead".

Best regards,
Okko