[topicmapmail] PSI hashing and relational databases

Alexander Johannesen

Jan 25, 2012, 8:02:03 PM
to TopicMapMail Mail

Hi all,

I'm in the middle of a big development project, using a relational database to hold a TMDM/TMRM mashup model, and I'm currently working out how best to handle PSIs.

Early in the design I decided to store internal IDs as INT, with a separate NAME column as CHAR(99), both indexed of course, and to let the framework deal with the overhead of converting between the two.

The next step is to work out how best to store and handle PSIs, which pose a similar challenge, and I'm wondering what people here have done. I can store PSIs in a larger CHAR column and create a shorter hashed field which the framework will work with.
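
To make the idea concrete, here is a minimal sketch of the "long column plus short hashed column" layout. It assumes SQLite and an MD5 hex digest purely for illustration; the table name `psi`, the column names, and the helpers `psi_hash`, `store_psi` and `find_psi` are hypothetical, not part of any existing framework.

```python
import hashlib
import sqlite3


def psi_hash(psi: str) -> str:
    """Fixed-width key for a PSI: hex MD5 of its UTF-8 bytes (32 chars)."""
    return hashlib.md5(psi.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE psi (
        id       INTEGER PRIMARY KEY,  -- internal integer ID
        psi_hash CHAR(32) NOT NULL,    -- short, fixed-width lookup key
        psi      TEXT NOT NULL         -- full identifier, not indexed
    )
    """
)
# Only the short hash column is indexed.
conn.execute("CREATE INDEX psi_hash_idx ON psi (psi_hash)")


def find_psi(conn, psi):
    """Return the internal ID for a PSI, or None if it is not stored."""
    # Compare the full string as well, so a hash collision cannot
    # return the wrong row.
    row = conn.execute(
        "SELECT id FROM psi WHERE psi_hash = ? AND psi = ?",
        (psi_hash(psi), psi),
    ).fetchone()
    return row[0] if row else None


def store_psi(conn, psi):
    """Insert a PSI if it is new; return its internal ID either way."""
    existing = find_psi(conn, psi)
    if existing is not None:
        return existing
    cur = conn.execute(
        "INSERT INTO psi (psi_hash, psi) VALUES (?, ?)",
        (psi_hash(psi), psi),
    )
    return cur.lastrowid
```

The framework would then pass around only the integer `id` (or the hash), and touch the long `psi` column when it needs the actual identifier.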

Thoughts, even on the sort of hashing I should do (if at all)?


Lars Heuer

Jan 26, 2012, 5:31:06 PM
to Alexander Johannesen, TopicMapMail Mail
Hi Alex,

I am a bit surprised that someone is still interested in Topic Maps,
but why not… some want your soul, some want your first born child, some want
Topic Maps; I think we should accept strange desires. ;)

Anyway, I think the currently preferred strategy is to canonicalize any input
and to map the c14n form to an internal identifier. The internal identifier
could be a hash or an integer or whatever fits. You'll end up with a bunch of
singletons: it doesn't matter whether a string is used as a PSI or as an
occurrence IRI, each string is mapped to the same (internal) data structure.

Several (all?) RDF engines use this strategy, and some Topic Maps engines
use it, too. So it depends on your (internal) model whether a hash (e.g. MD5)
of the c14n form, an integer, or a string works best, but singletons
make sense if you want fast query answering and/or tests for equality.
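
A rough sketch of what "canonicalize, then map to a singleton" could look like in memory. The normalization rules here (lower-casing scheme and host, dropping default ports, using "/" for an empty path) are only an assumption about what the c14n step might do, not a full RFC 3986 normalization, and the names `canonicalize` and `IriTable` are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(iri: str) -> str:
    """Illustrative c14n: lower-case scheme/host, drop default port.

    Simplification: userinfo is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(iri.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # Treat "http://example.org" and "http://example.org/" as the same IRI.
    path = parts.path or ("/" if netloc else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


class IriTable:
    """Maps each canonical IRI to exactly one integer ID (a singleton)."""

    def __init__(self):
        self._ids = {}    # canonical IRI -> integer ID
        self._iris = []   # integer ID -> canonical IRI

    def intern(self, iri: str) -> int:
        key = canonicalize(iri)
        ident = self._ids.get(key)
        if ident is None:
            ident = len(self._iris)
            self._iris.append(key)
            self._ids[key] = ident
        return ident

    def lookup(self, ident: int) -> str:
        return self._iris[ident]


table = IriTable()
as_psi = table.intern("HTTP://Example.org/psi/person")
as_occurrence = table.intern("http://example.org:80/psi/person")
# Both spellings resolve to the same ID, so an equality test
# is an integer comparison instead of a string comparison.
same = as_psi == as_occurrence
```

The same table serves PSIs, subject locators and occurrence IRIs alike; in a relational setting the dictionary would be replaced by a table keyed on the canonical string or its hash.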

Best regards,
topicmapmail mailing list
