How to link records on data longer than 255 characters?

Edward Jahn

Dec 21, 2009, 6:37:10 PM

to ope...@googlegroups.com

I am working with a data item that is sometimes (though not often) longer than 255 characters. I have records in multiple files that are related *only* by having the same contents in this data item. I wanted to use it as a key to each of these files, which would make linking these records very easy. But I can't, because of the QM system limit on possible key length.

I then made it into a data field, thinking to use alternate key processing -- only to discover that the 255 character limit applies to alternate keys as well as primary keys.

Clearly, I could use a SELECT command, but how much time would it take?

The help says that instances longer than 255 characters will not be included in the alternate key index. In my situation, most of the instances are shorter, but a few (maybe 5%) are longer.
How does a SELECT handle this? Does it use the alternate key index for those instances that are in it, then somehow bypass the index and select just those instances that are longer than 255? Or, does it bypass the alternate key index altogether?

I'm trying to find a way of linking the records without spending huge amounts of time doing so. I am open to any and all suggestions as to how to do it.

Happy holidays,
Ed Jahn
Leesburg, Virginia

Brian Speirs

Dec 22, 2009, 2:33:28 AM

to OpenQM

Hi Ed,

Yes, key values cannot be longer than 255 bytes (the default limit is
63 bytes), and alternate key values cannot be longer than 255 bytes
either. Those are pretty generous limits. After all, how often do you
need to go beyond this number of characters to get to a unique value?

Your problem is essentially one of transforming the data string in
your field to some unique value. The simplest method for this is
serial allocation, but that doesn't really help you test a new data
field to see if you already have that serialised.

How about using the MD5 function to digest the contents of your data
field to a much more manageable size (32 bytes)? The digest could be
either your key field or your indexed alternate key.
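
A minimal sketch of the digest idea, written in Python rather than QMBasic so that it is easy to try out. Here `hashlib.md5` stands in for the QM MD5 function, and `digest_key`, `file_a` and `file_b` are hypothetical names used only for illustration. Distinct strings can in principle share a digest, so the sketch keeps the full text in each record so that a match can be confirmed.

```python
import hashlib


def digest_key(text: str) -> str:
    """Return the 32-character hex MD5 digest of text, for use as a fixed-length key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-ins for two files keyed by the digest instead of the raw string.
file_a = {}
file_b = {}

long_item = "ก" * 400  # a string well over 255 characters

file_a[digest_key(long_item)] = {"text": long_item, "source": "A"}
file_b[digest_key(long_item)] = {"text": long_item, "source": "B"}

# Linking: the same digest reads the related record from each file.
key = digest_key(long_item)
linked = (file_a[key], file_b[key])

# Confirm the match on the stored text rather than trusting the digest alone.
same_text = linked[0]["text"] == linked[1]["text"]
```

The digest is always 32 characters, whatever the length of the original item, so it fits within the key limits discussed above.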

What I find difficult to grasp is this: when you come to search for
these items (as keys or otherwise), are you really entering a 255+
byte string to search for? Even selecting such an "item-id" out of a
list would be an immensely tedious task. In two strings of 255 bytes,
how do you recognise that byte 224 is different?

Cheers,

Brian

Martin Phillips

Dec 22, 2009, 4:00:01 AM

to ope...@googlegroups.com

Hi Ed,

> The help says that instances longer than 255 characters will
> not be included in the alternate key index. In my situation, most
> of the instances are shorter, but a few (maybe 5%) are longer.

Perhaps you could build an index on the first 255 characters of the item and
then filter the results?
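
A rough sketch of this prefix-then-filter approach, in Python rather than QMBasic. The dictionary `index` is only a stand-in for an alternate key index, and `build_prefix_index`, `find_matches` and `records` are hypothetical names. The sketch also assumes the limit can be treated as 255 characters; if the real limit is counted in bytes, the truncation would have to be done on the encoded string.

```python
from collections import defaultdict

MAX_KEY_LEN = 255  # index key limit discussed in this thread


def build_prefix_index(records):
    """Map the first MAX_KEY_LEN characters of each value to the record ids holding it."""
    index = defaultdict(list)
    for rec_id, value in records.items():
        index[value[:MAX_KEY_LEN]].append(rec_id)
    return index


def find_matches(index, records, value):
    """Look up candidates by prefix, then keep only exact full-length matches."""
    candidates = index.get(value[:MAX_KEY_LEN], [])
    return [rec_id for rec_id in candidates if records[rec_id] == value]


# Two long values share the same first 255 characters and differ only afterwards.
records = {
    "1": "short item",
    "2": "x" * 255 + "tail-one",
    "3": "x" * 255 + "tail-two",
}
index = build_prefix_index(records)
matches = find_matches(index, records, "x" * 255 + "tail-two")
```

The index narrows the search to the few records that share a prefix, and the full comparison is only done on those candidates.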

Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200

Kevin Powick

Dec 22, 2009, 9:19:42 AM

to OpenQM

On Dec 21, 6:37 pm, Edward Jahn <ejahn3...@yahoo.com> wrote:
> I am working with a data item that is sometimes (though not often) longer than 255 characters. I have records in multiple files that are related *only* by having the same contents in this data item. I wanted to use it as a key to each of these files, which would make linking these records very easy.

Just to clarify. You want the key to your item to be the contents of
your item, so that identifying items as the same across multiple files
will be easier?

Example Item:

01 This
02 Is
03 My
04 Item

ID = ThisIsMyItem

As you've discovered, it does not work, but I also believe it is the
wrong approach. I agree with Brian. Use some type of hash value on
the data to create your Item ID.

--
Kevin Powick

Tony Gravagno

Dec 22, 2009, 1:44:39 PM

to Ope...@googlegroups.com

Thoughts on the topic:
nospam.pleaseNebula-RnD.com/blog/tech/mv/2009/12/keys1.html

Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
http://Twitter.com/TonyGravagno

Ed Jahn

Jan 2, 2010, 4:19:53 PM

to OpenQM

Thanks Martin and others for suggestions :) I finally figured it out
for myself.

I do have to use the full 255+ characters. Yes, it's an unusual
situation.

For an academic linguistic project, I am processing words from a lot
of different languages. Some languages, notably Thai, do not separate
words with spaces. For those languages, I get long strings that are
really sentences, but at this point I don't have a way of parsing out
the words, so I have to work with the whole long string.

The solution is to test the length of the string, and if longer than
the maximum, put it into a sorted list in a control record. To find it
on inquiry, do a LOCATE. The data volumes are such that a single
record is enough. Testing shows it works.
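
The approach can be sketched roughly as follows, in Python rather than QMBasic. A plain sorted list stands in for the control record, and a binary search with `bisect` stands in for the LOCATE; `add_long_value`, `locate` and `control` are hypothetical names, and the 255-character threshold is assumed from the limit discussed in this thread.

```python
import bisect

MAX_KEY_LEN = 255  # values longer than this cannot be used as keys


def add_long_value(control_list, value):
    """Put value into the sorted overflow list if it exceeds the key limit.

    Returns True when the value belongs in the overflow list, False when it
    is short enough for the normal key or index.
    """
    if len(value) <= MAX_KEY_LEN:
        return False
    pos = bisect.bisect_left(control_list, value)
    # Insert only if the value is not already present at its sorted position.
    if pos == len(control_list) or control_list[pos] != value:
        control_list.insert(pos, value)
    return True


def locate(control_list, value):
    """Binary search of the sorted list; return the position of value, or None."""
    pos = bisect.bisect_left(control_list, value)
    if pos < len(control_list) and control_list[pos] == value:
        return pos
    return None


control = []
add_long_value(control, "b" * 300)
add_long_value(control, "a" * 300)
add_long_value(control, "short")    # ignored: fits within the key limit
add_long_value(control, "a" * 300)  # ignored: already in the list
position = locate(control, "b" * 300)
```

Keeping the list sorted means each lookup is a binary search rather than a scan, which suits the case where only a small share of the values is over the limit.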