-----Original Message-----
From: Dennis [mailto:droes...@comcast.net] Sent: 15 January 2012 23:52
To: info-ing...@kettleriverconsulting.com
Subject: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
The value for varchar colx in two different rows are
'AAA_BBB_12345' and 'AAA BBB 12345'.
Ingres 9.2.1 (a64.lnx/103)NPTL thinks these are the same, but Ingres 10 doesn't.
Ingres 9.2 wont allow this table to be created as btree unique on colx.
-----Original Message-----
From: Martin Bowes Sent: 16 January 2012 09:24
To: info-ing...@kettleriverconsulting.com
Cc: 'Dennis'
Subject: RE: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
HI Dennis
On ordinary databases: createdb bowtest
I get the same answer on both 9.2.0 and 10.0.0 on a64/lnx.
In both cases the rows sort in order:
AAA BBB 12345
AAA_BBB_12345
You possibly used a different collation sequence on the different installations?
Martin Bowes
-----Original Message-----
From: Dennis [mailto:droes...@comcast.net] Sent: 15 January 2012 23:52
To: info-ing...@kettleriverconsulting.com
Subject: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
The value for varchar colx in two different rows are
'AAA_BBB_12345' and 'AAA BBB 12345'.
Ingres 9.2.1 (a64.lnx/103)NPTL thinks these are the same, but Ingres 10 doesn't.
Ingres 9.2 wont allow this table to be created as btree unique on colx.
[mailto:info-ingres-boun...@kettleriverconsulting.com] On Behalf Of
eric.misc...@w-link.net
Sent: 16 January 2012 06:08
To: Ingres and related product discussion forum
Subject: Re: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are
the same?
Ingres 9.2 is evaluating the underscore "_" as the wildcard character.
> The value for varchar colx in two different rows are 'AAA_BBB_12345' > and 'AAA BBB 12345'.
> Ingres 9.2.1 (a64.lnx/103)NPTL thinks these are the same, but Ingres > 10 doesn't.
> Ingres 9.2 wont allow this table to be created as btree unique on
colx.
It depends on the unicode collation sequence in use. The default
unicode collation sequence used by Ingres treats underscore and space
as the same.
You can use the udefault5 collation sequence instead (createdb -
iudefault5 dbname), which distinguishes between these characters (and
some more).
Not sure at the moment whether this was available in the first 9.2 GA
release, but at least available by patch and in 10.0.
By the way, when using a UTF8 installation you will see the problem
with varchar too.
Kristoff
On Jan 16, 11:03 am, Martin Bowes <martin.bo...@ctsu.ox.ac.uk> wrote:
> I've mucked around with this further....
> I can replicate the error(?) if the database is unicode enabled: createdb -i bowtest
> And only then if the sort column is nvarchar not varchar.
> create table x(a nvarchar(100) not null not default);
> insert into x values('AAA BBB 12345');
> insert into x values('AAA_BBB_12345');
> modify x to btree unique on a;
> select * from x;
> The modify will generate the error:
> E_US1591 MODIFY: table could not be modified because rows contain duplicate keys.
> I get this on both 9.2 and on 10.0.
> Marty
> -----Original Message-----
> From: Martin Bowes
> Sent: 16 January 2012 09:24
> To: info-ing...@kettleriverconsulting.com
> Cc: 'Dennis'
> Subject: RE: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
> HI Dennis
> On ordinary databases: createdb bowtest
> I get the same answer on both 9.2.0 and 10.0.0 on a64/lnx.
> In both cases the rows sort in order:
> AAA BBB 12345
> AAA_BBB_12345
> You possibly used a different collation sequence on the different installations?
> Martin Bowes
> -----Original Message-----
> From: Dennis [mailto:droes...@comcast.net]
> Sent: 15 January 2012 23:52
> To: info-ing...@kettleriverconsulting.com
> Subject: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
> The value for varchar colx in two different rows are
> 'AAA_BBB_12345' and 'AAA BBB 12345'.
> Ingres 9.2.1 (a64.lnx/103)NPTL thinks these are the same, but Ingres 10
> doesn't.
> Ingres 9.2 wont allow this table to be created as btree unique on colx.
> It depends on the unicode collation sequence in use. The default
> unicode collation sequence used by Ingres treats underscore and space
> as the same.
> You can use the udefault5 collation sequence instead (createdb -
> iudefault5 dbname), which distinguishes between these characters (and
> some more).
> Not sure at the moment whether this was available in the first 9.2 GA
> release, but at least available by patch and in 10.0.
> By the way, when using a UTF8 installation you will see the problem
> with varchar too.
> Kristoff
> On Jan 16, 11:03 am, Martin Bowes <martin.bo...@ctsu.ox.ac.uk> wrote:
>> Hi All,
>> I've mucked around with this further....
>> I can replicate the error(?) if the database is unicode enabled: createdb -i bowtest
>> And only then if the sort column is nvarchar not varchar.
>> create table x(a nvarchar(100) not null not default);
>> insert into x values('AAA BBB 12345');
>> insert into x values('AAA_BBB_12345');
>> modify x to btree unique on a;
>> select * from x;
>> The modify will generate the error:
>> E_US1591 MODIFY: table could not be modified because rows contain duplicate keys.
>> I get this on both 9.2 and on 10.0.
>> Marty
>> -----Original Message-----
>> From: Martin Bowes
>> Sent: 16 January 2012 09:24
>> To: info-ing...@kettleriverconsulting.com
>> Cc: 'Dennis'
>> Subject: RE: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
>> HI Dennis
>> On ordinary databases: createdb bowtest
>> I get the same answer on both 9.2.0 and 10.0.0 on a64/lnx.
>> In both cases the rows sort in order:
>> AAA BBB 12345
>> AAA_BBB_12345
>> You possibly used a different collation sequence on the different installations?
>> Martin Bowes
>> -----Original Message-----
>> From: Dennis [mailto:droes...@comcast.net]
>> Sent: 15 January 2012 23:52
>> To: info-ing...@kettleriverconsulting.com
>> Subject: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
>> The value for varchar colx in two different rows are
>> 'AAA_BBB_12345' and 'AAA BBB 12345'.
>> Ingres 9.2.1 (a64.lnx/103)NPTL thinks these are the same, but Ingres 10
>> doesn't.
>> Ingres 9.2 wont allow this table to be created as btree unique on colx.
> It depends on the unicode collation sequence in use. The default
> unicode collation sequence used by Ingres treats underscore and space
> as the same.
> You can use the udefault5 collation sequence instead (createdb -
> iudefault5 dbname), which distinguishes between these characters (and
> some more).
It looks like alterdb will let me change the collation sequence, but am I better off to unload, drop, createdb and reload?
This database was created with createdb -i and II_CHARSET is UTF8. The database was created the same way on Ingres 10 as well.
> Not sure at the moment whether this was available in the first 9.2 GA
> release, but at least available by patch and in 10.0.
> By the way, when using a UTF8 installation you will see the problem
> with varchar too.
This confirms unicode is being used here.
'Knowlege Base | How to Determine if Unicode Enabled? - 411686 -
Actian' (http://www.actian.com/kb/article/411686)
This was also the most of discussion topic yesterday.
> It depends on the unicode collation sequence in use. The default
> unicode collation sequence used by Ingres treats underscore and space
> as the same.
> You can use the udefault5 collation sequence instead (createdb -
> iudefault5 dbname), which distinguishes between these characters (and
> some more).
> Not sure at the moment whether this was available in the first 9.2 GA
> release, but at least available by patch and in 10.0.
Is it safe to copy the udefault5 file from an Ingres 10 installation to an Ingres 9.2 installation, or does this require some other patch to use this in Ingres 9.2?
> > It depends on the unicode collation sequence in use. The default
> > unicode collation sequence used by Ingres treats underscore and space
> > as the same.
> > You can use the udefault5 collation sequence instead (createdb -
> > iudefault5 dbname), which distinguishes between these characters (and
> > some more).
> > Not sure at the moment whether this was available in the first 9.2 GA
> > release, but at least available by patch and in 10.0.
> Is it safe to copy the udefault5 file from an Ingres 10 installation to
> an Ingres 9.2 installation, or does this require some other patch to use
> this in Ingres 9.2?
> Dennis
To be safe, copy the file udefault5.uce (in $II_SYSTEM/ingres/files/
collation/) to the 9.2 installation into the same directory and then
compile the collation sequence using:
aducompile udefault5.uce udefault5 -u
On Jan 19, 12:31 am, "James K. Lowden" <jklow...@schemamania.org>
wrote:
> On Mon, 16 Jan 2012 05:16:50 -0800 (PST)
> Kristoff <kristoff.pic...@ingres.com> wrote:
> > The default unicode collation sequence used by Ingres treats
> > underscore and space as the same.
> That strikes me as somewhat ... counterintuitive.
> How is such a collation advantageous? Is there a Unicode
> recommendation that led Ingres^WActian to choose it?
> --jkl
Don't know the advantage of that, but it was the official standard
collation sequence for Unicode 2.1 (http://www.unicode.org/versions/ components-2.1.9.html), fairly outdated now.
Kristoff <kristoff.pic...@ingres.com> wrote:
> On Jan 19, 12:31 am, "James K. Lowden" <jklow...@schemamania.org>
> wrote:
> > On Mon, 16 Jan 2012 05:16:50 -0800 (PST)
> > Kristoff <kristoff.pic...@ingres.com> wrote:
> > > The default unicode collation sequence used by Ingres treats
> > > underscore and space as the same.
> > How is such a collation advantageous?
> Don't know the advantage of that, but it was the official standard
> collation sequence for Unicode 2.1 (http://www.unicode.org/versions/ > components-2.1.9.html), fairly outdated now.
Hi Kristoff,
AIUI the default collation for blank and underscore are not the same.
Maybe I'm misinterpreting what you said or what it says.
$ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
0020 ; [*020A.0020.0002.0020] # SPACE
005F ; [*021B.0020.0002.005F] # LOW LINE
They sure don't *look* the same. They differ on the first level and
the fourth, which the UCA says is "computable".
The '*' indicates a "variable collation elements", which
the standard says "can be either treated as ignorables or not". These
constitute 22% of the Basic Latin page, 58 characters.
Is it the case that Ingres treats all such elements as ignorable?
For those following along at home, the UCA also says sequences of
igorables are all ignored e.g.,
"This is some text"
would sort equal to any of
" This is some text"
"This is some text"
"Thisissometext"
"This__is__some__text"
The whole set of ignorables includes most punctuation, so an
implementation could also fairly treat these as also equivalent:
"This is some text?"
"(This is some text)"
"This is *some* text!"
I'm not saying this is what Ingres is doing. I'm saying this is a
valid interpretation of what I understand to be the controlling
standard. And that nothing in the UCA DUCET standard suggests blanks
and underscores must have the same sorting value.
> Kristoff <kristoff.pic...@ingres.com> wrote:
> > On Jan 19, 12:31 am, "James K. Lowden" <jklow...@schemamania.org>
> > wrote:
> > > On Mon, 16 Jan 2012 05:16:50 -0800 (PST)
> > > Kristoff <kristoff.pic...@ingres.com> wrote:
> > > > The default unicode collation sequence used by Ingres treats
> > > > underscore and space as the same.
> > > How is such a collation advantageous?
> > Don't know the advantage of that, but it was the official standard
> > collation sequence for Unicode 2.1 (http://www.unicode.org/versions/ > > components-2.1.9.html), fairly outdated now.
> Hi Kristoff,
> AIUI the default collation for blank and underscore are not the same.
> Maybe I'm misinterpreting what you said or what it says.
> $ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
> 0020 ; [*020A.0020.0002.0020] # SPACE
> 005F ; [*021B.0020.0002.005F] # LOW LINE
> They sure don't *look* the same. They differ on the first level and
> the fourth, which the UCA says is "computable".
> The '*' indicates a "variable collation elements", which
> the standard says "can be either treated as ignorables or not". These
> constitute 22% of the Basic Latin page, 58 characters.
> Is it the case that Ingres treats all such elements as ignorable?
> For those following along at home, the UCA also says sequences of
> igorables are all ignored e.g.,
> "This is some text"
> would sort equal to any of
> " This is some text"
> "This is some text"
> "Thisissometext"
> "This__is__some__text"
> The whole set of ignorables includes most punctuation, so an
> implementation could also fairly treat these as also equivalent:
> "This is some text?"
> "(This is some text)"
> "This is *some* text!"
> I'm not saying this is what Ingres is doing. I'm saying this is a
> valid interpretation of what I understand to be the controlling
> standard. And that nothing in the UCA DUCET standard suggests blanks
> and underscores must have the same sorting value.
Hi James,
You are correct in your reading of *current* DUCET tables but as Kristoff said 9.2 Ingres is based on an *older* DUCET which did collate space and underscore. If you don't like it, you can change it.
Regards,
Ian
-----Original Message-----
From: info-ingres-boun...@kettleriverconsulting.com [mailto:info-ingres-boun...@kettleriverconsulting.com] On Behalf Of James K. Lowden
Sent: 21 January 2012 21:37
To: info-ing...@kettleriverconsulting.com
Subject: Re: [Info-Ingres] Ingres 9.2 - Why does Ingres think these are the same?
On Thu, 19 Jan 2012 00:30:14 -0800 (PST)
Kristoff <kristoff.pic...@ingres.com> wrote:
> On Jan 19, 12:31 am, "James K. Lowden" <jklow...@schemamania.org>
> wrote:
> > On Mon, 16 Jan 2012 05:16:50 -0800 (PST)
> > Kristoff <kristoff.pic...@ingres.com> wrote:
> > > The default unicode collation sequence used by Ingres treats
> > > underscore and space as the same.
> > How is such a collation advantageous?
> Don't know the advantage of that, but it was the official standard
> collation sequence for Unicode 2.1 (http://www.unicode.org/versions/ > components-2.1.9.html), fairly outdated now.
Hi Kristoff,
AIUI the default collation for blank and underscore are not the same.
Maybe I'm misinterpreting what you said or what it says.
$ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
0020 ; [*020A.0020.0002.0020] # SPACE
005F ; [*021B.0020.0002.005F] # LOW LINE
They sure don't *look* the same. They differ on the first level and
the fourth, which the UCA says is "computable".
The '*' indicates a "variable collation elements", which
the standard says "can be either treated as ignorables or not". These
constitute 22% of the Basic Latin page, 58 characters.
Is it the case that Ingres treats all such elements as ignorable?
For those following along at home, the UCA also says sequences of
igorables are all ignored e.g.,
"This is some text"
would sort equal to any of
" This is some text"
"This is some text"
"Thisissometext"
"This__is__some__text"
The whole set of ignorables includes most punctuation, so an
implementation could also fairly treat these as also equivalent:
"This is some text?"
"(This is some text)"
"This is *some* text!"
I'm not saying this is what Ingres is doing. I'm saying this is a
valid interpretation of what I understand to be the controlling
standard. And that nothing in the UCA DUCET standard suggests blanks
and underscores must have the same sorting value.
Kristoff <kristoff.pic...@ingres.com> wrote:
> On Jan 21, 10:36 pm, "James K. Lowden" <jklow...@schemamania.org>
> wrote:
> > the allkeys file, the Default Unicode Collation Element Table,
> > which says:
> > $ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
> > 0020 ; [*020A.0020.0002.0020] # SPACE
> > 005F ; [*021B.0020.0002.005F] # LOW LINE
...
> > I'm not saying this is what Ingres is doing. I'm saying this is a
> > valid interpretation of what I understand to be the controlling
> > standard.
> So in this old collation sequence the first weight of both chars is
> the same.
Thanks for the clarification, Kristoff. If I may, did the standard
also change how/when the second field comes into play? AIUI
the current standard, each level is a tie-breaker for the the one
before. I won't go into detail now in case it's irrelevant.
To the OP's question, is the Unicode collation user modfiable (without a
compiler) or contemplated for a future release?
> Kristoff <kristoff.pic...@ingres.com> wrote:
> > On Jan 21, 10:36 pm, "James K. Lowden" <jklow...@schemamania.org>
> > wrote:
> > > the allkeys file, the Default Unicode Collation Element Table,
> > > which says:
> > > $ grep -E '^00.+(LOW LINE|SPACE)$' allkeys.txt | head
> > > 0020 ; [*020A.0020.0002.0020] # SPACE
> > > 005F ; [*021B.0020.0002.005F] # LOW LINE
> ...
> > > I'm not saying this is what Ingres is doing. I'm saying this is a
> > > valid interpretation of what I understand to be the controlling
> > > standard.
> > So in this old collation sequence the first weight of both chars is
> > the same.
> Thanks for the clarification, Kristoff. If I may, did the standard
> also change how/when the second field comes into play? AIUI
> the current standard, each level is a tie-breaker for the the one
> before. I won't go into detail now in case it's irrelevant.
> To the OP's question, is the Unicode collation user modfiable (without a
> compiler) or contemplated for a future release?
> Regards,
> --jkl
Yes, the collation can be changed - there is no "true" collation which
would fit into any language, so actually it is expected that user will
customize it.
For Ingres you need to compile the collation sequence, but the
compiler (aducompile) is delivered with the Ingres distribution.
Be careful when creating your own sequence or when using the very
latest available at unicode.org. Currently Ingres does not understand
all the actual unicode features. If you look at the udefault5
collation which is now delivered in addition to the old one, it is not
the original one coming with Unicode 5.0, it only use the weights
defined in that version, but only contains those characters defined in
standard one coming wit Ingres.