Marking Informix columns as UTF8

Daniel Green

Jan 5, 2012, 3:00:29 PM
to Rose::DB::Object
Hello all,

For varchar/character/text columns, our Informix database stores utf-8
encoded data. Informix + DBD have no notion of this, so when we load
with Rose::DB::Object our utf-8 scalars are not marked as utf8 and end
up getting utf-8 encoded AGAIN later downstream.
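
Here is a minimal illustration of the double encoding described above, using only the core Encode module. The literal "caf\xc3\xa9" is an assumed stand-in for a value fetched from Informix: the UTF-8 bytes for "café", with Perl's UTF-8 flag off. The thread doesn't show real driver output.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed driver output: the UTF-8 bytes for "café" (c3 a9 encodes U+00E9),
# with Perl's UTF-8 flag off, so Perl sees five one-byte characters.
my $from_db    = "caf\xc3\xa9";
my $chars_seen = length $from_db;    # 5, not 4

# Downstream code that encodes to UTF-8 encodes those bytes a second time:
# \xc3 becomes \xc3\x83 and \xa9 becomes \xc2\xa9 (displayed as "cafÃ©").
my $double = encode('UTF-8', $from_db);

# Decoding once where the data enters the program gives the intended
# four-character string, which encodes back to the original bytes.
my $decoded    = decode('UTF-8', $from_db);
my $round_trip = encode('UTF-8', $decoded);

printf "%d chars seen, double-encoded: %s\n", $chars_seen, unpack('H*', $double);
printf "decoded length %d, round trip: %s\n", length $decoded, unpack('H*', $round_trip);
```

The usual fix is to decode once, at the point where data enters the program. The rest of the thread is about finding that point in RDBO.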

We want to apply a systemic change such that all values of
Rose::DB::Object::Metadata::Column::(?:Varchar|Character|Text) columns
are marked as utf8.

Currently our solution is to make our own subclasses of each column
and add a trigger to do the dirty work for us. However, I can't help
but wonder if there is already something baked into Rose::DB::Object
for this. Is there a better way than what we are doing?

I appreciate the help,
Daniel

John Siracusa

Jan 5, 2012, 4:05:48 PM
to rose-db...@googlegroups.com
On Thu, Jan 5, 2012 at 3:00 PM, Daniel Green <octob...@gmail.com> wrote:
> For varchar/character/text columns, our Informix database stores utf-8
> encoded data. Informix + DBD have no notion of this, so when we load
> with Rose::DB::Object our utf-8 scalars are not marked as utf8 and end
> up getting utf-8 encoded AGAIN later downstream.

Are the strings OK (i.e., in a known, consistent encoding and not
mangled) when you call the column accessor methods on the RDBO-derived
object even without your custom column subclass code??

-John

Daniel Green

Jan 6, 2012, 1:16:47 PM
to Rose::DB::Object
> Are the strings OK (i.e., in a known, consistent encoding and not
> mangled) when you call the column accessor methods on the RDBO-derived
> object even without your custom column subclass code??
>
The strings are consistently encoded/unmangled.

John Siracusa

Jan 6, 2012, 4:26:38 PM
to rose-db...@googlegroups.com
On Fri, Jan 6, 2012 at 1:16 PM, Daniel Green <octob...@gmail.com> wrote:
>> Are the strings OK (i.e., in a known, consistent encoding and not
>> mangled) when you call the column accessor methods on the RDBO-derived
>> object even without your custom column subclass code??
>>
> The strings are consistently encoded/unmangled.

OK, so it sounds to me like the problem is with the other code that
processes these strings. Checking to see if Perl's UTF-8 flag is on
is not a good way to check if a string contains UTF-8-encoded data.
If some module is doing that and you can't stop it or otherwise
convince it that what you're feeding it is already a UTF-8-encoded
string, then can you add the extra filtering/mangling step between
that problematic API and the data you're feeding it? IOW, does all
data in every RDBO class need to be marked as UTF-8 as it exits the
column accessor, or does it just need to be done so this one consumer
of this data will behave correctly?
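
John's point can be shown with core Perl alone. The UTF-8 flag records how Perl stores a string internally, not what encoding its contents are in, so a flag check can be wrong in both directions. The variable names below are illustrative.

```perl
use strict;
use warnings;

# UTF-8-encoded bytes for "café", as a driver might return them: the data
# *is* UTF-8, yet the flag is off.
my $utf8_bytes = "caf\xc3\xa9";
my $bytes_flag = utf8::is_utf8($utf8_bytes) ? 1 : 0;    # 0

# A Latin-1 string upgraded to Perl's internal UTF-8 representation: the
# flag is now on, but the characters are unchanged.
my $latin1   = "caf\xe9";
my $upgraded = $latin1;
utf8::upgrade($upgraded);
my $upgraded_flag = utf8::is_utf8($upgraded) ? 1 : 0;   # 1
my $same_chars    = ($upgraded eq $latin1) ? 1 : 0;     # 1: still equal

print "bytes flag=$bytes_flag, upgraded flag=$upgraded_flag, equal=$same_chars\n";
```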

All of that said, I'm not opposed to adding this kind of thing as a
built-in filter on columns. But it sounds to me like you're already
basically doing that, so I'm not sure what advantage you'd get from
having it part of the module. Would that let you drop all your column
subclasses? If so, feel free to submit a patch to add this feature to
the base string column class.

-John

Daniel Green

Jan 6, 2012, 6:08:21 PM
to Rose::DB::Object
> On Fri, Jan 6, 2012 at 1:16 PM, Daniel Green <october...@gmail.com> wrote:
> >> Are the strings OK (i.e., in a known, consistent encoding and not
> >> mangled) when you call the column accessor methods on the RDBO-derived
> >> object even without your custom column subclass code??
>
> > The strings are consistently encoded/unmangled.
>
> OK, so it sounds to me like the problem is with the other code that
> processes these strings.  Checking to see if Perl's UTF-8 flag is on
> is not a good way to check if a string contains UTF-8-encoded data.
> If some module is doing that and you can't stop it or otherwise
> convince it that what you're feeding it is already a UTF-8-encoded
> string, then can you add the extra filtering/mangling step between
> that problematic API and the data you're feeding it?

We cannot, in fact, convince the module that we are consistently
feeding it UTF-8-encoded strings. My thinking is that if the strings
are UTF-8, they should be marked as such, regardless of which API we
are using. Given that, we decided RDBO was the best place to mark the
strings, since it is our first opportunity to do so.

However, we are having difficulty coming up with a clean way to mark
the strings UTF-8 as they come out of the database.

We tried overriding parse_value/format_value, but discovered they
aren't always invoked while loading from the database.

Currently, we are messing around with our own
Rose::DB::Object::MakeMethods::Generic subclass. The idea is to wrap
the methods that would be generated by the superclass
(RDBO::MakeMethods::Generic): call SUPER::character(), take the
subroutines it returns, and create new subroutines that call the old
ones and then mark the values as utf8 somehow. It seems wicked
hackish, though; there should be a cleaner way.

John Siracusa

Jan 6, 2012, 6:49:30 PM
to rose-db...@googlegroups.com
On Fri, Jan 6, 2012 at 6:08 PM, Daniel Green <octob...@gmail.com> wrote:
> However, we are having difficulty coming up with a clean way to mark
> the strings UTF-8 as they come out of the database.

How about marking them as they come out of the objects via the column
accessor methods? You could do that with an on_get trigger, and you
could add those column triggers automatically to all text columns at
class construction time.

If you really wanted to mess with the value as it's stored in the
object, you could use on_load triggers.

http://search.cpan.org/dist/Rose-DB-Object/lib/Rose/DB/Object/Metadata/Column.pm#TRIGGERS

-John

Daniel Green

Jan 9, 2012, 10:40:45 AM
to Rose::DB::Object
> > However, we are having difficulty coming up with a clean way to mark
> > the strings UTF-8 as they come out of the database.
>
> How about marking them as they come out of the objects via the column
> accessor methods?  You could do that with an on_get trigger, and you
> could add those column triggers automatically to all text columns at
> class construction time.
>
How would I hook into the construction in a way that wouldn't require
me to manually add the trigger to each instance of the column being
used?

On Jan 6, 6:49 pm, John Siracusa <sirac...@gmail.com> wrote:
> On Fri, Jan 6, 2012 at 6:08 PM, Daniel Green <october...@gmail.com> wrote:
> > However, we are having difficulty coming up with a clean way to mark
> > the strings UTF-8 as they come out of the database.
>
> How about marking them as they come out of the objects via the column
> accessor methods?  You could do that with an on_get trigger, and you
> could add those column triggers automatically to all text columns at
> class construction time.
>
> If you really wanted to mess with the value as it's stored in the
> object, you could use on_load triggers.
>
> http://search.cpan.org/dist/Rose-DB-Object/lib/Rose/DB/Object/Metadat...
>
> -John

John Siracusa

Jan 9, 2012, 10:51:50 AM
to rose-db...@googlegroups.com
On Mon, Jan 9, 2012 at 10:40 AM, Daniel Green <octob...@gmail.com> wrote:
>> > However, we are having difficulty coming up with a clean way to mark
>> > the strings UTF-8 as they come out of the database.
>>
>> How about marking them as they come out of the objects via the column
>> accessor methods?  You could do that with an on_get trigger, and you
>> could add those column triggers automatically to all text columns at
>> class construction time.
>>
> How would I hook into the construction in a way that wouldn't require
> me to manually add the trigger to each instance of the column being
> used?

Create your own Rose::DB::Object::Metadata subclass (if you don't have
one already) and make it the meta_class for your common
Rose::DB::Object base class. Then override add_columns() in your
custom Metadata subclass and apply the trigger(s) to the appropriate
column(s). (You should be able to call the SUPER:: method to do all
the work, catching the return value, then just walk over the newly
constructed columns and add your triggers.)
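
Here is a rough sketch of the add_columns() approach, assuming a shared base class named My::DB::Object (a hypothetical name). Instead of relying on what SUPER::add_columns() returns, it walks $self->columns afterwards. It also records which column objects already carry the trigger, so repeated calls don't stack duplicates. Several details are assumptions: the list of type names, the is_utf8() guard (carried over from Daniel's later code), and the NULL check. It has not been verified against a live Informix database.

```perl
package My::DB::Object::Metadata;

use strict;
use warnings;

use Encode qw(decode is_utf8);
use Scalar::Util qw(refaddr);

use base 'Rose::DB::Object::Metadata';

# Assumed: the column type names that hold UTF-8-encoded text.
my %Is_text_type = map { $_ => 1 } qw(varchar character text);

# Column objects that already have the trigger (keyed by address), so
# calling add_columns() more than once never adds it twice.
my %Has_trigger;

sub add_columns {
    my $self = shift;

    # Let the base class build the column objects, preserving context.
    my $want = wantarray;
    my @ret  = $want ? $self->SUPER::add_columns(@_)
                     : scalar $self->SUPER::add_columns(@_);

    # Walk all of this class's columns and attach the trigger to text ones.
    foreach my $column ($self->columns) {
        next unless $Is_text_type{ $column->type };
        next if $Has_trigger{ refaddr($column) }++;

        $column->add_trigger(on_load => sub {
            my ($object) = @_;

            # Looked up at call time: accessor names are assigned after
            # the column object is created.
            my $method = $column->rw_method_name;
            my $value  = $object->$method();

            # Skip NULLs and values that have already been decoded.
            return unless defined $value && !is_utf8($value);

            $object->$method(decode('UTF-8', $value));
        });
    }

    return $want ? @ret : $ret[0];
}

package My::DB::Object;

use base 'Rose::DB::Object';

# Every class derived from My::DB::Object uses the custom metadata class.
sub meta_class { 'My::DB::Object::Metadata' }

1;
```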

http://search.cpan.org/dist/Rose-DB-Object/lib/Rose/DB/Object.pm#meta_class
http://search.cpan.org/dist/Rose-DB-Object/lib/Rose/DB/Object/Metadata.pm#add_columns

-John

John Siracusa

Jan 9, 2012, 10:55:53 AM
to rose-db...@googlegroups.com
On Mon, Jan 9, 2012 at 10:51 AM, John Siracusa <sira...@gmail.com> wrote:
> Create your own Rose::DB::Object::Metadata subclass (if you don't have
> one already) and make it the meta_class for your common
> Rose::DB::Object base class.  Then override add_columns() in your
> custom Metadata subclass and apply the trigger(s) to the appropriate
> column(s).  (You should be able to call the SUPER:: method to do all
> the work, catching the return value, then just walk over the newly
> constructed columns and add your triggers.)

…of course, once you have your own Metadata subclass, why not just
make your own column subclasses that have these triggers automatically
self-applied, then update the column_type_class() for the relevant
column types (or make new column type name(s)) and then just build
your classes as usual, using whatever type name(s) correspond to your
new UTF-8-ifying column class(es)?

-John

Daniel Green

Jan 9, 2012, 11:08:58 AM
to Rose::DB::Object
> …of course, once you have your own Metadata subclass, why not just
> make your own column subclasses that have these triggers automatically
> self-applied, then update the column_type_class() for the relevant
> column types (or make new column type name(s)) and then just build
> your classes as usual, using whatever type name(s) correspond to your
> new UTF-8-ifying column class(es)?
>
As you described, we currently subclass Metadata along with the
columns we want the new behavior in. I think the outstanding question
is how to apply that trigger within those Column subclasses and have
it apply to all columns everywhere. Would we just override 'new' and
call add_trigger or something like that?


John Siracusa

Jan 9, 2012, 11:16:33 AM
to rose-db...@googlegroups.com
On Mon, Jan 9, 2012 at 11:08 AM, Daniel Green <octob...@gmail.com> wrote:
> As you described, we currently subclass Metadata along with the
> columns we want the new behavior in. I think the outstanding question
> is how to apply that trigger within those Column subclasses and have
> it apply to all columns everywhere. Would we just override 'new' and
> call add_trigger or something like that?

Overriding init() and calling add_builtin_trigger() to add your
trigger(s) should work.

-John

Daniel Green

Jan 11, 2012, 4:13:56 PM
to Rose::DB::Object
> > As you described, we currently subclass Metadata along with the
> > columns we want the new behavior in. I think the outstanding question
> > is how to apply that trigger within those Column subclasses and have
> > it apply to all columns everywhere. Would we just override 'new' and
> > call add_trigger or something like that?
>
> Overriding init() and calling add_builtin_trigger() to add your
> trigger(s) should work.
>

This sounds great. I have init calling add_builtin_trigger now.
However, the on_load code is passed the whole object. How do I get
just the column's value? Do I access "$self"'s column name or
something like that and do so dynamically?

John Siracusa

Jan 11, 2012, 4:21:48 PM
to rose-db...@googlegroups.com
On Wed, Jan 11, 2012 at 4:13 PM, Daniel Green <octob...@gmail.com> wrote:
> This sounds great. I have init calling add_builtin_trigger now.
> However, the on_load code is passed the whole object. How do I get
> just the column's value? Do I access "$self"'s column name or
> something like that and do so dynamically?

Call the column's accessor method. Triggers should be disabled when
you're inside a trigger sub, so there should not be infinite
recursion.

-John

Daniel Green

Jan 11, 2012, 5:31:56 PM
to Rose::DB::Object
> > This sounds great. I have init calling add_builtin_trigger now.
> > However, the on_load code is passed the whole object. How do I get
> > just the column's value? Do I access "$self"'s column name or
> > something like that and do so dynamically?
>
> Call the column's accessor method.  Triggers should be disabled when
> you're inside a trigger sub, so there should not be infinite
> recursion.
>
Thank you for your patience. I really appreciate the help. I rewrote
the solution we came up with on Friday and now have this (currently
working):

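use Encode qw(decode is_utf8);

sub init {
    my $self = shift;

    my $result = $self->SUPER::init(@_);

    # Decode this column's value from UTF-8 bytes into a Perl character
    # string as soon as the object is loaded from the database.
    $self->add_builtin_trigger(
        'on_load' => sub {
            my $object = shift;

            # rw_method_name is still undef while init() runs, so look
            # it up here, when the trigger fires.
            my $accessor_name = $self->rw_method_name;

            # Triggers are disabled inside a trigger sub, so calling the
            # accessor here does not recurse.
            my $value = $object->$accessor_name;

            # Leave NULLs alone; decode only values not yet decoded.
            if (defined $value && !is_utf8($value)) {
                $object->$accessor_name(decode('utf8', $value));
            }
        }
    );

    return $result;
}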



John Siracusa

Jan 11, 2012, 5:53:07 PM
to rose-db...@googlegroups.com
As an optimization, you can probably hoist the $accessor_name out of
the trigger sub (assuming you don't plan to change the accessor name
later).

Also, your call to SUPER:: assumes that the init() method is called in
scalar context. This is probably fine, but in general you should
preserve the calling context when trying to transparently wrap a
function. (There are CPAN modules to do this in fewer lines of code
than you'd have to write yourself to do it "manually.")
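
Here is a hand-rolled sketch of what "preserving the calling context" means, using wantarray. The names original() and wrapped() are hypothetical placeholders. In this thread, original() would correspond to SUPER::init(), and the post-processing step would be the add_builtin_trigger() call.

```perl
use strict;
use warnings;

# Hypothetical method to be wrapped: it returns different things in list
# and scalar context, so the caller's context matters.
sub original { return wantarray ? (1, 2, 3) : 3 }

# Wrapper that does extra work after the call but returns exactly what
# original() would have returned in the caller's context.
sub wrapped {
    my @args = @_;
    my $want = wantarray;

    my (@list, $scalar);
    if ($want) {                  # list context
        @list = original(@args);
    }
    elsif (defined $want) {       # scalar context
        $scalar = original(@args);
    }
    else {                        # void context
        original(@args);
    }

    # ... post-processing would go here ...

    return $want ? @list : $scalar;
}

my @l = wrapped();    # list context
my $s = wrapped();    # scalar context
print "list: @l; scalar: $s\n";
```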

-John

Daniel Green

Jan 11, 2012, 6:11:35 PM
to Rose::DB::Object
> Also, your call to SUPER:: assumes that the init() method is called in
> scalar context.  This is probably fine, but in general you should
> preserve the calling context when trying to transparently wrap a
> function.  (There are CPAN modules to do this in fewer lines of code
> than you'd have to write yourself to do it "manually.")
>
Thanks! I didn't think of that. Does the return value of init()
currently have any particular meaning? I found nothing in the docs
suggesting that it does.

Daniel Green

Jan 11, 2012, 6:16:57 PM
to Rose::DB::Object
> As an optimization, you can probably hoist the $accessor_name out of
> the trigger sub (assuming you don't plan to change the accessor name
> later).

$self->rw_method_name returns undef outside of the trigger.

John Siracusa

Jan 11, 2012, 8:03:31 PM
to rose-db...@googlegroups.com
On Wed, Jan 11, 2012 at 6:11 PM, Daniel Green <octob...@gmail.com> wrote:
> Thanks! I didn't think of that. Does the return value of init()
> currently have any particular meaning? I found nothing in the docs
> suggesting that it does.

I don't remember; I was making a general statement. But it's never
wrong to do it right :)

> $self->rw_method_name returns undef outside of the trigger.

Ah, maybe it's not set yet at that point. Still, you can hoist the
declaration of $accessor_name and then use ||= inside the trigger sub.

-John
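
The following sketch combines John's suggestions. It defines a Varchar column subclass whose init() preserves the calling context and hoists $accessor_name with ||=. It also defines a Metadata subclass that maps the stock varchar type to that subclass via column_type_class(). The package names are hypothetical. Switching from 'utf8' to the stricter 'UTF-8' encoding name is an editorial choice. Daniel's actual final code isn't shown in the thread. Character and Text subclasses would follow the same pattern.

```perl
package My::DB::Object::Metadata::Column::Varchar;

use strict;
use warnings;

use Encode qw(decode is_utf8);

use base 'Rose::DB::Object::Metadata::Column::Varchar';

sub init {
    my $self = shift;

    # Delegate to the base class, preserving the caller's context.
    my $want = wantarray;
    my @ret  = $want ? $self->SUPER::init(@_) : scalar $self->SUPER::init(@_);

    # Hoisted out of the trigger but filled in lazily with ||=, because
    # rw_method_name() is still undef while init() runs.
    my $accessor_name;

    $self->add_builtin_trigger(on_load => sub {
        my ($object) = @_;

        $accessor_name ||= $self->rw_method_name;

        my $value = $object->$accessor_name();

        # Skip NULLs and values that have already been decoded.
        return unless defined $value && !is_utf8($value);

        $object->$accessor_name(decode('UTF-8', $value));
    });

    return $want ? @ret : $ret[0];
}

package My::DB::Object::Metadata;

use base 'Rose::DB::Object::Metadata';

# Route the stock "varchar" type name to the decoding subclass.
__PACKAGE__->column_type_class(
    varchar => 'My::DB::Object::Metadata::Column::Varchar');

1;
```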

Daniel Green

Jan 12, 2012, 12:03:55 PM
to Rose::DB::Object
> > $self->rw_method_name returns undef outside of the trigger.
>
> Ah, maybe it's not set yet at that point. Still, you can hoist the
> declaration of $accessor_name and then use ||= inside the trigger sub.
>
Done! I think I have reached the end of this path safely. Thanks again
for your help.

I've started a separate thread for my only remaining problem, since
it appears to be unrelated.
