Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Strange issue with inserting varchar2()

9 views
Skip to first unread message

pa...@cpan.org

unread,
May 27, 2021, 8:15:04 PM5/27/21
to Bruce Johnson, dbi users
On Thursday 27 May 2021 23:35:32 Bruce Johnson wrote:
> use bytes; $string=substr($orig,0,4000);

Hello! This is really suspicious. See **BOLD** description of 'bytes' module:

**Use of this module for anything other than debugging purposes is strongly discouraged.**

**If you feel that the functions here within might be useful for your application, this possibly indicates a mismatch between your mental model of Perl Unicode and the current reality.**

Peter J. Holzer

unread,
May 30, 2021, 4:45:03 PM5/30/21
to dbi-...@perl.org
On 2021-05-27 23:35:32 +0000, Bruce Johnson wrote:
> The column in question is a varchar2(4000) so I am truncating the
> input to 4000 bytes.
>
> (via use “bytes; $string=substr($orig,0,4000); ” in the section where
> I actually truncate the string.)
>
> When I do the insert I get an ORA-12899: value too large for column
> "AWARD"."PRECEPT_NOMINATIONS"."ONCOLOGY_COMMENTS" (actual: 4054,
> maximum: 4000) error

Is $orig actually UTF-8 encoded at that time? If it only contains
codepoints <= U+00FF, it might be 1 byte/character, but when sent to
Oracle the codepoints between U+0080 and U+00FF will still be encoded as
two bytes.

Something like:

my $bs = encode_utf8($orig);
my $sbs = substr($bs, 0, $len);
my $truncated = decode_utf8($sbs, FB_QUIET);

would be safer.

Also, waht encoding do you use in Oracle? UTF8 isn't really UTF-8: It's
a weird double-encoding which may take up to 6 six bytes for non-BMP
characters. To get real UTF-8, use AL32UTF8.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | h...@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
signature.asc
0 new messages