On 2021-05-27 23:35:32 +0000, Bruce Johnson wrote:
> The column in question is a varchar2(4000) so I am truncating the
> input to 4000 bytes.
>
> (via use “bytes; $string=substr($orig,0,4000); ” in the section where
> I actually truncate the string.)
>
> When I do the insert I get an ORA-12899: value too large for column
> "AWARD"."PRECEPT_NOMINATIONS"."ONCOLOGY_COMMENTS" (actual: 4054,
> maximum: 4000) error
Is $orig actually UTF-8 encoded at that time? If it only contains
codepoints <= U+00FF, it might be 1 byte/character, but when sent to
Oracle the codepoints between U+0080 and U+00FF will still be encoded as
two bytes.
Something like:
my $bs = encode_utf8($orig);
my $sbs = substr($bs, 0, $len);
my $truncated = decode_utf8($sbs, FB_QUIET);
would be safer.
Also, waht encoding do you use in Oracle? UTF8 isn't really UTF-8: It's
a weird double-encoding which may take up to 6 six bytes for non-BMP
characters. To get real UTF-8, use AL32UTF8.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | |
h...@hjp.at | -- Charles Stross, "Creative writing
__/ |
http://www.hjp.at/ | challenge!"