
DB2/400 and variable strings, VARCHAR and VARCHAR2.


Rick Price

Jul 28, 2000
Could anybody tell me how DB2/400 handles variable-length strings? My
suspicion is that a VARCHAR(500) column will always take up 500+ bytes
regardless of the actual string length, but I'd like to be wrong.

I'm looking at converting some aspects of an Oracle-based application to our
AS/400, and I'd like to be able to size it and also do it without too many
changes to the schema.

Rick

Mark Megerian

Jul 28, 2000

The good news is that you are wrong :-)

The VARCHAR(500) will only occupy the minimum amount needed to hold each value.
It is fair to think of varying length character strings as a way to minimize
storage requirements. If a value is often less than 50 characters, but *might
be* up to 1000 characters, then create it like this:

  create table testtable
    (col1 varchar(1000) allocate (50))

This tells the system to store values up to 50 characters long in the fixed
portion of the table; any values longer than 50 will be put in the varying-length
(overflow) storage associated with the file. The advantage of this approach is
that you get better performance for values in the fixed portion, but still have
the flexibility of storing up to 1000 characters without occupying 1000 bytes per
row. The ALLOCATE clause is optional; if you don't specify it, the system will
choose a default.
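
To make the behavior concrete, here is a small illustration against the table
above (the sample values, and the use of the REPEAT scalar function, are my own
assumptions, not from Mark's post):

  -- 34 characters: fits within the 50-byte fixed allocation
  insert into testtable values ('fits within the 50-byte allocation')

  -- 900 characters: exceeds the allocation, so the value goes to the
  -- overflow storage, but it still occupies ~900 bytes, not 1000
  insert into testtable values (repeat('x', 900))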

Mark Megerian

Jul 28, 2000

obvious typo:

... but still have the flexibility of storing up to *1000* characters without
occupying 1000 bytes per row.

Jim Mattheiss

Jul 29, 2000
An answer I would like is:

How does DB2 internally handle VARCHAR fields? Does it store them
separately and do a join behind the scenes to present the result set?

Thanks
Jim Mattheiss

Mark Megerian wrote in message <3981B3F7...@us.ibm.com>...

Kent Milligan

Jul 31, 2000

No, the data is either in the fixed portion or the auxiliary/overflow portion -
it's not split across the two.

--
Kent Milligan, DB2 & BI team
PartnerWorld for Developers, AS/400
km...@us.removethis.ibm.com GO HAWKEYES!!
(opinions stated are not necessarily those of my employer)

jmatt...@scholastic.com

Jul 31, 2000
Thanks for the reply:

Let me ask the question another way:

From a performance versus DASD utilization perspective, should we be
using CHARACTER(4000) or VARCHAR(4000) fields?

If we use CHAR, we are fixed at 4000 bytes but get reasonably fast I/O.
If we use VARCHAR, we have a calculated fixed portion and an auxiliary
portion. If the data is over the "breakpoint" and goes to aux, how
much additional overhead is involved on update/retrieval?

The issue is that we don't want to "tweak" the DB for UDB, but want to
deploy to any of the major players without schema changes. The model
is in a third-party tool, so going to a different DB just involves
setting up ODBC and Forward Engineering the model.

Thanks. Jim Mattheiss


In article <39857D05...@us.ibm.com>,



Njål Fisketjøn

Jul 31, 2000
On Mon, 31 Jul 2000 14:39:06 GMT, jmatt...@scholastic.com wrote:

>Let me ask the question another way:
>
>From a performance versus DASD utilization perspective, should we be
>using CHARACTER(4000) or VARCHAR(4000) fields?
>
>If we use CHAR, we are fixed at 4000 bytes but get reasonably fast I/O.
>If we use VARCHAR, we have a calculated fixed portion and an auxiliary
>portion. If the data is over the "breakpoint" and goes to aux, how
>much additional overhead is involved on update/retrieval?
>
>The issue is that we don't want to "tweak" the DB for UDB, but want to
>deploy to any of the major players without schema changes. The model
>is in a third-party tool, so going to a different DB just involves
>setting up ODBC and Forward Engineering the model.
>

I would guess that it would be much better to use VARCHAR(4000) if the
"normal" (allocated) field size is much smaller. That way you get more
records per buffer, and the "normal" I/O will work faster.


Njål Fisketjøn, FIGU DATA AS
n...@figu.no

Kent Milligan

Jul 31, 2000
It really depends on how often this long character column is going to be
used/referenced. If it's not referenced as often as the other columns in the
row, then define it as VARCHAR(4000) ALLOCATE(0) and pay that extra I/O to
access it from the overflow portion only when it's really needed.

If the column value is referenced/used frequently like the other columns, then
I'd create it as a CHAR(4000) (which is equivalent to VARCHAR(4000)
ALLOCATE(4000)).
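
As a hedged sketch of the two layouts Kent describes (the table and column
names here are invented for illustration, not from the thread):

  -- Rarely referenced long text: keep it entirely in the overflow
  -- portion and pay the extra I/O only when it's actually read.
  create table orders_rare
    (order_id integer not null,
     notes    varchar(4000) allocate(0))

  -- Frequently referenced long text: fully allocated in the fixed
  -- portion, equivalent to CHAR(4000).
  create table orders_hot
    (order_id integer not null,
     notes    varchar(4000) allocate(4000))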

jmatt...@scholastic.com

Jul 31, 2000
Thanks for all the posts on my questions.

It mimics what we are already doing manually to the model: moving the large
varchar fields to separate tables and accessing them as needed (a sketch of
that split follows below). It would be easier if our modelling package
understood the ALLOCATE concept with regard to VARCHAR fields.
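
For what it's worth, a minimal sketch of that manual split (all table and
column names are hypothetical): the wide column lives in a side table keyed
to the parent row and is joined in only when the long text is needed.

  create table item
    (item_id integer not null primary key,
     name    char(50))

  create table item_text
    (item_id   integer not null primary key,
     long_desc varchar(4000))

  -- Join in the long text only on demand:
  select i.name, t.long_desc
    from item i, item_text t
   where i.item_id = t.item_id
     and i.item_id = 42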

Jim Mattheiss


In article <3985A904...@us.ibm.com>,

Larry W Loen

Jul 31, 2000

jmatt...@scholastic.com wrote:
>
> Thanks for the reply:
>
> Let me ask the question another way:
>
> From a performance versus DASD utilization perspective, should we be
> using CHARACTER(4000) or VARCHAR(4000) fields?
>
> If we use CHAR, we are fixed at 4000 bytes but get reasonably fast I/O.
> If we use VARCHAR, we have a calculated fixed portion and an auxiliary
> portion. If the data is over the "breakpoint" and goes to aux, how
> much additional overhead is involved on update/retrieval?
>

The overhead varies based on choices you make.

Perhaps the most significant variable is the _distribution_ of
sizes.

If, for instance, 90% or more of the data has a length of
3900-4000 characters, then fixed 4000 is likely to be the correct
choice all around, and probably on any platform (and very
likely correct on OS/400). There are modest performance advantages
to saying "fixed" instead of "variable" if you really are
going to use most of the max length for each record, because (as on
almost every other platform) there's added overhead per record for
variable over fixed data.

If, instead, the distribution is uniform between 1 and
4000 bytes, then VARCHAR is likely to be the correct choice,
again, largely independent of platform.

If you wish to do OS/400-specific table creates,
picking the ALLOCATE() value (in SQL terms) is a bit more of
an art, but if you know the distribution of sizes, you
should be able to make size/access trade-offs (e.g. average
size plus 10 or 20 percent), as sketched below. Or, you
can take the default of 0 and be done with it, which
is again platform independent (or at least UDB-wide). This
will mean one extra page per record access, but it
will optimize space usage and give reasonably _consistent_
results over a wide range of data size distributions if
you don't know what they are or if they vary from
instance to instance.
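
One way to ground that choice, as a sketch (the table and column names are
placeholders, reusing the hypothetical notes column from earlier in the
thread): measure the actual length distribution first, then set ALLOCATE to
roughly the average plus 20 percent.

  -- Measure the real distribution of stored lengths:
  select avg(length(notes)) as avg_len,
         max(length(notes)) as max_len
    from orders_rare

  -- If avg_len comes back around 300, allocate roughly 360:
  -- create table ... (notes varchar(4000) allocate(360))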

Larry Loen, speaking on his own...
