I'm looking at converting some aspects of an Oracle-based application to our
AS/400, and I'd like to be able to size it, and also to do it without too many
changes to the schema.
Rick
The VARCHAR(500) will only occupy the storage needed to hold each value.
It is fair to think of varying-length character strings as a way to minimize
storage requirements. If a value is usually less than 50 characters, but *might
be* up to 1000 characters, then create it like this:
   create table testtable
     (col1 varchar(1000) allocate (50))
This tells the system to store values up to 50 characters in length in the
fixed portion of the table; any value longer than 50 characters is put in
the varying-length storage associated with the file. The advantage of this
approach is that you get better performance for values in the fixed portion,
but you still have the flexibility of storing up to 1000 characters without
occupying 1000 bytes per row. The ALLOCATE clause is optional; if you don't
specify it, the system will choose a default.
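For example (table names made up for illustration), contrast omitting the
clause with an explicit allocation of zero:

   create table t_default
     (col1 varchar(1000))                 -- no ALLOCATE: the system chooses the default
   create table t_overflow
     (col1 varchar(1000) allocate (0))    -- explicit 0: every value goes to the overflow area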
How does DB2 internally handle VARCHAR fields? Does it store them
separately and perform a join behind the scenes to present the result set?
Thanks
Jim Mattheiss
Mark Megerian wrote in message <3981B3F7...@us.ibm.com>...
--
Kent Milligan, DB2 & BI team
PartnerWorld for Developers, AS/400
km...@us.removethis.ibm.com GO HAWKEYES!!
(opinions stated are not necessarily those of my employer)
Let me ask the question another way:
From a performance versus DASD utilization perspective, should we be
using CHARACTER(4000) or VARCHAR(4000) fields?
If we use CHAR, we are fixed at 4000 bytes but get reasonably fast I/O.
If we use VARCHAR, we have a calculated fixed portion and an auxiliary
portion. If the data is over the "breakpoint" and goes to aux, how
much additional overhead is involved on update/retrieval?
The issue is that we don't want to "tweak" the DB for UDB; we want to
deploy to any of the major players without schema changes. The model
is in a third-party tool, so going to a different DB just involves
setting up ODBC and Forward Engineering the model.
Thanks. Jim Mattheiss
In article <39857D05...@us.ibm.com>,
>From a performance versus DASD utilization perspective, should we be
>using CHARACTER(4000) or VARCHAR(4000) fields?
>[snip]
I would guess that it would be much better to use VARCHAR(4000) if the
"normal" (allocated) field size is much smaller. That way you get more
records per buffer, and the "normal" I/O will be faster.
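To put rough numbers on that (the figures are assumptions for illustration):
suppose values average 100 bytes and the column is declared VARCHAR(4000)
ALLOCATE(100). The fixed portion of each row then carries about 100 bytes
for that column (plus a small length prefix), so a 64 KB buffer holds on the
order of 600 rows; at a fixed CHAR(4000) the same buffer holds only about 16.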
Njål Fisketjøn, FIGU DATA AS
n...@figu.no
If the column value is referenced/used as frequently as the other columns, then
I'd create it as a CHAR(4000) (which is equivalent to VARCHAR(4000)
ALLOCATE(4000)).
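A minimal sketch of that equivalence (table names are hypothetical):

   create table t_fixed
     (col1 char(4000))
   create table t_equiv
     (col1 varchar(4000) allocate (4000))   -- whole value kept in the fixed row portion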
That mimics what we are doing manually to the model: moving the large
varchar fields to separate tables and accessing them as needed. We'll
have to see whether our modelling package understands the ALLOCATE
concept with regard to VARCHAR fields.
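Roughly, the manual split looks like this (names are made up for illustration):

   create table orders
     (order_id integer not null primary key,
      status   char(10))
   create table order_notes
     (order_id integer not null primary key,
      note     varchar(4000))

   -- Retrieve the wide column only when it is actually needed:
   select o.order_id, o.status, n.note
     from orders o
     left join order_notes n on o.order_id = n.order_id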
Jim Mattheiss
In article <3985A904...@us.ibm.com>,
jmatt...@scholastic.com wrote:
> If we use VARCHAR, we have a calculated fixed portion and an auxiliary
> portion. If the data is over the "breakpoint" and goes to aux, how
> much additional overhead is involved on update/retrieval?
> [snip]
The overhead varies based on choices you make.
Perhaps the most significant variable is the _distribution_ of
sizes.
If, for instance, 90% or more of the data has a length of
3900-4000 characters, then fixed 4000 is likely to be the correct
choice all around, and probably on any platform (and very
likely correct on OS/400). There are modest performance advantages
to saying "fixed" instead of "variable" if you really are
going to use most of the maximum length for each record, because (like
almost everyone else) there is added per-record overhead for
variable-length data over fixed data.
If, instead, the distribution is uniform between 1 and
4000 bytes, then VARCHAR is likely to be the correct choice,
again, largely independent of platform.
If you wish to do OS/400 specific table creates,
picking the ALLOCATE() value (in SQL terms) is a bit more of
an art, but if you know the distribution of sizes, you
should be able to make size/access trade-offs (e.g. average
size plus 10 or 20 percent). Or, you
can take the default of 0 and have done with it, which
is again platform independent (or at least UDB-wide). This
will mean one extra page per record access, but it
will optimize space usage and give reasonably _consistent_
results over a wide range of data size distributions if
you don't know what they are or if they vary from
instance to instance.
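For example (the numbers here are made up for illustration): profile the
existing data, then set ALLOCATE near the average plus a margin:

   -- Measure the length distribution of the data you already have:
   select avg(length(note)), max(length(note))
     from order_notes

   -- If the average came back around 220 bytes, average plus
   -- 10-20 percent suggests ALLOCATE(250) as a starting point:
   create table order_notes_tuned
     (order_id integer not null primary key,
      note     varchar(4000) allocate (250))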
Larry Loen speaking on his own. . .