
db table design question


ricva

Jun 1, 2007, 9:47:20 AM
We have an application which has its settings stored in a database
(SQL Server 2000).

We have (strong) disagreements on the best way to store these values,
and I am looking for advice on doing this.
Most of the values are not used by the stored procedures, just by the
applications running on the client PCs. The data rarely changes.


Developer #1 wants many tables with many columns. This gives the
ability to strongly type each variable. Each table would have only
one row. He believes this to be faster in retrieving the data.


Developer #2 wants one table with a Key column and a Value column.
This table would have many rows. The application would be responsible
for detecting invalid datatypes in the values. He believes this to be
a more efficient use of the database, and easier to maintain.


So - what do you think? Are there any advantages or problems with
either approach?
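
For concreteness, a rough sketch of the two designs we are arguing
about (table and column names are made up for the example):

-- #1: one single-row table per settings group, strongly typed
CREATE TABLE TypedSettings
(retry_count INT NOT NULL,
 last_sync DATETIME NOT NULL,
 fax_enabled BIT NOT NULL);

-- #2: one key/value table, one row per setting, values kept as strings
CREATE TABLE KeyValueSettings
(setting_key VARCHAR(50) NOT NULL PRIMARY KEY,
 setting_value VARCHAR(255) NOT NULL);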


Thanks
ricva

Anith Sen

Jun 1, 2007, 10:18:43 AM
Application settings are best stored outside the database, preferably in an
easily accessible format. Historically, Windows applications used .INI files
and the registry; however, due to their obvious drawbacks, the current trend
is to use resource files (a variety of options) and .xml documents.

Databases, traditionally, are considered to be a centralized data/integrity
store intended to serve multiple applications. Therefore, cleaner designs
often tend to leave application-specific information outside the database
to avoid unintended bias towards certain applications.

However, in informal environments where data integrity is of lesser concern,
databases are sometimes designed to fit the applications -- note that here a
database is not treated as a set of facts about the business, rather simply
as a storage area for frequently accessed data. In such cases, the
approaches you described in your post may come into play.

Given the only two choices you provided, you should consider the time
variant nature of this list of settings. If the number of settings is
limited and rarely changes, you might want to consider #1. However, note
that managing this table could become a nightmare once the number of
columns grows. Also, you should use a primary key with a CHECK constraint
to prevent multiple rows, like:

CREATE TABLE Settings (
    key_col INT NOT NULL PRIMARY KEY
        CHECK (key_col = 1),
    setting_1 INT NOT NULL DEFAULT ( ... ),
    setting_2 DATETIME NOT NULL DEFAULT ( ... ),
    ...
    setting_n ... );

Now, if the list of settings is time variant, you might want to consider the
second approach. The primary drawback is the inability to impose any serious
constraints, so the application has to manage most aspects of the data.

--
Anith


ricva

Jun 1, 2007, 10:48:07 AM

I should mention that the second method (one table, many rows) has the
Key column as the clustered primary key. There are time-related
constraints which we need, but I did not want to get into that
discussion :-). The second approach is easier for the time constraints.

I agree with your points about ini and registry files. However, this
application consists of Windows forms on client PCs around the world
connecting to the database. Using the db to return settings makes it
easy for us to manage without having to do a new release. We also
store user configuration information so the user is at the same point
when they reopen their Windows app. The database server does not have
rights to read files (no xp_cmdshell allowed).


--CELKO--

Jun 1, 2007, 4:00:40 PM

Use #1. The other way is called an OTLT ("One True Lookup Table") or
MUCK ("Massively Unified Code Keys") in the literature. It is
incredibly bad design. Here is a cut and paste from one of my books
on constants tables.

04.02. Constants Table

When you configure a system, you might want to have a way to set and
keep constants in the schema. One method for doing this is to have a
one-row table that can be set with default values at the start, and
then updated only by someone with administrative privileges.

CREATE TABLE Constants
(lock CHAR(1) DEFAULT 'X'
     NOT NULL PRIMARY KEY
     CHECK (lock = 'X'),
 pi FLOAT DEFAULT 3.141592653 NOT NULL,
 e FLOAT DEFAULT 2.71828182 NOT NULL,
 phi FLOAT DEFAULT 1.6180339887 NOT NULL,
 ..);

To initialize the row, execute this statement.

INSERT INTO Constants DEFAULT VALUES;

Most SQL programmers do not know about the DEFAULT VALUES option in
the INSERT INTO statement. The lock column assures there is only one
row and the DEFAULT values load the initial values. These defaults
can include the current user and current timestamp, as well as numeric
and character constant values.
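
For example (an illustrative sketch, not from the book; names are
invented), a single-row stamp table that records who configured the
system and when:

CREATE TABLE ConfigStamp
(lock CHAR(1) DEFAULT 'X'
     NOT NULL PRIMARY KEY
     CHECK (lock = 'X'),
 set_by VARCHAR(128) DEFAULT CURRENT_USER NOT NULL,
 set_at DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL);

INSERT INTO ConfigStamp DEFAULT VALUES; -- picks up the user and timestamp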

Another version of this idea that does not allow for any updates is a
VIEW defined with a table constructor. [[not in SQL Server yet!]]

CREATE VIEW Constants (pi, e, phi, ..)
AS VALUES (CAST (3.141592653 AS FLOAT),
           CAST (2.71828182 AS FLOAT),
           CAST (1.6180339887 AS FLOAT),
           ..);

Please notice that you have to use CAST() operators to assure that
the data types are correct. This is not a problem with INTEGER
values, but it can be if you wanted DOUBLE PRECISION and got a default
of DECIMAL(s, p) or FLOAT.


[[ a little bit later in the chapter..]]

OTLT or MUCK Table Problems

I think that Paul Keister was the first person to coin the phrase
"OTLT" (One True Look-up Table) for a common SQL programming technique
that is popular with Newbies. Don Peterson (www.SQLServerCentral.com)
gave the same technique the name "Massively Unified Code-Key" or MUCK
tables in one of his articles.

The technique crops up time and time again, but I'll give him credit
as the first writer to give it a name. Simply put, the idea is to
have one table to do all of the code look-ups in the schema. It
usually looks like this:

CREATE TABLE Lookups
(code_type CHAR(10) NOT NULL,
 code_value VARCHAR(255) NOT NULL, -- notice the size!
 code_description VARCHAR(255) NOT NULL, -- notice the size!
 PRIMARY KEY (code_value, code_type));

So if we have Dewey Decimal Classification (library codes), ICD
(International Classification of Diseases), and two-letter ISO-3166
country codes in the schema, we have them all in one, honking big
table.

Let's start with the problems in the DDL and then look at the awful
queries you have to write (or hide in VIEWs). So we need to go back
to the original DDL and add a CHECK() constraint on the code_type
column. Otherwise, we might "invent" a new encoding system by
typographical error.

The Dewey Decimal and ICD codes are digits and have the same format --
three digits, a decimal point and more digits (usually three); the
ISO-3166 is alphabetic. Oops, need another CHECK constraint that will
look at the code_type and make sure that the string is in the right
format. Now the table looks something like this, if anyone attempted
to do it right, which is not usually the case:

CREATE TABLE OTLT
(code_type CHAR(10) NOT NULL
     CHECK (code_type IN ('DDC', 'ICD', 'ISO3166', ..)),
 code_value VARCHAR(255) NOT NULL,
 CHECK
 (CASE
    WHEN code_type = 'DDC'
         AND code_value
             SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
    THEN 1
    WHEN code_type = 'ICD'
         AND code_value
             SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
    THEN 1
    WHEN code_type = 'ISO3166'
         AND code_value SIMILAR TO '[A-Z][A-Z]'
    THEN 1 ELSE 0 END = 1),
 code_description VARCHAR(255) NOT NULL,
 PRIMARY KEY (code_value, code_type));

The "SIMILAR TO" predicate is the SQL-92 version of a regular
expression parser based on the POSIX Standards, if you are not
familiar with it. Since the typical application database can have
dozens and dozens of codes in it, just keep extending this pattern for
as long as required. Not very pretty, is it? In fact, there is a good
chance that you will exceed the number of WHEN clauses allowed in a
CASE expression. That's why most OTLT programmers don't bother with
it.

Now let us consider adding new rows to the OTLT.

INSERT INTO OTLT (code_type, code_value, code_description)
VALUES
('ICD', '259.0', 'Inadequate Genitalia after Puberty');

and also

INSERT INTO OTLT (code_type, code_value, code_description)
VALUES
('DDC', '259.0', 'Christian Pastoral Practices & Religious Orders');

If you make an error in the code_type during an insert, update, or
delete, you have screwed up a totally unrelated value. If you make an
error in the code_type during a query, the results could be
interesting. This can be really hard to find when one of the similarly
structured schemes has unused codes in it.

The next thing you notice about this table is that the columns are
pretty wide VARCHAR(n), or even worse, that they are NVARCHAR(n) which
can store characters from a strange language. The value of (n) is
most often the largest one allowed in that particular SQL product.

Since you have no idea what is going to be shoved into the table,
there is no way to predict and design with a safe, reasonable maximum
size. The size constraint has to be put into the WHEN clause of that
second CHECK() constraint between code_type and code_value. Or you
can live with fixed length codes that are longer or shorter than what
they should be.

These large sizes tend to invite bad data. You give someone a
VARCHAR(n) column, and you eventually get a string with a lot of white
space and a small odd character sitting at the end of it. You give
someone an NVARCHAR(255) column and eventually it will get a Buddhist
sutra in Chinese Unicode.

Now let's consider the problems with actually using the OTLT in a
query. It is always necessary to add the code_type as well as the
value which you are trying to look-up.

SELECT P1.ssn, P1.lastname, .., L1.code_description
FROM OTLT AS L1, Personnel AS P1
WHERE L1.code_type = 'ICD'
AND L1.code_value = P1.disease_code
AND ..;

In this sample query, you need to know the code_type of the Personnel
table disease_code column and of every other encoded column in the
table. If you got a code_type wrong, you can still get a result.

You also need to allow for some overhead for data type conversions.
It might be more natural to use numeric values instead of VARCHAR(n)
for some encodings to ensure a proper sorting order. Padding a string
of digits with leading zeros adds overhead and can be risky if
programmers do not agree on how many zeros to use.

When you execute a query, the SQL engine has to pull in the entire
look-up table, even if it only uses a few codes. If one code is at
the start of the physical storage, and another is at the end of
physical storage, you can expect a lot of cache misses and paging.
When I update the OTLT table, I have to lock out everyone until I am
finished. It is like having to carry an encyclopedia set with you
when all you needed was a magazine article.

Now consider the overhead with a two-part FOREIGN KEY in a table:

CREATE TABLE EmployeeAbsences
(..
 code_type CHAR(3) -- min length needed
     DEFAULT 'ICD' NOT NULL
     CHECK (code_type = 'ICD'),
 code_value CHAR(7) NOT NULL, -- min length needed
 FOREIGN KEY (code_type, code_value)
     REFERENCES OTLT (code_type, code_value),
 ..);

Now I have to convert the character types for more overhead. Even
worse, ICD has a natural DEFAULT value (000.000 means "undiagnosed"),
while Dewey Decimal does not. Older encoding schemes often used all
9's for "miscellaneous" so they would sort to the end of the reports
in COBOL programs. Just as there is no Magical Universal "id", there
is no Magical Universal DEFAULT value. I just lost one of the most
important features of SQL!

I am going to venture a guess that this idea came from OO programmers
who think of it as some kind of polymorphism done in SQL. They say to
themselves that a table is a class, which it isn't, and therefore it
ought to have polymorphic behaviors, which it doesn't.

Maybe there are good reasons for the data modeling principle that a
well-designed table is a set of things of the same kind instead of a
pile of unrelated items.
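
For contrast, the conventional alternative is one look-up table per
encoding scheme, which makes the constraints trivial. A sketch (names
are illustrative, and SQL Server's LIKE stands in for SIMILAR TO):

CREATE TABLE DiseaseCodes
(disease_code CHAR(7) NOT NULL PRIMARY KEY
     CHECK (disease_code LIKE '[0-9][0-9][0-9].[0-9][0-9][0-9]'),
 disease_description VARCHAR(100) NOT NULL);

CREATE TABLE CountryCodes
(country_code CHAR(2) NOT NULL PRIMARY KEY
     CHECK (country_code LIKE '[A-Z][A-Z]'),
 country_name VARCHAR(100) NOT NULL);

A referencing table then needs only a plain one-column FOREIGN KEY, and
each scheme can have its own natural DEFAULT.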

rpresser

Jun 1, 2007, 4:22:33 PM
On Jun 1, 10:48 am, ricva <ric.vander...@gmail.com> wrote:

> I should mention that the second method (one table, many rows) has the
> Key column as the clustered primary key. There are time-related
> constraints which we need, but I did not want to get into that
> discussion :-). The second approach is easier for the time constraints.
>
> I agree with your points about ini and registry files. However, this
> application consists of Windows forms on client PCs around the world
> connecting to the database. Using the db to return settings makes it
> easy for us to manage without having to do a new release. We also
> store user configuration information so the user is at the same point
> when they reopen their Windows app. The database server does not have
> rights to read files (no xp_cmdshell allowed).

Well, to avoid the kind of cruft that the second approach can
accumulate, consider that there are good ways to distribute settings
other than a database ... for instance, a web service, or even just a
plain web page that the client program pulls down at startup time from
a web server.

If you're committed to the database as your distributed store, I offer
one more choice: a single text column containing all your settings in
XML.
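
That could be as simple as this sketch (SQL Server 2000 has no XML data
type, so the document would live in a plain text column; names are
invented):

CREATE TABLE ClientSettings
(settings_id INT NOT NULL PRIMARY KEY
     CHECK (settings_id = 1),
 settings_xml TEXT NOT NULL); -- the whole settings document as one blob

The client pulls the single row at startup and parses the XML itself.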

David Portas

Jun 1, 2007, 5:10:21 PM
On 1 Jun, 14:47, ricva <ric.vander...@gmail.com> wrote:
> We have an application which has its settings stored in a database
> (SQL Server 2000)
>
> We have (strong) disagreements on the best way to store these values,
> and I am looking for advice on doing this.
> Most of the values are not used by the stored procedures, just by the
> applications running on the client PCs. The data rarely changes.
>
> Developer #1 wants many tables with many columns. This gives the
> ability to strongly type each variable. Each table would have only
> one row. He believes this to be faster in retrieving the data.
>

Not just faster. Far more importantly, it means you can use the
correct datatypes.

What I don't understand is the "many tables" part. Why would you want
more than one such table?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

ricva

Jun 1, 2007, 6:48:22 PM
On Jun 1, 3:10 pm, David Portas wrote:

The many tables store the different "groups" of data. For example,
table WebInterface would have 8 columns, table FAXServer would have x
number of columns, table SystemDefault would have 40+ columns, etc.
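
If we keep that layout, each group table could reuse the single-row
lock pattern Anith showed, for instance (columns invented for the
example):

CREATE TABLE FAXServer
(lock CHAR(1) DEFAULT 'X'
     NOT NULL PRIMARY KEY
     CHECK (lock = 'X'),
 retry_count INT NOT NULL DEFAULT (3),
 dial_prefix VARCHAR(10) NOT NULL DEFAULT ('9'));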


Thanks for the feedback

ricva

Jun 1, 2007, 7:02:23 PM

Thanks for your input.
As a side question: how much space does a VARCHAR(255) take? Is the
varchar a "pointer" that only points to the value, or does it allocate
space for all 255 chars with each row?


--CELKO--

Jun 1, 2007, 10:24:12 PM
>> As a side question: how much space does a VARCHAR(255) take? Is the varchar a "pointer" that only points to the value, or does it allocate space for all 255 chars with each row? <<

There is no required implementation in the ANSI/ISO Standards, but
most products will not allocate the entire length.


Erland Sommarskog

Jun 2, 2007, 5:30:52 AM
ricva (ric.va...@gmail.com) writes:
> As a side question: how much space does a VARCHAR(255) take? Is the
> varchar a "pointer" that only points to the value, or does it allocate
> space for all 255 chars with each row?

In the case of SQL Server: Two bytes for the length, and then one byte
for each character in the actual value.
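
For example, a two-character value such as 'US' stored in a
VARCHAR(255) column takes four bytes (two for the length, two for the
data), not 255.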

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Henrik Staun Poulsen

Jun 4, 2007, 7:57:48 AM
ricva,

I live in the real world, and I think I would do a bit of both.

For general stuff, the Attribute-Value table is easy and flexible, so
that your database structure can support multiple versions of the
executable program.

For things like currency codes or ICD codes I would create separate
tables. They "normalize" nicely.

The approach that Developer #1 takes requires careful planning,
because table restructures need to be done outside of business hours,
and support for multiple versions of the exe is difficult at best.

Just my 2c worth...
Henrik Staun Poulsen

Erland Sommarskog

Jun 4, 2007, 6:25:00 PM
Henrik Staun Poulsen (h...@stovi.com) writes:
> I live in the real world, and I think I would do a bit of both.
>
> For general stuff, the Attribute-Value table is easy and flexible, so
> that your database structure can support multiple versions of the
> executable program.

There are situations where Attribute-Value has a place, but easy?
That depends. If the queries against these attributes are few and
streamlined, no problem. But if you need to write queries that access
many attributes, those queries are no fun at all.
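
For example, just reading a handful of settings back as one row from a
key/value table means a pivot along these lines (a sketch against the
hypothetical KeyValueSettings table from earlier in the thread):

SELECT MAX(CASE setting_key WHEN 'retry_count' THEN setting_value END)
           AS retry_count,
       MAX(CASE setting_key WHEN 'dial_prefix' THEN setting_value END)
           AS dial_prefix,
       MAX(CASE setting_key WHEN 'fax_enabled' THEN setting_value END)
           AS fax_enabled
  FROM KeyValueSettings;

And every value still comes back as a string, so the client must cast
each one on top of that.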

> The approach that Developer #1 takes requires careful planning,
> because table restructures need to be done outside of business hours,
> and support for multiple versions of the exe is difficult at best.

And "careful planning" is indeed the keyword. Database design is all a
matter of careful planning. It's nothing for people who like to shot from
the hip. A data model needs to be static for the most part. Else it is
not just manageable. In the system I work with we do use the EAV design
in a few isolated and well-defined places. But it's absolutely out the
question to make this the norm in your database design.
