>
>>>I am designing a simple accountability system so all the partners can
>>> have
>>> direct access to it by intranet.
>>>
>>> I was designing the data model, and came up with this:
>>>
>>> CREATE TABLE `moviments` (
>>> `moviment_id` int(20) NOT NULL auto_increment,
>>> `moviment_date` date NOT NULL default '0000-00-00',
>>> `moviment_description` char(200) NOT NULL default '',
>>> `moviment_assignor` char(80) NOT NULL default '',
>>> `moviment_drawee` char(80) NOT NULL default '',
>>> `moviment_amount` int(20) NOT NULL default '0',
>>> PRIMARY KEY (`moviment_id`)
>>> ) ENGINE=MyISAM DEFAULT CHARSET=latin1
>>>
>> One small observation: if your column names are meant to be in English -
>> as
>> suggested by your use of "amount", "drawee", etc. I think you should
> also use an English word where you are currently using "moviment".
> "Moviment" is
>> not an English word. There _is_ an English word "movement" that might be
> what you want but I suspect that "transaction" would be the term most
> English-speakers would use in the table you describe. Your email address
> indicates that you are from Brazil so I'll assume you are a
>> Portuguese-speaker, not a native English-speaker. This suggestion is
> only meant to be a friendly suggestion, not an insult to your English
> which is very very good. :-)
>
> Ups, that´s right, I misspelt movement, you know beeing a partner and also
> being the financial director, technology director and CEO some times can
> be exhaustive. Well, at least we are growing.. (the opposite of my
> weekends, that seams to be getting smaller and smaller). But thanks, I
> also agree transaction is more suitable.
>>> But I am a little bit stuck with this:
>>>
>>> Should I use a column to mark if the moviment has been executed?
>>
>> That depends. You could decide to only add the row to the table if it
> has been executed; then, you would know implicitly that every entry has
> been executed. On the other hand, if you want to show transactions that
> have not
>> yet been executed, a column that indicates whether the transaction has
>> been
>> executed would probably be a good idea.
>>
>> It might be even better to display details about the transaction that
>> refer
>> to its successful execution rather than just showing a yes/no flag.
>> Perhaps
>> you could store the timestamp that shows when the transaction was
>> completed
>> and maybe the identity of the person or program that completed the
> transaction or even a copy of the document generated by the transaction
> (or
>> a link to this document). For instance, if a receipt was issued for this
> transaction, you might want to store an image of the receipt (or a link
> to the image) in the table.
>
> It´s a good idea, but for now, overkill... I mean here is my new design:
>
> CREATE TABLE `earnings` (
> `transaction_id` int(20) NOT NULL auto_increment,
> `transaction_planned_date` date NOT NULL default '0000-00-00',
> `transaction_executed_date` date NOT NULL default '0000-00-00',
> `transaction_description` char(200) NOT NULL default '',
> `transaction_assignor` char(80) NOT NULL default '',
> `transaction_amount` int(20) NOT NULL default '0',
> PRIMARY KEY (`transaction_id`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1
>
> CREATE TABLE `earnings` (
> `transaction_id` int(20) NOT NULL auto_increment,
> `transaction_planned_date` date NOT NULL default '0000-00-00',
> `transaction_executed_date` date NOT NULL default '0000-00-00',
> `transaction_description` char(200) NOT NULL default '',
> `transaction_assignor` char(80) NOT NULL default '',
> `transaction_amount` int(20) NOT NULL default '0',
> PRIMARY KEY (`transaction_id`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1
>
> I think it is a good model for start...
>
>>> About income and outcome, should I use a column called moviment_type or
> just put a negative value when is an outcome for example?
>>>
>> Either approach should work. It would probably be easier to simply store
> the
>> sign of the transaction with the amount.
>>
>>> Does anyone ever made something like that, any other idea that could
> improve my little system?
>>>
>> I think the question you need to ask yourself is "What do I want to be
>> able
>> to find out about a transaction after it has taken place?". If you
> decide you want to know what currency was involved in the transaction,
> you need to
>> record that in your table. If you decide you want to know the serial
> numbers
>> of the banknotes used in the transaction, you need to record that. And
> so on. It might be wise to talk to someone in the bank and ask them what
> things
>> they need to find out while a transaction is taking place and
> afterwards. You may want to try asking your users (or their managers)
> what sorts of things people ask for but can't get from the old system;
> these are prime candidates for things that you could add to your new
> system.
>
> What I want? Well that my partners stop asking me to send the spreadsheet
> every week... :) They want to take a look the company´s profitability, if
> all payments have been made, if all clients have payed... As I am the one
> who make every transaction I simple would input it into DB (direct) and
> they would access a simple page with a transactions list and graphics of
> the current month as default, so they could choose what period they want.
>
Okay, that's fine but it's a little more general than I had in mind. I was
trying to suggest that you think of specific problems that you need to be
able to solve with the table :-)
>> I'm a little surprised that your amount column is an integer. Does the
> Brazilian currency not have a fractional component? Here in Canada,
> monetary
>> amounts are always decimal numbers, like $123.45, meaning one hundred
> and twenty three dollars and forty five cents. Or is your currency like
> Japanese
>> yen, which have no fractional part?
>
> Yes we have decimal, but, we use the dot for thousand, and the comma for
> decimal values... Is it possible to make MySQL accept this? :)
>
Normally, when you define a number as a decimal number in a database, no
actual decimal symbol is stored in that column. The decimal symbol, whether
it is a dot or a comma, is usually 'virtual': it is not stored. When you
display the number later in a query, there is usually some way that you can
specify the decimal symbol you want and you can often choose what separator
you want to represent 1000. The exact method for specifying your decimal
symbol and thousands separator is usually different for each database and
programming language. (You can often use SQL to format the number the way
you like but you could give this job to the programming language.)
I haven't played with decimal symbols in MySQL - I mostly use DB2 - but I
just took a quick look at the manual to see how you could control the
decimal symbol and thousands separator in MySQL. Unfortunately, I didn't
find it but maybe if you do a more thorough search you will find the right
answer. There may be a built-in function that controls the formatting. Or
maybe you'll have to write your own function for this purpose. Or maybe
choosing the right character set will handle this for you automatically. I'm
really not sure. If you want to do this formatting via Java, I can tell you
how to accomplish it but if you are using Perl or php, or other languages, I
don't know.
--
Rhino
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.2.2/280 - Release Date: 13/03/2006
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=myo...@freebsd.csie.nctu.edu.tw
>
>> > >Should you have a flag for the status "movement complete" ? I would
>> > >say
>> >>yes but instead of a simple checkbox, you could store a date value.
>> >>That
>> >>gives you two pieces of information
>> >>
>> >>a) if the date is null then the movement is not complete.
>> >>b) if the date is NOT null then the movement is complete and you know
> when
>> >>it finished.
>> >
>> > There, Shawn and I disagree :-)
>> >
>> > In case of a very simple database, one could use NULL
>> > as a "flag" or "signal". But in general, I would avoid
>> > storing NULLs.
>> >
>> > Basics of database design: store what you know.
>> >
>> > Given that NULLs basically means the absence of a value
>> > (eg: unknown), you shouldn't be storing NULLs.
>> >
>> Nonsense!!
>
> That's a bold statement ...
>
>> That's simply wrong. A null means "unknown or not applicable" and is a
>
> Ah, so now things become clear, a NULL actually can mean
> two things? So much for clarity then...
>
Come on; that's not fair. "Unknown" and "not applicable" are more like
different senses of the same thing, not two opposite things.
>> perfectly valid value to use in many, many situations. Two classic
> examples:
>
> Let me first state that there's a difference between storing NULLs
> and handling NULLs in your result.
>
Yes, of course there is: so what? You said NULLs should never be _stored_;
that's what I'm responding to. Using NULLs is not very _difficult_ anyway,
just a bit tedious because it is another case to handle.
> So, let's debunk these two classic examples...
>
Please note that I did not say that you MUST use NULLs, just that they were
perfectly valid to use in a design.
>> 1. You have a table containing test scores. Some students get every
> question
>> on a given test wrong. Some students don't write the test at all, perhaps
>> because they were sick. In this scenario, I think the correct thing to
> store
>> is 0 for the students who got every question wrong and NULL for the
> students
>> who didn't write the test.
>
> IMO, the better thing to do in this particular case is to NOT store a test
> result for the students that did not make the test.
>
Okay, that might be acceptable, if it doesn't cause you to lose track of the
student altogether. But if this table was the only one that even recorded
the _existence_ of the student, you'd have a problem; if someone tried to
verify that the student had attended this school, you wouldn't know that
they had. Or in a more probable case, if that was the only test for that
course and the student missed it and then had no row in the table, you might
not have any way of knowing that they took the course! And if they later
wanted to write the exam, having recovered from their illness, your query
might have the effect of keeping them from taking the second exam: your
query would report that they had never taken the first exam so an
adminstrator might refuse to let him/her take the second exam because they
(apparently) had never been scheduled to take the first one.
Now, you could have a second table to record people who were scheduled to
take tests but failed to take them to cover that situation but I think it
would be easier to record all students in one table and then simply store a
null for any test that they fail to take and a zero for every case where a
student got every answer wrong. When you compute the class average, the
avg() function would ensure that the students who got every question wrong
would pull down the class average but that students who failed to write the
test at all would NOT skew the average because the avg() function ignores
nulls.
> So, an example table with test scores:
>
> TEST_SCORES
> StudentID Int,
> TestID Int,
> Score TinyInt Unsigned
>
> Now, students who didn't make the test won't have a record in here.
>
> Perfectly valid design AND you avoid storing NULLs.
>
>> 2. You have a table containing employee records. One of the columns is
>> "termination date". What value do you store for a new employee's
> termination
>> date? Well, if they are a contractor on a fixed length contract, you
>> could
>> calculate the date the contract ends; fair enough. If they are a
>> permanent
>> employee and your area has mandatory retirement, you could calculate the
>> date they turn 65 (or whatever) and use that. But what if they are a
>> permanent employee and you don't have mandatory retirement? I would store
> a
>> NULL to mean "I don't know right now". Then, if and when they gave notice
>> that they were leaving, I would change the termination date from NULL to
>> their last day of work.
>
> Why store a date column if you don't know?
>
> Why not use:
>
> EMPLOYEES
> EmployeeID int,
> StartingDate Date,
> ...
>
> TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
> EmployeeID
> TerminationDate
>
> Once more: perfectly valid design.
>
Yes, that is also a valid design but it means you have to have yet another
table that you could have avoided simply by permitting a null in the
employee table. Hey, if you really want your tables to proliferate like
this, that's up to you.
The key point is that NULLs _do_ work and are a legitimate design decision.
You don't _have_ to use them but they can save you some work and reduce the
number of tables you need. (They can also be a bit more work on the
programming side.)
I took your remarks to mean that NULLs were always a bad idea and were a
symptom of bad design and I strongly disagree with that.
If you are simply saying that you don't like them and prefer to use
different designs to avoid them, then I don't have any problem with that.
> At 7:48 pm -0500 14/3/06, Rhino wrote:
> >>Ah, so now things become clear, a NULL actually can mean
> >>two things? So much for clarity then...
> >
> >Come on; that's not fair. "Unknown" and "not applicable" are more like
different senses of the same thing, not two opposite things.
>
>
> IIRC (but please don't ask for a reference) it has been suggested that
there should be two different types of NULL (value unknown & not applicable)
for just this reason.
>
Yep. "unknown" and "n/a" are two different things and SQL happens
to "support" them by NULLs. This is confusing at times and more
difficult to handle at least...
The problems with NULLs are numerous. Don't say they aren't, cause
they are. In several SQL functions, NULLs make a difference and
they will bite you every now and then. Especially for "new" people,
which is a clear signal they are confusing.
> I suppose what it boils down to is that although the two main reasons one
might use NULL are *not* the same thing, the outcome (the absence of a
value) *is* the same.
>
> So you're both right. Sort of.
> :-)
;-)
Martijn Tonies
Upscene Productions
http://www.upscene.com
Read the literature on how to design databases. What you do
is storing "true propositions". That is, each attribute defines a
certain "true proposition". For example:
Employee "Martijn" has Employee# 14.
You should be able to derive these sentences from every row.
Putting a NULL or N/A in there fails to meet this requirement.
> >> perfectly valid value to use in many, many situations. Two classic
> > examples:
> >
> > Let me first state that there's a difference between storing NULLs
> > and handling NULLs in your result.
> >
> Yes, of course there is: so what? You said NULLs should never be
_stored_;
> that's what I'm responding to. Using NULLs is not very _difficult_ anyway,
> just a bit tedious because it is another case to handle.
There's no point in storing what you don't know.
> > So, let's debunk these two classic examples...
> >
> Please note that I did not say that you MUST use NULLs, just that they
were
> perfectly valid to use in a design.
Depends on who you're asking ...
A few years ago, I was working at a company that had developed
an application with an Oracle database with about 400 tables, nothing
too large.
Plenty of tables had NULLs, not because of missing business data,
but rather used internally in the system.
The number of problems we had with them, I cannot count on my
two hands, nor can I count them on the hands of all other developers
(5) for that project.
Just last week, I spoke to a guy who used to be my collegue there
and we discussed the design a bit (he still does database design and
development for a large company, on Sybase with over 4000
deployed systems) and we both agreed that using NULLs in most
cases as part of the initial design was causing us more problems
then helping us.
> >> 1. You have a table containing test scores. Some students get every
> > question
> >> on a given test wrong. Some students don't write the test at all,
perhaps
> >> because they were sick. In this scenario, I think the correct thing to
> > store
> >> is 0 for the students who got every question wrong and NULL for the
> > students
> >> who didn't write the test.
> >
> > IMO, the better thing to do in this particular case is to NOT store a
test
> > result for the students that did not make the test.
> >
> Okay, that might be acceptable, if it doesn't cause you to lose track of
the
> student altogether. But if this table was the only one that even recorded
> the _existence_ of the student, you'd have a problem;
Indeed, then I would have a problem, cause you cannot derive from
the "test results" table that a student exists. This is a rather silly
statement.
If the "test results" table should ALSO store the existence of a student,
you're design is wrong :-)
>if someone tried to
> verify that the student had attended this school, you wouldn't know that
> they had.
In your own words: nonsense.
Each table should store what is was designed to store. If I would
have to know if a student was enlisted in a certain course or would
be attending this school, I would not be using the "test results" table.
>Or in a more probable case, if that was the only test for that
> course and the student missed it and then had no row in the table, you
might
> not have any way of knowing that they took the course!
See above.
>And if they later
> wanted to write the exam, having recovered from their illness, your query
> might have the effect of keeping them from taking the second exam: your
> query would report that they had never taken the first exam so an
> adminstrator might refuse to let him/her take the second exam because they
> (apparently) had never been scheduled to take the first one.
Different problem, see above.
> Now, you could have a second table to record people who were scheduled to
> take tests but failed to take them to cover that situation but I think it
Sounds like a decent design to me.
> would be easier to record all students in one table and then simply store
a
> null for any test that they fail to take and a zero for every case where a
> student got every answer wrong. When you compute the class average, the
> avg() function would ensure that the students who got every question wrong
> would pull down the class average but that students who failed to write
the
> test at all would NOT skew the average because the avg() function ignores
> nulls.
Ah, an excellent example of why NULLs are tedious: they are ignored
by some functions, but not by others.
Without having the record there in the first place, there would have
to be no rule of AVG ignoring NULLs. Problem solved.
> > Why store a date column if you don't know?
> >
> > Why not use:
> >
> > EMPLOYEES
> > EmployeeID int,
> > StartingDate Date,
> > ...
> >
> > TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
> > EmployeeID
> > TerminationDate
> >
> > Once more: perfectly valid design.
> >
> Yes, that is also a valid design but it means you have to have yet another
> table that you could have avoided simply by permitting a null in the
> employee table. Hey, if you really want your tables to proliferate like
> this, that's up to you.
It seems you have a fear of creating new tables ;-)
This is what database systems are designed to do...
> The key point is that NULLs _do_ work and are a legitimate design
decision.
> You don't _have_ to use them but they can save you some work and reduce
the
> number of tables you need. (They can also be a bit more work on the
> programming side.)
>
> I took your remarks to mean that NULLs were always a bad idea and were a
> symptom of bad design and I strongly disagree with that.
So I've noticed.
The message I'm trying to get across is that:
1 - according to (proper) design literature, you should not use NULLs
if you don't have to
2 - NULLs can cause you more problems than you can think of
3 - its sometimes much easier to avoid storing NULLs AND to be
able to refactor your database because of it
4 - the meaning of NULL can change, so why store it in the first place
> If you are simply saying that you don't like them and prefer to use
> different designs to avoid them, then I don't have any problem with that.
I have seen that when I avoid storing NULLs, my applications
became more clear and easier to understand.
Martijn Tonies
Database Workbench - development tool for MySQL, and more!
>> >> > Basics of database design: store what you know.
>> >> >
>> >> > Given that NULLs basically means the absence of a value
>> >> > (eg: unknown), you shouldn't be storing NULLs.
>> >> >
>> >> Nonsense!!
>> >
>> > That's a bold statement ...
>> >
>> >> That's simply wrong. A null means "unknown or not applicable" and is a
>> >
>> > Ah, so now things become clear, a NULL actually can mean
>> > two things? So much for clarity then...
>> >
>> Come on; that's not fair. "Unknown" and "not applicable" are more like
>> different senses of the same thing, not two opposite things.
>
> Read the literature on how to design databases. What you do
> is storing "true propositions". That is, each attribute defines a
> certain "true proposition". For example:
>
> Employee "Martijn" has Employee# 14.
>
> You should be able to derive these sentences from every row.
> Putting a NULL or N/A in there fails to meet this requirement.
>
Perhaps my theory is a bit rusty but I have never heard of this "true
propositions" business before. Do you have a citation where I can read about
this?
Frankly, I'm skeptical about your interpretation of this idea. While you
clearly don't want false information in a database, it isn't false to say
that student such-and-such's exam mark or such-and-such an employee's
termination date is unknown or not applicable. It _would_ be false to store
a grade of 0 for a student who didn't take the exam because it implies that
he got every question wrong when in fact he didn't write the test at all;
that would be an example of 0 having two contradicatory meanings. Given a
phobia about nulls, it is a reasonable design to put students who don't take
the test in a separate table but it's not the ONLY way to handle the
situation: a null to indicate a student who didn't take the test is
perfectly reasonable too.
>> >> perfectly valid value to use in many, many situations. Two classic
>> > examples:
>> >
>> > Let me first state that there's a difference between storing NULLs
>> > and handling NULLs in your result.
>> >
>> Yes, of course there is: so what? You said NULLs should never be
> _stored_;
>> that's what I'm responding to. Using NULLs is not very _difficult_
>> anyway,
>> just a bit tedious because it is another case to handle.
>
> There's no point in storing what you don't know.
>
But you _do_ know something: that the student didn't take the exam. That is
a piece of true knowledge all by itself.
>> > So, let's debunk these two classic examples...
>> >
>> Please note that I did not say that you MUST use NULLs, just that they
> were
>> perfectly valid to use in a design.
>
> Depends on who you're asking ...
>
> A few years ago, I was working at a company that had developed
> an application with an Oracle database with about 400 tables, nothing
> too large.
>
> Plenty of tables had NULLs, not because of missing business data,
> but rather used internally in the system.
>
I'm not sure what you mean by that phrase "used internally in the system".
> The number of problems we had with them, I cannot count on my
> two hands, nor can I count them on the hands of all other developers
> (5) for that project.
>
How about enumerating a few of them here? I've dealt with relational
databases in several capacities for over 20 years and I've never found nulls
to be a big problem.
> Just last week, I spoke to a guy who used to be my collegue there
> and we discussed the design a bit (he still does database design and
> development for a large company, on Sybase with over 4000
> deployed systems) and we both agreed that using NULLs in most
> cases as part of the initial design was causing us more problems
> then helping us.
>
Again, how about listing some of these problems?
Look, I agree that this scenario is not particularly likely to happen but it
COULD happen. Suppose you run a small company that runs certification exams:
your only contact with the students is that they show up to take an exam and
you record the mark and send it to whoever awards the certification. In that
case, you likely wouldn't have an elaborate set of tables containing a
variety of student information the way you would if you were a university.
In that case, it's quite reasonable to believe that the only table in the
system is the one that records the test marks. That table may include the
student name, address, test name, score, etc. etc. That table may only be
populated as the student comes in the door: the 'greeter' welcomes the
student to the test center, asks his name and other details, points him to
the exam room, and records the test result when the student hands in the
exam. In that scenario, if student Joe Blow does not show up for the test
because his car broke down, he may never be entered in the system at all and
you have therefore lost sight of the fact that he ever existed. Again, this
is not a very likely scenario but it is not an impossible one and that's the
key point: if it COULD happen, you will have a problem.
>>Or in a more probable case, if that was the only test for that
>> course and the student missed it and then had no row in the table, you
> might
>> not have any way of knowing that they took the course!
>
> See above.
>
Dismissing the less probable case does not dismiss this more probable
scenario.
>>And if they later
>> wanted to write the exam, having recovered from their illness, your query
>> might have the effect of keeping them from taking the second exam: your
>> query would report that they had never taken the first exam so an
>> adminstrator might refuse to let him/her take the second exam because
>> they
>> (apparently) had never been scheduled to take the first one.
>
> Different problem, see above.
>
Again, you can't dismiss the second scenario by dismissing the first one;
the scenarios are different and the second one is more probable than the
first.
>> Now, you could have a second table to record people who were scheduled to
>> take tests but failed to take them to cover that situation but I think it
>
> Sounds like a decent design to me.
>
Not surprisingly since it is the one you suggested as the "correct" way to
handle the situation. But the design that uses nulls is NOT invalid and you
have not demonstrated that it is.
>> would be easier to record all students in one table and then simply store
> a
>> null for any test that they fail to take and a zero for every case where
>> a
>> student got every answer wrong. When you compute the class average, the
>> avg() function would ensure that the students who got every question
>> wrong
>> would pull down the class average but that students who failed to write
> the
>> test at all would NOT skew the average because the avg() function ignores
>> nulls.
>
> Ah, an excellent example of why NULLs are tedious: they are ignored
> by some functions, but not by others.
>
_ALL_ of the column functions ignore nulls: it is NOT inconsistent. In fact,
it is the only logical thing that these functions can do, which you'd
realize if you gave it a bit of open-minded thought.
Suppose you had a table with a primary key of employee number and a column
containing bonuses. Some of the rows contain nulls in the bonus column
because the employee didn't earn a bonus yet. Now, you need to answer these
questions: what is the largest bonus amount? What is the smallest bonus
amount? What is the sum of the bonuses? What is the average bonus? Since a
null, by definition, is unknown, it isn't bigger or smaller or equal to any
other value, including another null so it can't qualify as the biggest value
or the smallest value. A number plus (or minus, or times, or divided by) a
null is undefined so it makes no sense to add it to the sum of the bonuses.
A null is NOT a zero so treating it as a zero in an average would unduly
skew the result. So, in each case, the function does the logical thing and
IGNORES the null.
> Without having the record there in the first place, there would have
> to be no rule of AVG ignoring NULLs. Problem solved.
>
That is the other way of the solving the problem. But it is NOT the ONLY
way.
>> > Why store a date column if you don't know?
>> >
>> > Why not use:
>> >
>> > EMPLOYEES
>> > EmployeeID int,
>> > StartingDate Date,
>> > ...
>> >
>> > TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
>> > EmployeeID
>> > TerminationDate
>> >
>> > Once more: perfectly valid design.
>> >
>> Yes, that is also a valid design but it means you have to have yet
>> another
>> table that you could have avoided simply by permitting a null in the
>> employee table. Hey, if you really want your tables to proliferate like
>> this, that's up to you.
>
> It seems you have a fear of creating new tables ;-)
>
No, I don't. It seems to me that you have an undue fear of nulls.
> This is what database systems are designed to do...
>
Database systems are designed to accomodate large quantities of tables but
any designer with any real world experience will advise you not to create
tables simply for the sake of creating tables: that is an unnecessary
proliferation of tables. Look at normalization. As you probably know, each
phase of normalization tends to increase the number of tables in the
database. I know for a fact that the theorists have identified at least up
to 17NF (17th Normal Form) but how many levels of normalization do
businesses typically use in their normalizations? Answer: Three.
(Occasionally four). And I think you'll find that the main reason is that
the benefits of normalizing beyond 3NF are outweighed by the increasing
number of tables to manage.
>> The key point is that NULLs _do_ work and are a legitimate design
> decision.
>> You don't _have_ to use them but they can save you some work and reduce
> the
>> number of tables you need. (They can also be a bit more work on the
>> programming side.)
>>
>> I took your remarks to mean that NULLs were always a bad idea and were a
>> symptom of bad design and I strongly disagree with that.
>
> So I've noticed.
>
I'm trying to be gracious here and accept that I may just have misunderstood
the emphasis in what you are saying.
> The message I'm trying to get across is that:
> 1 - according to (proper) design literature, you should not use NULLs
> if you don't have to
Perhaps you can cite some of this "proper" literature. I'm interested in
seeing their definition of what an appropriate time to use nulls is.
> 2 - NULLs can cause you more problems than you can think of
I'd like to hear what these problems are. I've never had any big problems
because of nulls.
> 3 - its sometimes much easier to avoid storing NULLs AND to be
> able to refactor your database because of it
I agree that nulls can frequently be avoided and that avoiding them will
simplify _some_ situations.
> 4 - the meaning of NULL can change, so why store it in the first place
>
Simple: because it happens to be true when it is stored. In the case of the
hypothetical employee, I store a null termination date when I hire him
because I don't know when he is going to leave. If he laters gives his
notice, then I know when he is leaving and can store that date for his
termination date instead of a null; then his row of the table is true again,
based on the new facts.
>> If you are simply saying that you don't like them and prefer to use
>> different designs to avoid them, then I don't have any problem with that.
>
> I have seen that when I avoid storing NULLs, my applications
> became more clear and easier to understand.
>
Beauty is in the eye of the beholder as we all know. I have no problem with
having nulls in my tables and consider that a better design than a separate
table for special cases most of the time.
Again, if you are saying that you don't like nulls and prefer to avoid them
in your designs, I have no problem with that; that's just your personal
preference. It's the same as if we sat down to eat a meal and you asked for
chocolate ice cream for dessert and I had vanilla; neither choice is
"wrong", they are just personal preference.
But if you are stating categorically that nulls are always "bad" or "wrong",
I disagree strongly. That's like saying only chocolate ice cream is
acceptable and that all other flavours are evil. That's just wrong.
--
Rhino
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.2.3/281 - Release Date: 14/03/2006
(please read all before replying)
> >> >> > Basics of database design: store what you know.
> >> >> >
> >> >> > Given that NULLs basically means the absence of a value
> >> >> > (eg: unknown), you shouldn't be storing NULLs.
> >> >> >
> >> >> Nonsense!!
> >> >
> >> > That's a bold statement ...
> >> >
> >> >> That's simply wrong. A null means "unknown or not applicable" and is
a
> >> >
> >> > Ah, so now things become clear, a NULL actually can mean
> >> > two things? So much for clarity then...
> >> >
> >> Come on; that's not fair. "Unknown" and "not applicable" are more like
> >> different senses of the same thing, not two opposite things.
> >
> > Read the literature on how to design databases. What you do
> > is storing "true propositions". That is, each attribute defines a
> > certain "true proposition". For example:
> >
> > Employee "Martijn" has Employee# 14.
> >
> > You should be able to derive these sentences from every row.
> > Putting a NULL or N/A in there fails to meet this requirement.
> >
> Perhaps my theory is a bit rusty but I have never heard of this "true
> propositions" business before. Do you have a citation where I can read
about
> this?
I would certainly recommend this one:
http://www.amazon.com/exec/obidos/ASIN/0321197844/databasede095-20/103-0880452-0854264?creative=327641&camp=14573&adid=0VQ1KDVJ56MV7389RPEQ&link_code=as1
> Frankly, I'm skeptical about your interpretation of this idea. While you
> clearly don't want false information in a database, it isn't false to say
> that student such-and-such's exam mark or such-and-such an employee's
> termination date is unknown or not applicable. It _would_ be false to
store
> a grade of 0 for a student who didn't take the exam because it implies
that
> he got every question wrong when in fact he didn't write the test at all;
Agreed.
> that would be an example of 0 having two contradicatory meanings. Given a
> phobia about nulls, it is a reasonable design to put students who don't
take
> the test in a separate table but it's not the ONLY way to handle the
> situation: a null to indicate a student who didn't take the test is
> perfectly reasonable too.
Disagreed. Let me explain further:
In this case, a "null" means (according to you and in your design)
that the student did not take the test.
It could also mean that we lots it's grade for now, and that we
want to fill it in later, as in: "unknown", which is what NULL also
means (your own words).
In another case, null would be the employee isn't terminated yet,
but could be in the future.
That's the problems with nulls -> there's no actual defined
meaning. You can never say:
"This table holds test results, a row will mean:
Student with StudentID 123 made test #12 and scored a 4"
Cause you can have rows that can mean this:
Student with StudentID 123 made test #12 and scored a
yet unknown score
or
Student with StudentID 123 made test #12 and did not make
the test
or
Student with StudentID 123 and test #12 doesn't apply to him
In short: NULL is ambigious.
> >> >> perfectly valid value to use in many, many situations. Two classic
> >> > examples:
> >> >
> >> > Let me first state that there's a difference between storing NULLs
> >> > and handling NULLs in your result.
> >> >
> >> Yes, of course there is: so what? You said NULLs should never be
> > _stored_;
> >> that's what I'm responding to. Using NULLs is not very _difficult_
> >> anyway,
> >> just a bit tedious because it is another case to handle.
> >
> > There's no point in storing what you don't know.
> >
> But you _do_ know something: that the student didn't take the exam. That
is
> a piece of true knowledge all by itself.
But that's not what the rows in the table would mean - see above.
In the relational model, the idea is to store true propositions - that is,
each row means the same thing, but with different attributes. See above.
> >> > So, let's debunk these two classic examples...
> >> >
> >> Please note that I did not say that you MUST use NULLs, just that they
> > were
> >> perfectly valid to use in a design.
> >
> > Depends on who you're asking ...
> >
> > A few years ago, I was working at a company that had developed
> > an application with an Oracle database with about 400 tables, nothing
> > too large.
> >
> > Plenty of tables had NULLs, not because of missing business data,
> > but rather used internally in the system.
> >
> I'm not sure what you mean by that phrase "used internally in the system".
Missing business data, eg: no middle name. Some people would
put in a NULL in there as well, while NULL can also mean: I do
not KNOW this persons middle name. Ambiguity, see above.
> > The number of problems we had with them, I cannot count on my
> > two hands, nor can I count them on the hands of all other developers
> > (5) for that project.
> >
> How about enumerating a few of them here? I've dealt with relational
> databases in several capacities for over 20 years and I've never found
nulls
> to be a big problem.
Not big, but a problem... :-)
Do you often go around looking for (little) problems while you
can easily avoid them?
> > Just last week, I spoke to a guy who used to be my collegue there
> > and we discussed the design a bit (he still does database design and
> > development for a large company, on Sybase with over 4000
> > deployed systems) and we both agreed that using NULLs in most
> > cases as part of the initial design was causing us more problems
> > then helping us.
> >
> Again, how about listing some of these problems?
Well, for one, we had a table that stored all kinds of different
actions that should take place. Each action could be part of
a list of actions.
Someone decided that actions with a NULL for the "belongs
to list of actions #" would be an action "by itself".
Business wise, one could create an order for a product and that
product had to be created by a list of certain actions. The order
creator would then use any pre-defined list of actions, or create
such a list himself by selecting the different actions.
Asking for stand alone actions would then (obviously simplified):
select * from bewerking where list is null
This is a slow query. Period. One that could have easily been
avoided by not storing NULLs, but using a separate table.
Why a table? Because "standalone actions" do not equal
"actions that belong to a list".
> >> Okay, that might be acceptable, if it doesn't cause you to lose track
of
> > the
> >> student altogether. But if this table was the only one that even
recorded
> >> the _existence_ of the student, you'd have a problem;
> >
> > Indeed, then I would have a problem, cause you cannot derive from
> > the "test results" table that a student exists. This is a rather silly
> > statement.
> > If the "test results" table should ALSO store the existence of a
student,
> > you're design is wrong :-)
> >
> >>if someone tried to
> >> verify that the student had attended this school, you wouldn't know
that
> >> they had.
> >
> > In your own words: nonsense.
> >
> > Each table should store what is was designed to store. If I would
> > have to know if a student was enlisted in a certain course or would
> > be attending this school, I would not be using the "test results" table.
> >
> Look, I agree that this scenario is not particularly likely to happen but
it
> COULD happen.
It's not me who made up the example :-)
> Suppose you run a small company that runs certification exams:
> your only contact with the students is that they show up to take an exam
and
> you record the mark and send it to whoever awards the certification. In
that
> case, you likely wouldn't have an elaborate set of tables containing a
> variety of student information the way you would if you were a university.
> In that case, it's quite reasonable to believe that the only table in the
> system is the one that records the test marks. That table may include the
> student name, address, test name, score, etc. etc. That table may only be
> populated as the student comes in the door: the 'greeter' welcomes the
> student to the test center, asks his name and other details, points him to
> the exam room, and records the test result when the student hands in the
> exam. In that scenario, if student Joe Blow does not show up for the test
> because his car broke down, he may never be entered in the system at all
and
> you have therefore lost sight of the fact that he ever existed. Again,
this
> is not a very likely scenario but it is not an impossible one and that's
the
> key point: if it COULD happen, you will have a problem.
See above. There's no point in taking exams from random people.
You are trying to come up with all sorts of unlikely scenarios to
justify a solution that's, IMO, faulty.
> >>Or in a more probable case, if that was the only test for that
> >> course and the student missed it and then had no row in the table, you
> > might
> >> not have any way of knowing that they took the course!
> >
> > See above.
> >
> Dismissing the less probable case does not dismiss this more probable
> scenario.
Wrong problem description for task at hand. See above (my previous
reply, not this one). Attack each problem with the right solution, don't
try to find problems in solutions that aren't meant to solve the problem
you just found.
> >>And if they later
> >> wanted to write the exam, having recovered from their illness, your
query
> >> might have the effect of keeping them from taking the second exam: your
> >> query would report that they had never taken the first exam so an
> >> adminstrator might refuse to let him/her take the second exam because
> >> they
> >> (apparently) had never been scheduled to take the first one.
> >
> > Different problem, see above.
> >
> Again, you can't dismiss the second scenario by dismissing the first one;
> the scenarios are different and the second one is more probable than the
> first.
See above. The "Different problem" remark still holds.
> >> Now, you could have a second table to record people who were scheduled
to
> >> take tests but failed to take them to cover that situation but I think
it
> >
> > Sounds like a decent design to me.
> >
> Not surprisingly since it is the one you suggested as the "correct" way to
> handle the situation. But the design that uses nulls is NOT invalid and
you
> have not demonstrated that it is.
What I DID demonstrate, however, is that it's _very_ easy to
create a design without NULLs, that is much clearer to read,
doesn't have to handle NULLs, avoids NULLs in general and
make it easy to understand what problems are solved.
Considering maintainability, ability to refactor and without having
to think up missing data, I prefer my design.
Do note I don't claim to have the key to every problem, but I
would like to know people that design WITH NULLs doesn't
have to be the standard although SQL provides them.
> >> would be easier to record all students in one table and then simply
store
> > a
> >> null for any test that they fail to take and a zero for every case
where
> >> a
> >> student got every answer wrong. When you compute the class average, the
> >> avg() function would ensure that the students who got every question
> >> wrong
> >> would pull down the class average but that students who failed to write
> > the
> >> test at all would NOT skew the average because the avg() function
ignores
> >> nulls.
> >
> > Ah, an excellent example of why NULLs are tedious: they are ignored
> > by some functions, but not by others.
> >
> _ALL_ of the column functions ignore nulls: it is NOT inconsistent. In
fact,
> it is the only logical thing that these functions can do, which you'd
> realize if you gave it a bit of open-minded thought.
Don't accuse me of not being open minded. I did develop systems
with NULLs before. I became a database developer back then
from practicing. However, I'm beginning to understand the theory
more and more and dislike NULLs more and more because of it.
See above.
> > The message I'm trying to get across is that:
> > 1 - according to (proper) design literature, you should not use NULLs
> > if you don't have to
>
> Perhaps you can cite some of this "proper" literature. I'm interested in
> seeing their definition of what an appropriate time to use nulls is.
>
> > 2 - NULLs can cause you more problems than you can think of
>
> I'd like to hear what these problems are. I've never had any big problems
> because of nulls.
Ah well, SQL (these days) provides plenty of functions to avoid
the problems. Coalesce being one of them...
How often do people write:
(probably not valid MySQL)
select firstname || coalesce(middlename || ' ', ' ') || lastname
from ...
> > 3 - its sometimes much easier to avoid storing NULLs AND to be
> > able to refactor your database because of it
>
> I agree that nulls can frequently be avoided and that avoiding them will
> simplify _some_ situations.
>
> > 4 - the meaning of NULL can change, so why store it in the first place
> >
> Simple: because it happens to be true when it is stored. In the case of
the
> hypothetical employee, I store a null termination date when I hire him
> because I don't know when he is going to leave.
Why store something that you -don't- know.
> If he laters gives his
> notice, then I know when he is leaving and can store that date for his
> termination date instead of a null; then his row of the table is true
again,
> based on the new facts.
>
> >> If you are simply saying that you don't like them and prefer to use
> >> different designs to avoid them, then I don't have any problem with
that.
> >
> > I have seen that when I avoid storing NULLs, my applications
> > became more clear and easier to understand.
> >
> Beauty is in the eye of the beholder as we all know. I have no problem
with
> having nulls in my tables and consider that a better design than a
separate
> table for special cases most of the time.
>
> Again, if you are saying that you don't like nulls and prefer to avoid
them
> in your designs, I have no problem with that; that's just your personal
> preference. It's the same as if we sat down to eat a meal and you asked
for
> chocolate ice cream for dessert and I had vanilla; neither choice is
> "wrong", they are just personal preference.
Well, to that we agree and I would order sugared whipped cream with it ;-)
> But if you are stating categorically that nulls are always "bad" or
"wrong",
> I disagree strongly. That's like saying only chocolate ice cream is
> acceptable and that all other flavours are evil. That's just wrong.
We are not going to agree, that much is clear.
I think we raised some very interesting points though (which is exactly
why I started this in the first place).
I would avoid NULLs if I could. It makes - very often - more sense
to avoid them than to include them.
You say that it's not "bad" to have NULLs every now and then, a
statement to which I can agree up to a certain height.
I would also like to state, that I've seen design where NULLs is
grossly overused for all sorts of situations. Your simple examples
can be among those. IF you're using NULLs, you'd better be aware
of what you're doing. Just don't use NULL for the sole purpose of
avoiding a table or relation.
I'd still say that in general, the "true proposition" remark holds and
is very very valid. With that as the basis, design your database.
I think I've also described why using NULLs in a "true proposition"
makes them invalid. I would advice against that.
This topic will pop up every now and then. If there's one thing that
I would like, is that people stop using NULLs way too often, I think
we can agree on that.
I for one had many big and small problems with NULLs stored in
the database. That made me appreciate more how easy it is to avoid
them and how to handle situations differently.
Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com
Hmm, I had hoped for a citation to a free online document that I could read.
I don't fancy the thought of spending $100+ US on a book just to argue the
fine points of database design....
Then again, I didn't ask for a link to a free source of this information so
I can't fault you for your link, it (presumably) answers the question.
>> Frankly, I'm skeptical about your interpretation of this idea. While you
>> clearly don't want false information in a database, it isn't false to say
>> that student such-and-such's exam mark or such-and-such an employee's
>> termination date is unknown or not applicable. It _would_ be false to
> store
>> a grade of 0 for a student who didn't take the exam because it implies
> that
>> he got every question wrong when in fact he didn't write the test at all;
>
> Agreed.
>
>> that would be an example of 0 having two contradicatory meanings. Given a
>> phobia about nulls, it is a reasonable design to put students who don't
> take
>> the test in a separate table but it's not the ONLY way to handle the
>> situation: a null to indicate a student who didn't take the test is
>> perfectly reasonable too.
>
> Disagreed. Let me explain further:
> In this case, a "null" means (according to you and in your design)
> that the student did not take the test.
>
Yes.
> It could also mean that we lots it's grade for now, and that we
> want to fill it in later, as in: "unknown", which is what NULL also
> means (your own words).
>
> In another case, null would be the employee isn't terminated yet,
> but could be in the future.
>
> That's the problems with nulls -> there's no actual defined
> meaning.
I agree that the meaning is not a single, very narrow thing like 'zero' or
'unknown' in every single case. You have to put a null into some kind of
context. But that context is usually fairly obvious, at least in my opinion.
>You can never say:
>
> "This table holds test results, a row will mean:
> Student with StudentID 123 made test #12 and scored a 4"
>
> Cause you can have rows that can mean this:
> Student with StudentID 123 made test #12 and scored a
> yet unknown score
>
> or
>
> Student with StudentID 123 made test #12 and did not make
> the test
>
> or
> Student with StudentID 123 and test #12 doesn't apply to him
>
If the exact reason that a null is in a particular row is particularly
important, you can always define an extra column that describes the reason
for the null. "A" could mean the score hadn't been entered yet because the
exam hadn't taken place yet, "B" could mean the test has already taken place
but the student didn't write the exam, "C" could mean that student has no
score because he was excused from the test for some reason. I would agree
that this is cumbersome but I think you'd have a very similar problem with
your design: you would have put some people in a separate table if they
didn't write the test but you'd have to identify the reason that the test
wasn't written too, if you cared about that, or have separate tables for
students who failed to attend the test and students who were excused from
the test. So I don't see that you're a lot farther ahead with your design.
In fact, your design could be _more_ complicated. In your design a program
couldn't simply look up the student by his ID in a single table and then
react to a null; the program would have to look in the main Grades table
first, then, if it didn't find the student there, would have to look in the
table that records people who didn't write the test because they were sick.
If the row wasn't found there, it would have to look in the table that
records people who were excused from the exam in advance. Frankly, I think
most programmers would rather look in one table than have to check two or
three or more for a given student.
>
> In short: NULL is ambigious.
>
It _can_ be. But the context frequently makes its meaning obvious.
Agreed, a null might mean "this guy doesn't have a middle name" or "I didn't
ask him for his middle name so I don't know if he has one". That may or may
not be important; it could be very important if we were trying to
distinguish between George Herbert Walker Bush (the 41st president of the
US) and George Walker Bush (the current president of the US). But null isn't
really a problem in this situation so much as a symptom of the real problem:
if the middle name is important, why aren't I determining the middle name(s)
of the person when I add him to the database? The null is not dishonest in
this case; it correctly states that the first and last names of both men are
"George" and "Bush". But our input process is weak because we didn't realize
that such cases could occur so we failed to ensure that our programs were
capturing middle names.
>> > The number of problems we had with them, I cannot count on my
>> > two hands, nor can I count them on the hands of all other developers
>> > (5) for that project.
>> >
>> How about enumerating a few of them here? I've dealt with relational
>> databases in several capacities for over 20 years and I've never found
> nulls
>> to be a big problem.
>
> Not big, but a problem... :-)
>
> Do you often go around looking for (little) problems while you
> can easily avoid them?
>
In most of the cases you've described, you don't really avoid problems so
much as trade them for different problems.
If you store students who don't take the test in one table and students who
are excused in advance in another table and students who took the test in
yet another table, you avoid having to deal with nulls but you introduce new
problems: you have to manage three tables now - more backups, probably more
space consumed, etc. - and your programs have to have more sophisticated
logic to find a given student. You may find that the problems you've avoided
were smaller problems than the ones you traded them for. Or not. As usual in
database design, "it depends".
>> > Just last week, I spoke to a guy who used to be my collegue there
>> > and we discussed the design a bit (he still does database design and
>> > development for a large company, on Sybase with over 4000
>> > deployed systems) and we both agreed that using NULLs in most
>> > cases as part of the initial design was causing us more problems
>> > then helping us.
>> >
>> Again, how about listing some of these problems?
>
> Well, for one, we had a table that stored all kinds of different
> actions that should take place. Each action could be part of
> a list of actions.
>
> Someone decided that actions with a NULL for the "belongs
> to list of actions #" would be an action "by itself".
>
> Business wise, one could create an order for a product and that
> product had to be created by a list of certain actions. The order
> creator would then use any pre-defined list of actions, or create
> such a list himself by selecting the different actions.
>
> Asking for stand alone actions would then (obviously simplified):
> select * from bewerking where list is null
>
> This is a slow query. Period. One that could have easily been
> avoided by not storing NULLs, but using a separate table.
> Why a table? Because "standalone actions" do not equal
> "actions that belong to a list".
>
Sorry, your example wasn't sufficiently detailed for me to really follow it.
I really don't understand the scenario you are describing.
Well, I shouldn't have lead off with a weak example but it was the first one
I thought of....
Nevertheless, even an improbable scenario should be a concern if it is
possible that it can happen, even if the likelihood is fairly low. If you
can rule out the possibility altogether, then you can dismiss it as a
concern.
I think it is POSSIBLE that an exam table might be the only record of a
student's existence, even if it is unlikely. I've certainly seen some pretty
weak designs proposed on this and other newsgroups/lists over the years.
Newbies in particular are prone to be very weak in design. Just yesterday, I
saw a post where someone wanted to store the purchase of up to 5 stocks in a
single table. That, of course, is terrible design since it causes a
repeating group, which violates First Normal Form. But the poster had no
idea of the bad consequences and was quite prepared to build such a table.
Unfortunately, I'm having trouble thinking of better scenarios at the
moment. Still, the scenarios I've given are certainly possible,
particularly in a small shop where the database guy is very inexperienced.
Fine, you PREFER your design: I can live with that. I just took exception to
the idea that a design using nulls was inherently bad, or wrong, or
unworkable.
> Do note I don't claim to have the key to every problem, but I
> would like to know people that design WITH NULLs doesn't
> have to be the standard although SQL provides them.
>
Agreed; you don't have to use nulls. In fact, some shops put a "NOT NULL"
beside every column definition in every CREATE TABLE statement. But many
shops _don't_ do that and have perfectly workable database designs.
Your syntax isn't quite correct; coalesce doesn't use a concatenation
symbol, it uses a list of arguments like this:
coalesce(Salary, commission, bonus)
Actually, coalesce has been around for a long time. DB2 (on the mainframe)
had it starting in Version 1.3 if I recall correctly; it was called values()
then but it worked exactly the same as coalesce. The SQL standard decided to
call it coalesce so DB2 developers just created a new entry point for the
function and the now both function names can be used.
But MySQL is quite a bit later to get some of these functions because it is
a much younger product.
>> > 3 - its sometimes much easier to avoid storing NULLs AND to be
>> > able to refactor your database because of it
>>
>> I agree that nulls can frequently be avoided and that avoiding them will
>> simplify _some_ situations.
>>
>> > 4 - the meaning of NULL can change, so why store it in the first place
>> >
>> Simple: because it happens to be true when it is stored. In the case of
> the
>> hypothetical employee, I store a null termination date when I hire him
>> because I don't know when he is going to leave.
>
> Why store something that you -don't- know.
>
To acknowledge that the employee will leave eventually and is still a
perfectly valid employee even though you don't know the termination date.
>> If he laters gives his
>> notice, then I know when he is leaving and can store that date for his
>> termination date instead of a null; then his row of the table is true
> again,
>> based on the new facts.
>>
>> >> If you are simply saying that you don't like them and prefer to use
>> >> different designs to avoid them, then I don't have any problem with
> that.
>> >
>> > I have seen that when I avoid storing NULLs, my applications
>> > became more clear and easier to understand.
>> >
>> Beauty is in the eye of the beholder as we all know. I have no problem
> with
>> having nulls in my tables and consider that a better design than a
> separate
>> table for special cases most of the time.
>>
>> Again, if you are saying that you don't like nulls and prefer to avoid
> them
>> in your designs, I have no problem with that; that's just your personal
>> preference. It's the same as if we sat down to eat a meal and you asked
> for
>> chocolate ice cream for dessert and I had vanilla; neither choice is
>> "wrong", they are just personal preference.
>
> Well, to that we agree and I would order sugared whipped cream with it ;-)
>
I guess you're not diabetic :-) Yet ;-)
>> But if you are stating categorically that nulls are always "bad" or
> "wrong",
>> I disagree strongly. That's like saying only chocolate ice cream is
>> acceptable and that all other flavours are evil. That's just wrong.
>
> We are not going to agree, that much is clear.
>
> I think we raised some very interesting points though (which is exactly
> why I started this in the first place).
>
> I would avoid NULLs if I could. It makes - very often - more sense
> to avoid them than to include them.
>
> You say that it's not "bad" to have NULLs every now and then, a
> statement to which I can agree up to a certain height.
>
Okay, that's really all I was trying to say. Your initial tone seemed to be
that nulls were always bad and should never be used; that's what I was
disagreeing with.
> I would also like to state, that I've seen design where NULLs is
> grossly overused for all sorts of situations. Your simple examples
> can be among those. IF you're using NULLs, you'd better be aware
> of what you're doing. Just don't use NULL for the sole purpose of
> avoiding a table or relation.
>
Agreed.
> I'd still say that in general, the "true proposition" remark holds and
> is very very valid. With that as the basis, design your database.
> I think I've also described why using NULLs in a "true proposition"
> makes them invalid. I would advice against that.
>
As long as you accept that nulls are not always evil and are actually
sometimes valid and useful, I don't think we have a problem. Clearly, if you
and I were both assigned to design a database for the same specifications,
yours would have more tables and fewer nulls than mine. I'd still expect
both of our designs to work satisfactorily.
>
> This topic will pop up every now and then. If there's one thing that
> I would like, is that people stop using NULLs way too often, I think
> we can agree on that.
>
> I for one had many big and small problems with NULLs stored in
> the database. That made me appreciate more how easy it is to avoid
> them and how to handle situations differently.
>
I have no problem with that point of view. This is just a difference of
opinion or emphasis that two reasonable people can have.
--
Rhino
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.2.3/281 - Release Date: 14/03/2006
You can have one table and allow NULLs for some of the fields (id,
LastName, FirstName, MiddleInitial, Title, NameSuffix, AddressLine1,
AddressLine2, City, State, Zip, Phone1, Phone2, Fax, Pager), or you can
have a separate table for each of the nullable fields (I count 8 in
addition to the base demographics table).
I am not a master DBA but it seems kind of brain damaged to insist on
using 9 tables left joined together every time I need a line of
demographics.
>>Frankly, I'm skeptical about your interpretation of this idea. While you
>>clearly don't want false information in a database, it isn't false to say
>>that student such-and-such's exam mark or such-and-such an employee's
>>termination date is unknown or not applicable. It _would_ be false to
>
> store
>
>>a grade of 0 for a student who didn't take the exam because it implies
>
> that
>
>>he got every question wrong when in fact he didn't write the test at all;
>
>
> Agreed.
>
>
>>that would be an example of 0 having two contradicatory meanings. Given a
>>phobia about nulls, it is a reasonable design to put students who don't
>
> take
>
>>the test in a separate table but it's not the ONLY way to handle the
>>situation: a null to indicate a student who didn't take the test is
>>perfectly reasonable too.
>
>
> Disagreed. Let me explain further:
> In this case, a "null" means (according to you and in your design)
> that the student did not take the test.
>
> It could also mean that we lots it's grade for now, and that we
> want to fill it in later, as in: "unknown", which is what NULL also
> means (your own words).
>
> In another case, null would be the employee isn't terminated yet,
> but could be in the future.
>
> That's the problems with nulls -> there's no actual defined
> meaning. You can never say:
>
> "This table holds test results, a row will mean:
> Student with StudentID 123 made test #12 and scored a 4"
>
> Cause you can have rows that can mean this:
> Student with StudentID 123 made test #12 and scored a
> yet unknown score
>
> or
>
> Student with StudentID 123 made test #12 and did not make
> the test
>
> or
> Student with StudentID 123 and test #12 doesn't apply to him
>
>
> In short: NULL is ambigious.
>
>
>>>The number of problems we had with them, I cannot count on my
>>>two hands, nor can I count them on the hands of all other developers
>>>(5) for that project.
>>>
>>
>>How about enumerating a few of them here? I've dealt with relational
>>databases in several capacities for over 20 years and I've never found
>
> nulls
>
>>to be a big problem.
>
>
> Not big, but a problem... :-)
>
> Do you often go around looking for (little) problems while you
> can easily avoid them?
>
>
>>>Just last week, I spoke to a guy who used to be my collegue there
>>>and we discussed the design a bit (he still does database design and
>>>development for a large company, on Sybase with over 4000
>>>deployed systems) and we both agreed that using NULLs in most
>>>cases as part of the initial design was causing us more problems
>>>then helping us.
>>>
>>
>>Again, how about listing some of these problems?
>
>
> Well, for one, we had a table that stored all kinds of different
> actions that should take place. Each action could be part of
> a list of actions.
>
> Someone decided that actions with a NULL for the "belongs
> to list of actions #" would be an action "by itself".
>
> Business wise, one could create an order for a product and that
> product had to be created by a list of certain actions. The order
> creator would then use any pre-defined list of actions, or create
> such a list himself by selecting the different actions.
>
> Asking for stand alone actions would then (obviously simplified):
> select * from bewerking where list is null
>
> This is a slow query. Period. One that could have easily been
> avoided by not storing NULLs, but using a separate table.
> Why a table? Because "standalone actions" do not equal
> "actions that belong to a list".
>
>
>>>>Okay, that might be acceptable, if it doesn't cause you to lose track
>
> of
>
>>>the
>>>
>>>>student altogether. But if this table was the only one that even
>
> recorded
>
>>>>the _existence_ of the student, you'd have a problem;
>>>
>>>Indeed, then I would have a problem, cause you cannot derive from
>>>the "test results" table that a student exists. This is a rather silly
>>>statement.
>>>If the "test results" table should ALSO store the existence of a
>
> student,
>
>>>you're design is wrong :-)
>>>
>>>
>>>>if someone tried to
>>>>verify that the student had attended this school, you wouldn't know
>
> that
>
>>>>they had.
>>>
>>>In your own words: nonsense.
>>>
>>>Each table should store what is was designed to store. If I would
>>>have to know if a student was enlisted in a certain course or would
>>>be attending this school, I would not be using the "test results" table.
>>>
>>
>>Look, I agree that this scenario is not particularly likely to happen but
>
> it
>
>>COULD happen.
>
>
> It's not me who made up the example :-)
>
>
>>Suppose you run a small company that runs certification exams:
>>your only contact with the students is that they show up to take an exam
>
> and
>
>>you record the mark and send it to whoever awards the certification. In
>
> that
>
>>case, you likely wouldn't have an elaborate set of tables containing a
>>variety of student information the way you would if you were a university.
>>In that case, it's quite reasonable to believe that the only table in the
>>system is the one that records the test marks. That table may include the
>>student name, address, test name, score, etc. etc. That table may only be
>>populated as the student comes in the door: the 'greeter' welcomes the
>>student to the test center, asks his name and other details, points him to
>>the exam room, and records the test result when the student hands in the
>>exam. In that scenario, if student Joe Blow does not show up for the test
>>because his car broke down, he may never be entered in the system at all
>
> and
>
>>you have therefore lost sight of the fact that he ever existed. Again,
>
> this
>
>>is not a very likely scenario but it is not an impossible one and that's
>
> the
>
>>key point: if it COULD happen, you will have a problem.
>
>
> See above. There's no point in taking exams from random people.
>
> You are trying to come up with all sorts of unlikely scenarios to
> justify a solution that's, IMO, faulty.
>
>
> Do note I don't claim to have the key to every problem, but I
> would like to know people that design WITH NULLs doesn't
> have to be the standard although SQL provides them.
>
>
>>>3 - its sometimes much easier to avoid storing NULLs AND to be
>>> able to refactor your database because of it
>>
>>I agree that nulls can frequently be avoided and that avoiding them will
>>simplify _some_ situations.
>>
>>
>>>4 - the meaning of NULL can change, so why store it in the first place
>>>
>>
>>Simple: because it happens to be true when it is stored. In the case of
>
> the
>
>>hypothetical employee, I store a null termination date when I hire him
>>because I don't know when he is going to leave.
>
>
> Why store something that you -don't- know.
>
>
>>If he laters gives his
>>notice, then I know when he is leaving and can store that date for his
>>termination date instead of a null; then his row of the table is true
>
> again,
>
>>based on the new facts.
>>
>>
>>>>If you are simply saying that you don't like them and prefer to use
>>>>different designs to avoid them, then I don't have any problem with
>
> that.
>
>>>I have seen that when I avoid storing NULLs, my applications
>>>became more clear and easier to understand.
>>>
>>
>>Beauty is in the eye of the beholder as we all know. I have no problem
>
> with
>
>>having nulls in my tables and consider that a better design than a
>
> separate
>
>>table for special cases most of the time.
>>
>>Again, if you are saying that you don't like nulls and prefer to avoid
>
> them
>
>>in your designs, I have no problem with that; that's just your personal
>>preference. It's the same as if we sat down to eat a meal and you asked
>
> for
>
>>chocolate ice cream for dessert and I had vanilla; neither choice is
>>"wrong", they are just personal preference.
>
>
> Well, to that we agree and I would order sugared whipped cream with it ;-)
>
>
>>But if you are stating categorically that nulls are always "bad" or
>
> "wrong",
>
>>I disagree strongly. That's like saying only chocolate ice cream is
>>acceptable and that all other flavours are evil. That's just wrong.
>
>
> We are not going to agree, that much is clear.
>
> I think we raised some very interesting points though (which is exactly
> why I started this in the first place).
>
> I would avoid NULLs if I could. It makes - very often - more sense
> to avoid them than to include them.
>
> You say that it's not "bad" to have NULLs every now and then, a
> statement to which I can agree up to a certain height.
>
> I would also like to state, that I've seen design where NULLs is
> grossly overused for all sorts of situations. Your simple examples
> can be among those. IF you're using NULLs, you'd better be aware
> of what you're doing. Just don't use NULL for the sole purpose of
> avoiding a table or relation.
>
> I'd still say that in general, the "true proposition" remark holds and
> is very very valid. With that as the basis, design your database.
> I think I've also described why using NULLs in a "true proposition"
> makes them invalid. I would advice against that.
>
>
> This topic will pop up every now and then. If there's one thing that
> I would like, is that people stop using NULLs way too often, I think
> we can agree on that.
>
> I for one had many big and small problems with NULLs stored in
> the database. That made me appreciate more how easy it is to avoid
> them and how to handle situations differently.
>
> Martijn Tonies
> Database Workbench - development tool for MySQL, and more!
> Upscene Productions
> http://www.upscene.com
> My thoughts:
> http://blog.upscene.com/martijn/
> Database development questions? Check the forum!
> http://www.databasedevelopmentforum.com
>
>
--
At 1:49 pm -0500 14/3/06, Rhino wrote:
>A null means "unknown or not applicable" and is a perfectly valid value to use in many, many situations.
It's not often you can say that two people are quite literally arguing about nothing!
James Harvard
(... being flippant because the actual arguments started going over my head about half-a-dozen posts ago...)
:-)
> James Harvard
>
> (... being flippant because the actual arguments started going over my
head about half-a-dozen posts ago...)
>
We're arguing about whether or not your database design
(as stored on disk) should contain NULLs.
IMO: no, or at least as few as possible.
Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com
>We're arguing about whether or not your database design
>(as stored on disk) should contain NULLs.
>
>IMO: no, or at least as few as possible.
>
>
I believe the answer really is - *it depends*.
You are both right, really. Martijn, yes, according to academia and
proper database design, you should not really be using NULLs, (Rhino,
see the writings of Chris Date and Fabian Pascal somewhere like
www.dbdebunk.com, www.thethirdmanifesto.com etc.). Here's a good one:
http://www.dbdebunk.citymax.com/page/page/1396241.htm
Of course, you're still going to pay $10-$15 for this..
However, what is good for academia is *not* always good for the real
world. Why store "Unknown" in a column for a table that could store
millions, or billions, of rows, when you could simply allow a NULL (and
save a *bunch* of disk space across your schemas) in the real world. If
you can allow for the use of the NULL in the application, and *not* try
to interpret it 5 ways from Sunday, then, in my honest opinion, a NULL
would be acceptable.
Short answer, as long as you do it without trying to get your
application to handle them in many different ways, I don't see a great
problem (when weighing them against extra costs, in both space and
speed). If you can get away with not using NULL as well, then great.
This argurment will, however, carry on between "academia" and "the real
world" for at least the next 4000 years. :)
Best regards
Mark
--
Mark Leith, Support Engineer
MySQL AB, Worcester, England, www.mysql.com
Are you MySQL certified? www.mysql.com/certification
Got that one ;)
The book I quoted is from Date, btw, and I enjoy reading it ...
> Of course, you're still going to pay $10-$15 for this..
>
> However, what is good for academia is *not* always good for the real
> world. Why store "Unknown" in a column for a table that could store
> millions, or billions, of rows, when you could simply allow a NULL (and
> save a *bunch* of disk space across your schemas) in the real world.
Well, the question still is if you should store "unknown" at all ;)
Not according to Date: you should store what is known. See the remarks
about the "true propositions", from which relational databases are derived
(but you probably know that).
>If
> you can allow for the use of the NULL in the application, and *not* try
> to interpret it 5 ways from Sunday, then, in my honest opinion, a NULL
> would be acceptable.
>
> Short answer, as long as you do it without trying to get your
> application to handle them in many different ways, I don't see a great
> problem (when weighing them against extra costs, in both space and
> speed). If you can get away with not using NULL as well, then great.
I guess I can agree with that one.
> This argurment will, however, carry on between "academia" and "the real
> world" for at least the next 4000 years. :)
Until it gets replaced by a better model, but I don't expect that anytime
soon ;-)
Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com
--
I think the words of Donald Rumsfeld are appropriate here:
"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."
At 12:02 pm +0100 16/3/06, Martijn Tonies wrote:
>Well, the question still is if you should store "unknown" at all ;)
>
>Not according to Date: you should store what is known.
<snip>
> But (in at least some situations) is not appropriate to record that you
know that you don't have a value?
>
Sure it would. It it's needed.
But the answer doesn't have to be a NULL. Remember
the example about recording the fact that a student did
not make a test for whatever reason?
It was then said you could store "null" for the test score.
I then replied that that was a bit backward IMO, cause
you're using the "test scores" table to store that a student
did not take a test, which is wrong, given that each row
in a table should mean exactly the same thing (theory
knocking at the door once again) and that makes a design
more clear.
> I think the words of Donald Rumsfeld are appropriate here:
> "There are known knowns. These are things we know that we know. There are
known unknowns. That is to say, there are things that we know we don't know.
But there are also unknown unknowns. There are things we don't know we don't
know."
>
I think we'll have a hard time storing Donalds "knows" and
"unknowns" in a database... :-)
> At 12:02 pm +0100 16/3/06, Martijn Tonies wrote:
> >Well, the question still is if you should store "unknown" at all ;)
> >
> >Not according to Date: you should store what is known.
> <snip>
Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com
> Well, the question still is if you should store "unknown" at all ;)
>
> Not according to Date: you should store what is known. See the remarks
> about the "true propositions", from which relational databases are
derived
> (but you probably know that).
As someone totally unread in the theory of databases, that seems unduly
puritanical. I assume that what Date would propose is that you have
another table (related by master key) in which, if you do not know
something, you do not enter it. But this means that if you have 10
different pieces of potentially but not necessarily available information
about a single master record (e.g. a person), you have to do a 10-way join
in order to retrieve all the information about them. Replacing a
theoretically ugly null flag with a 10 way join strikes me, as an engineer
rather than a theoretician, the wrong side of the elegance/practicality
trade-off.
Alec
Using NULLs as well as de-normalization brings the risk of
integrity problems to your storage, storing what is right is only
a good thing.
And when it comes to having to writing JOINs for all your queries,
lo and behold, I bring you the wonder of the VIEW.
;-)
Martijn Tonies
Upscene Productions
http://www.upscene.com
The VIEW eases the syntax, but does it do anything for performance? Surely
it must be much slower to read 11 different tables (Master record
containing all NOT NULL fields, and 10 slave records which may or may not
contain relevant fields)? Ignoring caching, you are going to have at least
one disk access for every NULL field (index lookup which fails) and two
for every non NULL field (index lookup, data lookup) for every null field.
This means that you have multiplied your number of disk accesses (ignoring
caching, again) by 6-11 times (assuming the master record takes two disk
accesses). That again seems a very high price to pay for theoretical
elegance.
Alec
Keeping your design clean is hardly sacrificing anything.
The most important feature of your database and the database
design is the integrity of your data. Period.
De-normalization, NULLs et all make this task much harder,
as well as understanding your design for the guy that takes
over your job ...
Performance comes next, first thing to do there is getting
your indices straight.
Next thing is optimizing the slow stuff - if you can go around
by keeping, for example, (these much smaller slave) tables
in memory, and this works out fine, then do so.
If all your queries are optimized, the next thing to do is
to throw hardware at it. This is very cheap comparing to
your hourly rate. ;-)
Obviously, if your queries and design are valid and the
database system is still slow, you can also complain with
the creators of the database system -> performance, after
all, is a physical attribute, not a database design issue perse.
During the years, many database systems have seen many
improvements to bring better performance to its users.
Only recently, for example, the Firebird DBMS implemented
a new on-disk structure for it's indices greatly enhancing
performance for indices with a large number of duplicates. In
the past, people said: you have to modify your design such
and so, with all risks included. Now, you can keep your design
as clear as possible and enjoy greater performance, just because
someone thought of something clever. If no-one ever complained,
this particular piece of code wouldn't have changed.
Martijn Tonies
Database Workbench - tool for InterBase, Firebird, MySQL, Oracle & MS SQL
Server
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com
--
At 11:27 am +0000 16/3/06, Alec....@quantel.com wrote:
>As someone totally unread in the theory of databases, that seems unduly puritanical. I assume that what Date would propose is that you have another table (related by master key) in which, if you do not know something, you do not enter it. But this means that if you have 10 different pieces of potentially but not necessarily available information about a single master record (e.g. a person), you have to do a 10-way join in order to retrieve all the information about them. Replacing a theoretically ugly null flag with a 10 way join strikes me, as an engineer rather than a theoretician, the wrong side of the elegance/practicality trade-off.
> Also, surely if you left join to a table then if there's no matching row
you get a null value in your result set. So I'm not sure what the difference
is between getting that null because you store it in a column in the primary
table, and deriving it by left joining to a related table with no matching
rows.
Handling nulls in your resultsets makes a heck of a
lot more sense than also storing these NULLs :-)
Martijn Tonies
Upscene Productions
http://www.upscene.com
So you can't hide the underlying structure currently in MySQL. Like all
other things we have to commingle best practices with pragmatism. In =
other
words do what makes sense while at the same time acquiring a good =
knowledge
of fundamentals. {Not just slap name address phone1 .... phone n in some
table with an auto-increment ID field and say your done with the =
design.}
By the way, Date and Codd {both way above me in math and theory} had =
this
discussion ~20 years ago over a 6 month period in pages and pages of
Database magazine. In the end I believe they agreed to disagree.
-----Original Message-----
From: Martijn Tonies [mailto:m.to...@upscene.com]=20
Sent: Thursday, March 16, 2006 5:33 AM
To: my...@lists.mysql.com
Subject: Re: Accountability with MySQL
From: <Alec....@quantel.com>
> "Martijn Tonies" <m.to...@upscene.com> wrote on 16/03/2006 11:02:32:
>
> > Well, the question still is if you should store "unknown" at all ;)
> >
> > Not according to Date: you should store what is known. See the =
remarks
> > about the "true propositions", from which relational databases are
> derived
> > (but you probably know that).
>
> As someone totally unread in the theory of databases, that seems =
unduly
> puritanical. I assume that what Date would propose is that you have
> another table (related by master key) in which, if you do not know
> something, you do not enter it. But this means that if you have 10
> different pieces of potentially but not necessarily available =
information
> about a single master record (e.g. a person), you have to do a 10-way =
join
> in order to retrieve all the information about them. Replacing a
> theoretically ugly null flag with a 10 way join strikes me, as an =
engineer
> rather than a theoretician, the wrong side of the =
elegance/practicality
> trade-off.
Using NULLs as well as de-normalization brings the risk of
integrity problems to your storage, storing what is right is only
a good thing.
And when it comes to having to writing JOINs for all your queries,
lo and behold, I bring you the wonder of the VIEW.
;-)
Martijn Tonies
Upscene Productions
http://www.upscene.com
--=20
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: =
http://lists.mysql.com/mysql?unsub=3Dgo...@axisos.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmy...@freebsd.csie.nctu.edu.tw