In this one we're cataloguing real persons. Since we have a global
data set, we cannot rely on there being anything like a national
identifier for each person, though for some nations we do have them.
This is a partial key, then. For some other people we might have to
resort to just recording a full name and a birthdate, for another
partial key -- legislation might state that a national identifier is
something that can be freely shared, but details of a person's age are
not. Yet another country might only allow us to collect cell phone
numbers and names, but neither birthdates nor IDs. Without further
information, we now have a collection of partial keys, none of which
constitutes a key for all of the data. We also might have cases where
one or more of the partial, and overlapping, composite keys exist at
the same time.
Mapping something like that relationally is a bit challenging. The
first idea (and the actual pattern I've seen) is to allow nulls in
keys (no primary key in SQL, but a unique key) and to simultaneously
enforce each partial key where applicable. The solution is workable,
but a bit dirty: we don't really like nulls and inclusion dependencies/
referential integrity constraints become rather difficult to define
and enforce.
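A minimal Python sketch of that first option follows. It assumes a flattened record in which missing fields are None, standing in for SQL NULLs, and three illustrative partial keys: nid; name + birthdate; name + phone. The field names and the function are made up for illustration. The point is only that each partial key has to be checked separately, and only on the rows where all of its fields are present.

```python
from typing import NamedTuple, Optional


class PersonRow(NamedTuple):
    # Hypothetical flattened layout: any field may be missing (None ~ SQL NULL).
    nid: Optional[str] = None
    name: Optional[str] = None
    birthdate: Optional[str] = None
    phone: Optional[str] = None


# Illustrative partial keys; each is enforced only where all its fields are present.
PARTIAL_KEYS = [("nid",), ("name", "birthdate"), ("name", "phone")]


def partial_key_violations(rows):
    """Return (key, value) pairs that occur more than once.

    Rows with a missing component are skipped for that key, much as most
    SQL implementations let a UNIQUE constraint accept repeated NULLs.
    """
    violations = []
    for key in PARTIAL_KEYS:
        seen = set()
        for row in rows:
            value = tuple(getattr(row, field) for field in key)
            if None in value:
                continue  # this partial key does not apply to the row
            if value in seen:
                violations.append((key, value))
            seen.add(value)
    return violations


rows = [
    PersonRow(nid="NID-001"),
    PersonRow(name="John Smith", phone="+358-555"),
    PersonRow(name="John Smith", phone="+358-555"),
    PersonRow(name="John Smith", birthdate="1970-01-01"),
]
print(partial_key_violations(rows))
```

In this run, only the repeated name + phone pair is reported. The third "John Smith" row is left alone because it carries a different partial key.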
The second option would be to separate all of the cases and create new
relations, which then need not contain nulls at all. We'd have one
relation for the cases where there is just an id, one for name and
birthdate, one for name and phone number, one for id, name and
birthdate but no phone number, and so on until all of the combinations
are exhausted. But this rapidly leads to combinatorial explosion.
Moreover, applying such an approach would split every single relation
which refers to a person into similarly many subrelations (because of
the combinatory nature of the possible presence of partial key
fields), with further combination if even a single extra entity
referred to has a similar problem. While theoretically clearly the
purest way to approach the problem in strict relational terms, in
practice the solution would simply become unmanageable.
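To put a rough number on the explosion, the following sketch enumerates the null-free relations that the second option would need. It assumes four optional fields and the same three illustrative partial keys. A relation that refers to two persons, such as a marriage, would then need one variant for each pair of combinations.

```python
from itertools import combinations

FIELDS = ("nid", "name", "birthdate", "phone")
# Illustrative partial keys, as in the example above.
PARTIAL_KEYS = [{"nid"}, {"name", "birthdate"}, {"name", "phone"}]


def identifying_combinations(fields, partial_keys):
    """All sets of present fields that contain at least one partial key.

    Under the 'one relation per combination' design, each of these
    would become its own null-free relation.
    """
    result = []
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            if any(key <= set(combo) for key in partial_keys):
                result.append(combo)
    return result


person_relations = identifying_combinations(FIELDS, PARTIAL_KEYS)
print(len(person_relations))       # 11 person relations
print(len(person_relations) ** 2)  # 121 variants of a relation referring to two persons
```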
Third, we might treat this as an example of the usefulness of
surrogate keys. The complicated assembly of overlapping partial keys
would then be treated as an intensional restriction, while the
surrogate key would support referential integrity, both on abstract
and concrete levels. (Of course such a key should really be opaque to
the user, as Codd suggests.) But once again, even the use of
surrogates is contentious.
So, how about a typing solution? Suppose we enhance our type system to
fully handle disjunctive and dependent types? So that we can have a
field containing a tagged union of (phone number, birthdate and nid).
That'd take care of a number of subcases. And if a field could
actually contain a fully structured, composite, disjunctive, dependent
type, like the union type of all of the combinations possible in the
example, then one would only need to declare the mess to be a valid
domain, and let the DBMS handle the specifics. The actual
implementation would likely take the form of flattened relational
tables with user-invisible nulls/marks/field-combination-coding, with
added processing to enforce all of the possible partial keys
simultaneously, and foreign keys containing any of the subsets for
matching. The user visible model would be constructed along the lines
of Codd's composite domains, so that it would be possible to
simultaneously get at both the individual fields of the composite, and
at the composite as a whole, with the DBMS enforcing propagation of
present values over all relations utilizing the composite domain.
Then a second nasty example which would actually seem to mandate
surrogate keys, exclusively. The basic requirement that key
constraints try to capture is unambiguous identifiability of all of
the data in the system, so that a) updates can be told apart from
insertions so as to be sure that no data duplication (or the attendant
update anomalies/inefficiencies) arises within the database, and b) to
guarantee that the data in the database can be reliably correlated
with real world entities (i.e. that there is a clear, unambiguous
means to tell whether or not the logical theory encoded in the database
really has the real world as one of its models).
This way of looking at the motivation behind keys suggests that, in
fact, natural keys might not always suffice to model all of the
theories which we want to encode. The example comes from a schema
already using surrogate keys, without any easy way to get rid of them.
Here the theory is rather sparse: we want to encode people and their
phones. The problem is that we simply cannot capture enough
information on the people to form a proper, natural, primary key. We
can assume for example that the only data we have on them is the name,
so that there may be hundreds of John Smiths in the database, with
little extra data. By itself, this sort of database would be
impossible to maintain: there is no way to tell which possible updates
should be applied to which John Smith.
Then suppose each of the Johns will always have a personal cell phone
account. Now the phone number combined with the name will usually be
good enough to disambiguate between them. If each John was always
guaranteed to have precisely one number, we'd define a composite
primary key and be done with it. Except that people can have multiple
numbers/accounts, each with their own dependent information like hours
at which to call and the like. After that, normalization tells us to
make the number(n)-to-person(1) relation into a freestanding one.
Suddenly unambiguous identification of a person is possible only if we
take into consideration two different relations at the same time ("the
John at (+358-555 or +358-666)"), with no shared natural key.
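Here is a small Python sketch of the phone example. It assumes the two relations are held as dicts keyed by a surrogate p. The names PERSON, PHONE and find_person are made up for illustration. The sketch shows that picking out "the John at +358-555 or +358-666" means consulting both relations at once, and that only the surrogate ties them together.

```python
# Hypothetical relations, keyed by a surrogate p because no natural key exists:
#   person(p, name) and phone(number, p, call_hours)
PERSON = {1: "John Smith", 2: "John Smith", 3: "Mary Jones"}
PHONE = {
    "+358-555": (1, "9-17"),
    "+358-666": (1, "18-21"),
    "+358-777": (2, "9-17"),
}


def find_person(name, any_of_numbers=()):
    """Surrogates of persons called `name` owning at least one listed number.

    With no numbers given, the name alone is used, which is usually ambiguous.
    """
    candidates = {p for p, n in PERSON.items() if n == name}
    if not any_of_numbers:
        return candidates
    owners = {PHONE[num][0] for num in any_of_numbers if num in PHONE}
    return candidates & owners


print(find_person("John Smith"))                            # {1, 2}: ambiguous
print(find_person("John Smith", ["+358-555", "+358-666"]))  # {1}
```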
I can see only two ways in which to sort this all out. First, we could
note that there is a kind of duality between this example and the
first one: there we dealt with keys that are structurally complicated
because of the combination of fields present, now we're dealing with
keys that are similarly complicated because of the combinatorial structure
within the field. As such, we could ostensibly go with type system
modification as in the first example. We could permit set datatypes on
the phone number, complicated key enforcement semantics, and domains
for foreign keys which can express e.g. the semantics that "I'm
referring to the John whose set of phone numbers contains at least one
of the following..." That I think would totally break the relational
model, and lead to a type system that is too powerful to ever be
tractable.
Or we could use surrogates.
My own thinking is that opaque surrogates are the right way to go
because they seem to solve both problems. But because of their
inherent problems, we should also include fully general, Turing
complete, yet preferably declarative integrity checking mechanisms
(i.e. akin to Prolog) which can enforce both single table integrity
constraints, and constraints relating to the database as a whole. That
way we'd aim at efficient, single table checks and index based cross-
table checks, but would also be able to automatically enforce more
complicated integrity rules, of the kind my two examples illustrate.
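Python predicates are not the Prolog-like declarative mechanism suggested above, but a rough sketch can at least show its shape. Each constraint is a named predicate over the whole database state and can be re-checked after every change. Both constraints and all names are hypothetical. The first requires every phone to have an existing owner. The second requires every person to have at least one phone, following the assumption in the second example.

```python
from collections import defaultdict

# Hypothetical database state: person(p -> name), phone(number -> p).
DB = {
    "person": {1: "John Smith", 2: "John Smith"},
    "phone": {"+358-555": 1, "+358-666": 1, "+358-777": 2},
}


def phones_by_person(db):
    owned = defaultdict(set)
    for number, p in db["phone"].items():
        owned[p].add(number)
    return owned


# Each constraint is a named predicate over the whole database state,
# rather than over a single row or table.
def every_phone_owner_exists(db):
    return all(p in db["person"] for p in db["phone"].values())


def every_person_has_a_phone(db):
    owned = phones_by_person(db)
    return all(owned[p] for p in db["person"])


CONSTRAINTS = [every_phone_owner_exists, every_person_has_a_phone]


def violated(db):
    """Names of the constraints the given database state fails."""
    return [c.__name__ for c in CONSTRAINTS if not c(db)]


print(violated(DB))  # []
bad = {"person": {1: "John Smith", 2: "John Smith"}, "phone": {"+358-555": 1}}
print(violated(bad))  # ['every_person_has_a_phone']
```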
Tell me what you think, and sorry for going into essay length.
The natural key for persons all around the globe that comes to my mind
would be some DNA profile, which would however not be a practical
solution (difficult to get, too long, data protection issues).
But what about using an artificial key? I. e. create your own encoding
scheme, for example SMITHPA00001 for a person called "Paul Smith".
It would be fixed length, possibly with a check digit at the end. Some
difficulties may arise with asian languages, where you would have to
transcribe them first.
Could be worth a try before you go with a surrogate or physical
locator instead.
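A sketch of such a scheme in Python follows. It assumes five surname letters, two given-name letters, a five-digit running number and a single check digit. The checksum is a made-up weighted mod-10 sum, not any standard one. Names in non-Latin scripts are assumed to have been transcribed already.

```python
import re


def _fixed_letters(text, width):
    # Keep only A-Z after upper-casing and pad with 'X'; names in other
    # scripts would need to be transcribed first.
    letters = re.sub(r"[^A-Z]", "", text.upper())
    return (letters + "X" * width)[:width]


def check_digit(body):
    # Hypothetical weighted mod-10 checksum over base-36 character values.
    # It catches many, but not all, single-character typos.
    total = sum((i + 1) * int(ch, 36) for i, ch in enumerate(body))
    return str(total % 10)


def make_code(surname, given_name, seq):
    """E.g. ('Smith', 'Paul', 1) -> 'SMITHPA00001' plus one check digit."""
    body = f"{_fixed_letters(surname, 5)}{_fixed_letters(given_name, 2)}{seq:05d}"
    return body + check_digit(body)


def is_valid(code):
    return (bool(re.fullmatch(r"[A-Z]{7}[0-9]{6}", code))
            and check_digit(code[:-1]) == code[-1])


first = make_code("Smith", "Paul", 1)
second = make_code("Smith", "Paul", 2)  # a second Paul Smith needs a new number
print(first, second, is_valid(first))
```

Note that the running number is the only part of the code that separates one Paul Smith from another.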
brgds
Philipp Post
It ain't, because of identical twins. That sort of key is also too
fragile at the moment; it would be kind of like keying people off
their photos. I mean, usually the problem with formalized descriptors
like phone numbers, names, ages and the like is that the theory
embodied by the database could fit multiple real life models. I.e. we
have a total surjective relation from data to the representative real
life entities. In this case, however, the problem goes in the opposite
direction: there is simply too much redundancy and noise in the real
world fact for it to count as a key.
So in the end, we'd really like a total invertible function between
the entities and their representations in the database. Or in other
words a theory that uniquely fixes a single model.
> But what about using an artificial key?
That's what surrogate keys are all about. But as Codd suggested, such
keys should remain internal to the DBMS. Only strictly typed equality
comparisons should be allowed on them. Precisely because they are
artificial, and as such will easily lead to an invented mismatch
between the logical content of the database and the reality it is
supposed to represent.
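Python cannot truly hide a value, but the intended discipline can be sketched as a type whose only operations are equality and hashing. Its equality refuses comparisons with anything other than another surrogate. The class name and the counter-based generation are assumptions made for illustration.

```python
import itertools


class Surrogate:
    """Opaque identifier: only ==/!= against other surrogates, plus hashing."""

    _next = itertools.count(1)  # internal source of fresh values
    __slots__ = ("_value",)

    def __init__(self):
        self._value = next(Surrogate._next)

    def __eq__(self, other):
        if not isinstance(other, Surrogate):
            # Strict typing: comparing a surrogate with a number or string is an error.
            raise TypeError("surrogates can only be compared with surrogates")
        return self._value == other._value

    def __hash__(self):
        return hash(self._value)

    def __repr__(self):
        return "<Surrogate>"  # never reveal the underlying value

    # No ordering or arithmetic is defined, so `a < b` raises TypeError.


alice, bob = Surrogate(), Surrogate()
owner_of = {"+358-555": alice}
print(owner_of["+358-555"] == alice)  # True
```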
> I. e. create your own encoding scheme, for example SMITHPA00001 for a person called
> "Paul Smith".
The problem is that you could have multiple Paul Smiths. Okay, then
you can enumerate them, and what you have is basically a specially
formatted surrogate key. But how precisely do you then know which
record should be updated when e.g. Paul's age changes? And if you
cannot discern that, why precisely do you have those two separate
Pauls as separate records?
The point is that a) if you cannot tell two real life entities apart
based on their representation within the database, then b) you cannot
tell any dependent data apart either, so that c) the representations
within the database should actually be merged, in order to avoid data
duplication, update anomalies and the like.
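That check can be mechanized. Group the records by everything the database actually stores about them, ignoring the surrogate or enumerated code. By this argument, any group with more than one member is a set of representations that should be merged. A sketch with made-up data:

```python
from collections import defaultdict


def indistinguishable(records):
    """Return groups of record ids whose stored attributes are identical.

    `records` maps an id (surrogate or enumerated code) to the tuple of
    attributes the database holds about that entity.
    """
    groups = defaultdict(list)
    for rid, attrs in records.items():
        groups[attrs].append(rid)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


records = {
    "SMITHPA00001": ("Paul Smith", 40),
    "SMITHPA00002": ("Paul Smith", 40),
    "SMITHJO00001": ("John Smith", 52),
}
print(indistinguishable(records))  # [['SMITHPA00001', 'SMITHPA00002']]
```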
Spot on. The unmentioned context to do with identity is usually
societal and often political, nothing to do with db theory IMHO.
Databases don't record objects: they record facts about objects. In
the first scenario, the facts you're recording don't have the same
predicate, even though they all are about objects that have the
property 'being a person.' 'being a person with national id <nid>'
implies 'being a person.' 'being a person named <name> with birth
date <bdate>' implies 'being a person.' The obvious solution is to
introduce a relation with the predicate 'being a person <p>' and to
modify the other relations' predicates 'being a person <p> with
national id <nid>' and 'being a person <p> named <name> with birth
date <bdate>.' This way there is one relation for facts that state
just that there is a particular person. By isolating the implied
predicate--the one that asserts just the objects' existence, there can
be a bijective mapping from the tuples in that relation to the objects
in the universe of discourse that have the property 'being a person.'
Of course this requires the introduction of surrogates, but logically,
surrogates are nothing more than individual constants: arbitrary
symbols for particular objects in the universe. I see absolutely no
difference between assigning arbitrary symbols to objects that have
the property 'being a person' and assigning arbitrary symbols to
objects that have the property 'being a natural number.' The symbols
'1' and '2' are in essence 'surrogates' for the first two natural
numbers.
I think this is what Codd had in mind when he introduced the E-
relations of RM/T. E-relations are primarily for asserting entities'
existence.
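One way to render that design in SQL is sketched here with Python's built-in sqlite3 module. An E-relation-style person table holds nothing but the surrogate, and there is one relation per predicate. Table and column names are illustrative assumptions, and only two of the predicates are shown.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        -- E-relation: one row per person, asserting only that the person exists.
        CREATE TABLE person (p INTEGER PRIMARY KEY);

        -- 'Person <p> has national id <nid>'
        CREATE TABLE person_nid (
            p   INTEGER PRIMARY KEY REFERENCES person(p),
            nid TEXT NOT NULL UNIQUE
        );

        -- 'Person <p> is named <name> with birth date <bdate>'
        CREATE TABLE person_name_bdate (
            p     INTEGER PRIMARY KEY REFERENCES person(p),
            name  TEXT NOT NULL,
            bdate TEXT NOT NULL,
            UNIQUE (name, bdate)
        );
    """)
    return conn


def add_person(conn, nid=None, name=None, bdate=None):
    # The surrogate comes from the E-relation; property rows refer to it.
    p = conn.execute("INSERT INTO person DEFAULT VALUES").lastrowid
    if nid is not None:
        conn.execute("INSERT INTO person_nid (p, nid) VALUES (?, ?)", (p, nid))
    if name is not None and bdate is not None:
        conn.execute(
            "INSERT INTO person_name_bdate (p, name, bdate) VALUES (?, ?, ?)",
            (p, name, bdate),
        )
    return p


conn = build_db()
add_person(conn, nid="NID-001")
add_person(conn, name="Paul Smith", bdate="1970-01-01")
print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 2
```

Each partial key becomes an ordinary, null-free key of its own relation. Other relations refer to a person through `person.p` alone.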
> Databases don't record objects: they record facts [...].
Admirably close, but not quite cigar-worthy. Databases record
*assertions* of fact. The assertions may be sincere and truthful, or
sincere but false, or deceitful and false. As I have said before, it
is useful to think of the content of a database as being like the
testimony in a court case.
My point being: you don't care, and we don't have to care (for the
purpose of designing the database or the software that operates on it),
about facts. All that matters is that we can make the inferences that
we should be entitled to make from the assertions. Whether the
inferences turn out to be factual or not is just not our business.
I can't bring myself to discuss the identity issue. I am hopelessly
bored with the idea that every particle and every concept has a
barcode buried in it somewhere if we look hard enough. The problem of
identity is always solved by the business process, and if it isn't,
that's life's way of saying it doesn't need solving.
--
Roy
Excellent response!
My own point of view is slightly different from your last sentence.
I was around computers at a time when a lot of purely manual business
processes were computerized for the very first time. (I had originally
written "automated", but I deliberately changed the wording to
"computerized".) When you analyzed the processes in place, you found
innumerable references to the fact that human beings make decisions on the
basis of inadequate and contradictory information, and do so pretty
successfully.
So, if you asked a registrar "what do you do if there is more than one
student named 'Paul Smith'" the answer would generally be "you just have to
use your common sense." The construction of reliable systems at that time
involved repeated cases of substituting some formal mechanism for "common
sense". This includes, but is not limited to, formal mechanisms for
identity.
Nowadays, almost all systems that are to be programmed inherit a legacy
system that is already computerized. Hence, the dependency on "common
sense", if any, is not as evident as it was some thirty or forty years ago.
A lot of people who decide to start from scratch, and not to inherit the
conceptual flaws of the legacy system, discover that the legacy system has
been making irrational and unfortunate decisions for lo these many years,
and people have been living with it for a variety of reasons. A lot of
other builders never quite go back to starting over, even when they do a
complete code rewrite. They inherit the conceptual flaws of the legacy
system, and go from there. After all, the ultimate acceptance test is
likely to be running the new system and the legacy system side by side for a
few months to see if the new system is "good enough".
This doesn't deal with the issue that you are hopelessly bored with. But I
hope it sheds some light anyway.
Whatever. What is in the database is supposed to be true. Whether
'an assertion that is supposed to be true' is equivalent to 'a piece
of information presented as having objective reality' [fact (5)
according to Merriam-Webster] is beside the point. My point is that it
is not objects that are recorded, but supposedly true statements about
objects.
> As I have said before, it
> is useful to think of the content of a database as being like the
> testimony in a court case.
I don't think that's very useful at all. The content of a database
isn't necessarily tagged with who said it and when they said it, but a
deliberating jury knows who said what when and can therefore weigh
each statement accordingly. Without a record of who said what when,
it is best to suppose that what is in the database is true.
> My point being: you don't care, and we don't have to care (for the
> purpose of designing the database or the software that operates on it),
> about facts. All that matters is that we can make the inferences that
> we should be entitled to make from the assertions. Whether the
> inferences turn out to be factual or not is just not our business.
>
> I can't bring myself to discuss the identity issue. I am hopelessly
> bored with the idea that every particle and every concept has a
> barcode buried in it somewhere if we look hard enough.
It is not an issue of identity but rather identifiability. If a thing
can be distinguished from all other things, then it can be named. It
is lucky that first-order languages sport an infinite supply of
constant symbols.
> On Sep 21, 3:41 am, Roy Hann <specia...@processed.almost.meat> wrote:
>> Brian wrote:
>> > Databases don't record objects: they record facts [...].
>>
>> Admirably close, but not quite cigar-worthy. Databases record
>> *assertions* of fact. The assertions may be sincere and truthful, or
>> sincere but false, or deceitful and false.
>
> Whatever. What is in the database is supposed to be true.
Says who?
It doesn't need to be true, and it can't be guaranteed to be
true, so it is wise to remember that what's in the database is just
claims and assertions. It is completely sufficient that the database is
consistent.
>> As I have said before, it
>> is useful to think of the content of a database as being like the
>> testimony in a court case.
>
> I don't think that's very useful at all. The content of a database
> isn't necessarily tagged with who said it and when they said it, but a
> deliberating jury knows who said what when and can therefore weigh
> each statement accordingly. Without a record of who said what when,
> it is best to suppose that what is in the database is true.
You want to assume the database is true and I want not to spend time
fretting whether it is or it isn't. How would you manipulate the data
differently from me because you know it is "true"?
If you did need to know it is true, then you would need to guarantee it
is. That would lead you into trying to design all kinds of clever ways
to give that guarantee. Which will fail. I've been involved in two
criminal justice projects that tried; both were train-wrecks.
--
Roy
Everyone who advocates the closed world assumption.
>
> It doesn't need to be true, and it can't be guaranteed to be
> true, so it is wise to remember that what's in the database is just
> claims and assertions. It is completely sufficient that the database is
> consistent.
>
I don't think that it is sufficient.
> >> As I have said before, it
> >> is useful to think of the content of a database as being like the
> >> testimony in a court case.
>
> > I don't think that's very useful at all. The content of a database
> > isn't necessarily tagged with who said it and when they said it, but a
> > deliberating jury knows who said what when and can therefore weigh
> > each statement accordingly. Without a record of who said what when,
> > it is best to suppose that what is in the database is true.
>
> You want to assume the database is true and I want not to spend time
> fretting whether it is or it isn't. How would you manipulate the data
> differently from me because you know it is "true"?
I don't know that what is in the database is true: I suppose that it
is true. Under the closed world assumption what is not in the
database is supposed to be false. As a consequence, many more
conclusions can be drawn from the same data. For example, if there
isn't a row in an employee table for Bob Smith, then under the closed
world assumption we can infer that Bob Smith isn't an employee, but
without the closed world assumption, that inference wouldn't be valid.
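The Bob Smith inference can be sketched in a few lines of Python using a made-up employee set. The only difference between the two readings is what absence from the relation is allowed to mean.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical body of an employee relation.
EMPLOYEES = {"Alice Jones", "Carl Berg"}


def is_employee(name, closed_world=True):
    """Evaluate 'name is an employee' against the relation."""
    if name in EMPLOYEES:
        return Truth.TRUE
    # Absence licenses a negative conclusion only under the closed world assumption.
    return Truth.FALSE if closed_world else Truth.UNKNOWN


print(is_employee("Bob Smith"))                      # Truth.FALSE
print(is_employee("Bob Smith", closed_world=False))  # Truth.UNKNOWN
```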
> On Sep 21, 10:27 am, Roy Hann <specia...@processed.almost.meat> wrote:
>> Brian wrote:
>> > On Sep 21, 3:41 am, Roy Hann <specia...@processed.almost.meat> wrote:
>> >> Brian wrote:
>> >> > Databases don't record objects: they record facts [...].
>>
>> >> Admirably close, but not quite cigar-worthy. Databases record
>> >> *assertions* of fact. The assertions may be sincere and truthful, or
>> >> sincere but false, or deceitful and false.
>>
>> > Whatever. What is in the database is supposed to be true.
>>
>> Says who?
>
> Everyone who advocates the closed world assumption.
The closed world assumption doesn't tell you anything about what is
actually in the database; it tells you how you are entitled to
manipulate what you find in the database. I hinted at that in
my first post when I wrote "All that matters is that we can make the
inferences that we should be entitled to make from the assertions."
>> It doesn't need to be true, and it can't be guaranteed to be
>> true, so it is wise to remember that what's in the database is just
>> claims and assertions. It is completely sufficient that the database is
>> consistent.
> I don't think that it is sufficient.
Well that's too bad, 'cos that's all you can have.
--
Roy
You're wrong, of course, but don't take my word for it. According to
Date in /An Introduction to Database Systems, Eighth Edition/, page
161: 'the Closed World Assumption (also known as the Closed World
Interpretation) says that if an otherwise valid tuple--that is, one
that conforms to the relvar heading--does /not/ appear in the body of
the relvar, then we can assume the corresponding proposition is
false. In other words, the body of the relvar at any given time
contains /all/ and /only/ the tuples that correspond to true
propositions at that time.' So the closed world assumption tells us
that what is actually in the database is supposed to be true, while
what is not is supposed to be false.
> >> It doesn't need to be true, and it can't be guaranteed to be
> >> true, so it is wise to remember that what's in the database is just
> >> claims and assertions. It is completely sufficient that the database is
> >> consistent.
> > I don't think that it is sufficient.
>
> Well that's too bad, 'cos that's all you can have.
>
> --
> Roy
> >> > Whatever. What is in the database is supposed to be true.
>>
>> >> Says who?
>>
>> > Everyone who advocates the closed world assumption.
>>
>> The closed world assumption doesn't tell you anything about what is
>> actually in the database; it tells you how you are entitled to
>> manipulate what you find in the database. I hinted at that in
>> my first post when I wrote "All that matters is that we can make the
>> inferences that we should be entitled to make from the assertions."
>
> You're wrong, of course, but don't take my word for it. According to
> Date in /An Introduction to Database Systems, Eighth Edition/, page
> 161: 'the Closed World Assumption (also known as the Closed World
> Interpretation) says that if an otherwise valid tuple--that is, one
> that conforms to the relvar heading--does /not/ appear in the body of
> the relvar, then we can assume the corresponding proposition is
> false. In other words, the body of the relvar at any given time
> contains /all/ and /only/ the tuples that correspond to true
> propositions at that time.' So the closed world assumption tells us
> that what is actually in the database is supposed to be true, while
> what is not is supposed to be false.
Far be it from me to contradict Date, but there is no way on earth that
he intended us to take that to mean "Garbage In, Garbage Out" doesn't
apply to databases. Date is just telling us the limits of how we are
entitled to manipulate the database and--less directly--what the
consequences of violating 5NF are.
--
Roy
> Brian wrote:
>
>
>>>>>Whatever. What is in the database is supposed to be true.
>>>
>>>>>Says who?
>>>
>>>>Everyone who advocates the closed world assumption.
>>>
>>>The closed world assumption doesn't tell you anything about what is
>>>actually in the database; it tells you how you are entitled to
>>>manipulate what you find in the database. I hinted at that in
>>>my first post when I wrote "All that matters is that we can make the
>>>inferences that we should be entitled to make from the assertions."
>>
>>You're wrong, of course, but don't take my word for it. According to
>>Date in /An Introduction to Database Systems, Eighth Edition/, page
>>161: 'the Closed World Assumption (also known as the Closed World
>>Interpretation) says that if an otherwise valid tuple--that is, one
>>that conforms to the relvar heading--does /not/ appear in the body of
>>the relvar, then we can assume the corresponding proposition is
>>false. In other words, the body of the relvar at any given time
>>contains /all/ and /only/ the tuples that correspond to true
>>propositions at that time.' So the closed world assumption tells us
>>that what is actually in the database is supposed to be true, while
>>what is not is supposed to be false.
>
> Far be it from me to contradict Date, but there is no way on earth that
> he intended us to take that to mean "Garbage In, Garbage Out" doesn't
> apply to databases.
Um, note the word "assume" in the quoted passage above. He doesn't say
it is false, only that we can assume it is. Also note the phrase
"supposed to be". He doesn't say it is, which we have no way of
validating from the dbms, only that it is supposed to be.
> Date is just telling us the limits of how we are
> entitled to manipulate the database and--less directly--what the
> consequences of violating 5NF are.
Very precisely and very carefully, I might add.
I flatter myself that I understand that completely. However, we (Brian
and I) seem to be talking about two different things. Brian has started
talking about how a DBMS has to work, and I am talking about the
assorted fantasies, lies, and honest-to-God truths end-users shovel into
databases. Databases <> DBMSs.
This gets important when we start designing systems for people to use,
because anyone who undertakes that task imagining that the software must
somehow ensure the database is a wonderland of infallible truths will
end up with unusable and very expensive junk. I've seen it happen
a couple of times, at great public expense.
--
Roy
Don't kid yourself: It happens at great private expense too.
I didn't say that "Garbage In, Garbage Out" doesn't apply to
databases. But if I don't suppose that what is in the database is
true, then there's no point in even having a database. Any conclusion
drawn from information that is not supposed to be true is unsound:
valid conclusions cannot be a consequence of false premises. So it is
only by supposing that the premises are true that any conclusions can
be drawn at all. The closed world assumption increases the number of
conclusions that can be drawn in the same way that the law of the
excluded middle increases the number of inferences that can be made,
but even under the open world interpretation, what is in the database
is supposed to be true.
What does the closed world assumption have to do with violating 5NF?
>You're wrong, of course, but don't take my word for it. According to
>Date in /An Introduction to Database Systems, Eighth Edition/, page
>161: 'the Closed World Assumption (also known as the Closed World
>Interpretation) says that if an otherwise valid tuple--that is, one
>that conforms to the relvar heading--does /not/ appear in the body of
>the relvar, then we can assume the corresponding proposition is
>false.
But what *is* that proposition? It might be
FIRSTNAME LASTNAME is an employee at CORPORATION
but it might just as well be
at some time in the past, it has been asserted that
FIRSTNAME LASTNAME is an employee at CORPORATION
which is a closed world formulation of what is approximately
the open world counterpart of the first.
So just stating the closed world assumption isn't enough -
you also have to rule out predicates that involve statements
about assertions.
--
Reinier
I don't think it is. Assuming that the assertion was true at that
time in the past, the proposition is temporally qualified, whereas the
first isn't.
>
> So just stating the closed world assumption isn't enough -
> you also have to rule out predicates that involve statements
> about assertions.
I really don't understand what you're trying to say: assertions are
not terms, at least not in first-order logic.
>
> --
> Reinier
This wish is quite understandable, but I do not see one for the
situation you described. There are a lot of difficulties as already
mentioned in previous posts:
- we deal primarily with assumptions
- the GIGO problem (bad or incomplete sources, human input errors)
- lifeforms are not encoded in a database compatible way when they
appear on earth
- social data is not easy to record as life bears lots of surprises in
it
Some horror stories can be found if you look at the genealogical
community. This is even more difficult as you deal with persons who
are no longer alive and with old and contradictory source documents.
In the end each database assigns some magical auto-number or GUID to
each person and if you try to merge two databases you find yourself in
an unpleasant battle with the duplicates.
What can be done, however, is for whoever maintains the internal
encoding scheme to do so with the utmost care in order to minimize
incorrect assignment of details. For the rest there are the UPDATE and
DELETE commands.
brgds
Philipp Post
>>So in the end, we'd really like a total invertible function between the entities and their representations in the database. <
>
> This wish is quite understandable, but I do not see one for the
> situation you described.
To whom did you reply? Please do not remove the attributions.
> To whom did you reply? Please do not remove the attributions. <
I replied to Sampo's last message.
brgds
Philipp Post
And here you make a hideously common mistake, one that seems to
be more common to governments for some reason, but happens in the
private sector too.
The contents of a database are assertions of fact made by the
providers of said information. If you are running (say) an order entry
database, it's reasonably safe to assume that the assertions in the
database are true.
Even then, errors in data entry (for whatever reason) may lead to
database assertions that don't match the "real world" (i.e. are
false).
If the data arrives via entities who either don't care or are actively
hostile to your goal in collecting data, your database is liable to be
full of assertions that don't match reality. That doesn't make it
not-a-database, nor does it necessarily invalidate the notion of
collecting the (untruthful) data. It DOES mean that you must treat
the database as WHAT IT IS -- assertions of fact, NOT fact.
It is usually permissible to proceed under the assumption that
the assertions in the database ought to be true. (Unless you
know better given the data source.) It is impermissible to
assume that they MUST be true.
I find that people who take your position tend to ignore the need for
voids, database cleaning, errors, etc. Maybe you are the exception
but I don't believe it for a moment. :-)
Karl
> What does the closed world assumption have to do with violating 5NF?
You can't have a violation of 5NF without it.
--
Roy
An assertion of fact IS a fact. Appealing again to Webster, a fact is
'a piece of information presented as having objective reality.' Isn't
assertion a kind of presentation?
If the sources of information may be suspect, then the source should
be recorded along with the information so that it can be taken into
account in queries. When the source is part of the record, then what
is in the database is an assertion of the form, 'so-and-so said such-
and-such,' which may safely be treated as being true even if such-and-
such isn't!
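The following sketches that suggestion with hypothetical sources and data. Each row records who asserted what. The rows themselves can therefore be taken as true even where the underlying claims disagree, and queries can still surface the disagreement.

```python
# Hypothetical rows: "<source> asserted that <subject>'s <attribute> is <value>".
ASSERTIONS = [
    ("clerk A", "Paul Smith", "age", 40),
    ("clerk B", "Paul Smith", "age", 41),
    ("clerk A", "John Smith", "age", 52),
]


def claims_about(subject, attribute):
    """Map each source to the value it asserted for subject's attribute."""
    return {src: val for src, subj, attr, val in ASSERTIONS
            if subj == subject and attr == attribute}


def disputed(attribute):
    """Subjects for which different sources asserted different values."""
    values = {}
    for _, subj, attr, val in ASSERTIONS:
        if attr == attribute:
            values.setdefault(subj, set()).add(val)
    return sorted(subj for subj, vals in values.items() if len(vals) > 1)


print(claims_about("Paul Smith", "age"))  # {'clerk A': 40, 'clerk B': 41}
print(disputed("age"))                    # ['Paul Smith']
```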
>
> It is usually permissible to proceed under the assumption that
> the assertions in the database ought to be true. (Unless you
> know better given the data source.) It is impermissible to
> assume that they MUST be true.
Why do you insist on introducing the modalities ought and must? Keep
it simple, stupid!
> I find that people who take your position tend to ignore the need for
> voids, database cleaning, errors, etc. Maybe you are the exception
> but I don't believe it for a moment. :-)
What position is that? The position that what is given is supposed to
be true? I really can't believe I need to defend what is obvious to
anyone with even a rudimentary grasp of logic. Nothing I have written
here argues against validation or constraints or any other available
mechanism for keeping garbage out of the database or for removing
garbage from the database.
>
> Karl
Yes, you can. Suppose that you have the scheme
{Supplier, SupplierPhone, PartNumber}
which satisfies the functional dependencies
PartNumber --> Supplier --> SupplierPhone
This is a violation of 5NF because it is a violation of 3NF. Under
the open world interpretation, what is in the database is what is
known to be true, but that doesn't change the fact that if you delete
the row with the only part known to be supplied by a particular
supplier, you also erase the knowledge that that particular supplier
has a particular phone number.
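The deletion anomaly described above can be reproduced in a few lines. This is a minimal Python sketch in which a set of tuples stands in for a relation; the part numbers, supplier names and phone numbers are invented, and the helper names are hypothetical.

```python
# Hypothetical rows of {PartNumber, Supplier, SupplierPhone}, which satisfies
# PartNumber --> Supplier --> SupplierPhone and is therefore not in 3NF.
supplier_parts = {
    ("P1", "S1", "555-0100"),
    ("P2", "S1", "555-0100"),
    ("P3", "S2", "555-0199"),  # the only part known to be supplied by S2
}


def phone_of(rows, supplier):
    """Phone numbers recorded for SUPPLIER (at most one while the FD holds)."""
    return {phone for (_part, s, phone) in rows if s == supplier}


def delete_part(rows, part):
    """Remove the row for PART, e.g. because S2 no longer supplies it."""
    return {row for row in rows if row[0] != part}


after = delete_part(supplier_parts, "P3")
print(phone_of(supplier_parts, "S2"))  # {'555-0199'}
print(phone_of(after, "S2"))           # set(): S2's phone number is gone too
```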
I stupidly assumed you'd consider the comparison with a table that
*is* in 4NF.
--
Roy
I mean, normalization often calls for encoding the persons and their
ownership of cars in two separate relations. The first of which
seemingly cannot have a natural key. As such, a surrogate key would be
required, and no real alternative would be possible.
Personally I hate surrogates, though I use them for far less pressing
purposes (i.e. performance). Still I think that's mostly because we
don't really have the kind of true surrogates Codd suggested, where
the actual, underlying value is completely hidden. Nor do we have a
proper semantics -- from Codd or anyone else -- for what a database with
surrogates is supposed to mean, or how to make it safe from update
anomalies and the like.
The RDF data model is one rare example where we do have anonymous
surrogates, in the form of "blank nodes". And we have a formal
semantics as well. But otherwise, sadly, we're then once again into
the quagmire that is EAV; or as I'd like to call it, gratuitous
reification.
>An assertion of fact IS a fact. Appealing again to Webster, a fact is
>'a piece of information presented as having objective reality.' Isn't
>assertion a kind of presentation?
Webster, and you, and millions of other people, are wrong.
Gee, you might be a computer scientist.
--
Reinier
Regarding the actual quote, I've long taken it to mean also that if a
relvar's complement were recorded and an otherwise valid tuple did not
appear in the complement, then it must appear in the body of the relvar.
Am I right?
Also, do views/derived relvars, e.g., joins and unions, have complements
that could theoretically be recorded?
Since all I usually care about is theory, it's fine by me. But I might
feel differently if my name were John Smith and would probably switch to
a different repair shop.
Well, at least I have a lot of company.
Believe it or not, I once ran into a building wide phone directory where
looking up the name "John Smith" gave me the extension of the wrong person
named "John Smith". In order to find the one I was looking for, I had to
look up under "Jack Smith". The designers had used the name (actually the
last name comma first name) of the person as the primary key of an indexed
file. And this was at a computer company!
Normalization has very little to do with the semantics of the data. If
you'll look carefully at all the examples of converting from one normal form
to the next higher form, you'll see that the schema before conversion and
the one after conversion are equivalent in their capacity to express facts.
That's important.
Contrary to popular belief, adding a surrogate key to each table does not
change the level of normalization of a schema. Candidate keys remain
candidate keys after a surrogate has been added and declared the primary
key. I'm not sure, but I think you're asserting the opposite above.
How do you know that there are two sets of cars owned, with each set owned
by a different person, instead of one set consisting of two cars and with
one owner? Is that fact expressed in the database? How?
"When I use a word, it means precisely what I want it to mean. Nothing
more, nothing less."
--Humpty Dumpty--
>Still, to return to my original point, what do y'all think about the
>encoding of facts such as "yes, there are two separate persons called
>John Smith, and no, we don't have any more information about them as
>persons, yet given their sets of cars owned, the one with a Ferrari
>and the one with a Lamborghini are two different persons"?
If you admit that you can have incomplete data, you would have to
admit that the set of cars owned might be incomplete. John Smith 1
and John Smith 2 might be the same person. One of the sets of cars owned
might be all of the cars that John Smith owned before some date (when
his data was collected for entry into the system), and the other set
might be his current car. These sets could be disjoint and subsets of
all of the cars that John Smith has ever owned.
[snip]
Sincerely,
Gene Wirchenko
And it's wrong! Normalization has everything to do with the semantics
of the data. If you look carefully at all of the examples of
converting from one normal form to the next higher form, you'll see
that the schema before conversion and the one after conversion are NOT
equivalent in their capacity to express facts. Instead, the schema
after conversion has at least the same capacity to express facts, but
not exactly the same capacity. For example, a 2NF schema that is not
in 3NF,
SupplierParts {Part Number, Supplier, Supplier Phone Number}
KEY(Part Number)
that satisfies the transitive functional dependency,
Part Number --> Supplier --> Supplier Phone Number
does not have the same capacity to express facts as the third normal
form database scheme
Suppliers {Supplier, Supplier Phone Number}
KEY(Supplier),
Parts {Part Number, Supplier}
KEY(Part Number),
Parts[Supplier] IN Suppliers[Supplier]
In the 3NF scheme, it is possible for the fact that a supplier has a
particular phone number to be expressed even if the supplier doesn't
at present supply any parts. Every fact that can be expressed in the
2NF schema can also be expressed in the 3NF scheme, but not every fact
in the 3NF scheme can be expressed in the 2NF schema.
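The difference in capacity can be checked with a small Python sketch, again using sets of tuples as relations with invented sample data and hypothetical function names. The 3NF database records S2's phone number although S2 supplies no parts; joining the two relations yields rows of the 2NF schema, and S2 has no row there.

```python
# 3NF database scheme with made-up contents.
suppliers = {("S1", "555-0100"), ("S2", "555-0199")}  # {Supplier, Supplier Phone Number}
parts = {("P1", "S1"), ("P2", "S1")}                   # {Part Number, Supplier}


def inclusion_holds(parts, suppliers):
    """Check the inclusion dependency Parts[Supplier] IN Suppliers[Supplier]."""
    return {s for (_p, s) in parts} <= {s for (s, _phone) in suppliers}


def join(parts, suppliers):
    """Natural join on Supplier, producing rows of the 2NF SupplierParts schema."""
    return {(p, s, phone) for (p, s) in parts for (s2, phone) in suppliers if s == s2}


supplier_parts = join(parts, suppliers)
suppliers_in_2nf = {s for (_p, s, _phone) in supplier_parts}

print(inclusion_holds(parts, suppliers))  # True
print("S2" in suppliers_in_2nf)           # False: no 2NF row carries S2's phone
```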
>
> Contrary to popular belief, adding a surrogate key to each table does not
> change the level of normalization of a schema. Candidate keys remain
> candidate keys after a surrogate has been added and declared the primary
> key. I'm not sure, but I think you're asserting the opposite above.
I don't think this rings true either. While it is true that candidate
keys remain candidate keys after a surrogate has been added, I think
the level of normalization is affected. There may not be a name for a
normal form in which no relation that has more than one key has any
dependent attributes, but it would certainly be beneficial for a
number of reasons. First of all, there could be no nontrivial
transitive dependencies, where a trivial transitive dependency is one
in which the determinant is transitively dependent on itself. Also,
there can only ever be at most one irreducible determinant for each
non-prime attribute. Moreover, by isolating the interrelationships
between keys to their own relations, it makes it possible for just one
of those keys to be used for referential integrity, simplifying the
graph of inclusion dependencies.
>
> How do you know that there are two sets of cars owned, with each set owned
> by a different person, instead of one set consisting of two cars and with
> one owner? Is that fact expressed in the database? How?
If you add a surrogate, then the fact that the surrogates are
different is sufficient to determine that there is more than one
person.
That's the basic idea.
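The surrogate approach can be sketched in Python as below; the counter-based `add_person` and the sample rows are hypothetical. Note what the sketch does and does not establish: distinct surrogates record that the database treats the two John Smiths as two persons, which is exactly the supposition Gene questions above. The surrogates cannot verify that this is true of the world.

```python
from itertools import count

_next_id = count(1)
persons = {}   # surrogate -> name
owns = set()   # (surrogate, car)


def add_person(name):
    """Register a person under a fresh surrogate and return it."""
    pid = next(_next_id)
    persons[pid] = name
    return pid


def surrogates_named(name):
    """Surrogates of all persons recorded under NAME."""
    return {pid for pid, n in persons.items() if n == name}


a = add_person("John Smith")
b = add_person("John Smith")
owns.update({(a, "Ferrari"), (b, "Lamborghini")})

print(len(surrogates_named("John Smith")))  # 2
```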
>
> Also, do views/derived relvars, e.g., joins and unions, have complements
> that could theoretically be recorded?
The complement of a union is exactly those tuples that can
theoretically appear in either operand but don't; the complement of a
join is exactly those tuples that can theoretically appear in the
result but don't.
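For finite domains the complement can be computed directly. This Python sketch (hypothetical names and data) takes the complement with respect to the Cartesian product of the attribute domains. For two relations with the same heading, the complement of their union is then the intersection of their complements; for a join, the complement ranges over the combined heading.

```python
from itertools import product


def complement(body, *domains):
    """All tuples conforming to the heading (one value per domain) not in BODY."""
    return set(product(*domains)) - set(body)


def join(r1, r2):
    """Natural join of R1 {A, B} and R2 {B, C} on B."""
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


A, B, C = {1, 2}, {"x", "y"}, {True, False}

# Union of two relations with the same heading {A, B}.
r1 = {(1, "x")}
r2 = {(2, "y")}
union_complement = complement(r1 | r2, A, B)
assert union_complement == complement(r1, A, B) & complement(r2, A, B)

# Join of r1 {A, B} with s {B, C}: complement over the heading {A, B, C}.
s = {("x", True)}
join_complement = complement(join(r1, s), A, B, C)
print(sorted(union_complement))  # [(1, 'y'), (2, 'x')]
print(len(join_complement))      # 7, i.e. 2 * 2 * 2 possible tuples minus 1
```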
This could be extremely misleading to the casual reader. In one breath,
it suggests a set-piece that recasts relations under the guise of
normalization but ignoring constraints at the same time. In fact, when
normalizing, one must nearly always introduce constraints in order to
preserve "semantics". Most trade writers make the same mistake which
may be why so many people think re-design involves only normalization.
Why put it differently for union as compared to join? Surely the
complement of a union is exactly those tuples that can satisfy the
result but don't. I would say that way of putting it makes it more
obvious which predicates can be satisfied by various tuples and which
predicates can't, likewise which propositions are possible and not possible.
I don't think it is misleading. It does not suggest anything: it
states fact. The example supports that fact. In fact, the 3NF scheme
in the example includes an inclusion dependency, a necessary
consequence of the transitive dependency in the 2NF schema.
>> Normalization has very little to do with the semantics of the data. If
>> you'll look carefully at all the examples of converting from one normal form
>> to the next higher form, you'll see that the schema before conversion and
>> the one after conversion are equivalent in their capacity to express facts.
>> That's important.
>
>And it's wrong! Normalization has everything to do with the semantics
>of the data. [...]
[...]
>In the 3NF scheme, it is possible for the fact that a supplier has a
>particular phone number to be expressed even if the supplier doesn't
>at present supply any parts. Every fact that can be expressed in the
>2NF schema can also be expressed in the 3NF scheme, but not every fact
>in the 3NF scheme can be expressed in the 2NF schema.
This is because you're dropping an inclusion dependency
when going to the 3NF version.
More generally, the reason you're disagreeing is that people seem to
disagree what exactly it means to normalize or denormalize: they agree
on what happens to the table structure, but not on what happens to
constraints. E.g. the Wikipedia articles on this subject are
sorely lacking on this matter.
>> Contrary to popular belief, adding a surrogate key to each table does not
>> change the level of normalization of a schema. Candidate keys remain
>> candidate keys after a surrogate has been added and declared the primary
>> key. I'm not sure, but I think you're asserting the opposite above.
[...]
>> How do you know that there are two sets of cars owned, with each set owned
>> by a different person, instead of one set consisting of two cars and with
>> one owner? Is that fact expressed in the database? How?
>
>If you add a surrogate, then the fact that the surrogates are
>different is sufficient to determine that there is more than one
>person.
Tuples don't represent objects, but statements of fact.
Different statements about persons may need to be recorded in situations
where it is not known whether they pertain to the same person.
E.g. in some kind of observation database (say, incident reporting
on traffic) this may be necessary. Doing so doesn't violate
the closed world assumption.
--
Reinier
The 3NF scheme has an inclusion dependency. It is the only one that
is a logical consequence of the transitive functional dependency. The
inclusion dependency ensures that the transitive functional dependency
still holds in the join of the 3NF relations.
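A small sketch can make the role of each constraint visible. In this Python illustration (hypothetical helper names, invented data), the FDs hold in the join because Supplier is the key of Suppliers, and violating the inclusion dependency makes part P3 silently drop out of the join, so the join no longer represents every row of Parts.

```python
def fd_holds(rows, lhs, rhs):
    """True if the attributes at positions LHS functionally determine those at RHS."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def join(parts, suppliers):
    """Natural join of Parts {Part, Supplier} and Suppliers {Supplier, Phone}."""
    return {(p, s, ph) for (p, s) in parts for (s2, ph) in suppliers if s == s2}


suppliers = {("S1", "555-0100"), ("S2", "555-0199")}
parts_ok = {("P1", "S1"), ("P2", "S2")}
parts_bad = parts_ok | {("P3", "S9")}  # S9 is not in suppliers: IND violated

joined = join(parts_ok, suppliers)
# Positions in joined rows: 0 = Part, 1 = Supplier, 2 = Phone.
print(fd_holds(joined, [0], [1]), fd_holds(joined, [1], [2]))   # True True
print(sorted(p for (p, _s, _ph) in join(parts_bad, suppliers)))  # ['P1', 'P2']
```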
> More generally, the reason you're disagreeing is that people seem to
> disagree what exactly it means to normalize or denormalize: they agree
> on what happens to the table structure, but not on what happens to
> constraints. E.g. the Wikipedia articles on this subject are
> sorely lacking on this matter.
>
I can definitely agree with that.
> >> Contrary to popular belief, adding a surrogate key to each table does not
> >> change the level of normalization of a schema. Candidate keys remain
> >> candidate keys after a surrogate has been added and declared the primary
> >> key. I'm not sure, but I think you're asserting the opposite above.
>
> [...]
>
> >> How do you know that there are two sets of cars owned, with each set owned
> >> by a different person, instead of one set consisting of two cars and with
> >> one owner? Is that fact expressed in the database? How?
>
> >If you add a surrogate, then the fact that the surrogates are
> >different is sufficient to determine that there is more than one
> >person.
>
> Tuples don't represent objects, but statements of fact.
Tuples don't /directly/ represent objects, but given a ground atom,
Pabc, what are the symbols a, b and c? They're individual constants,
and under an interpretation they represent objects. Moreover, since
every relation has a key, the ground atoms represented really must be
of a form similar to, Q(f(a,b),c), where f is a function that maps
combinations of objects into the universe of discourse. Under an
interpretation, the function application, f(a,b) becomes the object
mapped so that the formula Q(f(a,b),c) states information about that
particular object. Keys, therefore, render your assertion
oversimplified, since every instance of a key maps to an object in the
universe of discourse. Even in a relation that has more than one key,
there is always a superkey that includes all prime attributes and
represents the aggregate of the objects represented by the key
instances, which is also an object.
> Different statements about persons may need to be recorded in situations
> where it is not known whether they pertain to the same person.
> E.g. in some kind of observation database (say, incident reporting
> on traffic) this may be necessary. Doing so doesn't violate
> the closed world assumption.
>
I don't know what you're getting at here.
> --
> Reinier
>On Sep 22, 5:27 pm, r...@raampje.lan (Reinier Post) wrote:
>> Brian wrote:
>> >You're wrong, of course, but don't take my word for it. According to
>> >Date in /An Introduction to Database Systems, Eighth Edition/, page
>> >161: 'the Closed World Assumption (also known as the Closed World
>> >Interpretation) says that if an otherwise valid tuple--that is, one
>> >that conforms to the relvar heading--does /not/ appear in the body of
>> >the relvar, then we can assume the corresponding proposition is
>> >false.
>>
>> But what *is* that proposition? It might be
>>
>>   FIRSTNAME LASTNAME is an employee at CORPORATION
>>
>> but it might just as well be
>>
>>   at some time in the past, it has been asserted that
>>   FIRSTNAME LASTNAME is an employee at CORPORATION
>>
>> which is a closed world formulation of what is approximately
>> the open world counterpart of the first.
>
>I don't think it is. Assuming that the assertion was true at that
>time in the past, the proposition is temporally qualified, whereas the
>first isn't.
Under the first interpretation, the closed world assumption allows us to
deduce that, if (Sally, Smith, Acme) is not in the table, Sally Smith
is no Acme employee, while under the second interpretation, we don't
know whether she is. So this 'closed-world assumption' doesn't actually
limit our ability to leave the truth values of propositions undecided
by what is in the database. We just have to slightly modify which
proposition is being expressed.
So the closed world assumption doesn't limit what can be stated
about the world with base relations; it is only important when looking
at logical inference, e.g. determining the interpretation of complex
constraints.
--
Reinier
Under the closed world interpretation, every formula that can be
represented in a table is assigned a truth value--positive for those
that are actually represented in the table and negative for those that
aren't, but under the open world interpretation, only those that are
actually represented are assigned truth values. Let's put it another
way: either it is supposed to be true or it is known to be true.
Under the closed world interpretation, what is represented is supposed
to be true, but under the open world interpretation, what is
represented is known to be true. Bottom line: it would be pointless
to suppose that what is represented is known to be true.
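The difference in truth-value assignment can be illustrated with a Python sketch over finite, made-up domains. Representing "unknown" as `None` is a choice of this sketch, not something the thread prescribes; the relation and function names are hypothetical.

```python
from itertools import product

people = {"Sally Smith", "Bob Jones"}
employers = {"Acme", "Initech"}
works_at = {("Sally Smith", "Acme")}  # tuples that are represented


def truth_value(t, relation, closed_world):
    """True if T is represented; otherwise False (CWA) or None, i.e. unknown (OWA)."""
    if t in relation:
        return True
    return False if closed_world else None


def assignment(relation, closed_world):
    """Truth value of every proposition the heading and domains allow."""
    return {t: truth_value(t, relation, closed_world)
            for t in product(people, employers)}


cwa = assignment(works_at, closed_world=True)
owa = assignment(works_at, closed_world=False)
print(cwa[("Bob Jones", "Acme")], owa[("Bob Jones", "Acme")])  # False None
```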
>
> So the closed world assumption doesn't limit what can be stated
> about the world with base relations; it is only important when looking
> at logical inference, e.g. determining the interpretation of complex
> constraints.
>
> --
> Reinier
Consider two external predicates p1, p2 (where "external" means they
are informally described in natural language) satisfying
for all X, p1(X) --> p2(X)
E.g.
p1(X) :- X is a frog currently on display in
the San Diego zoo
p2(X) :- X is an amphibian currently on display
in the San Diego zoo
A relvar recording p1 under the CWA can also be regarded as recording
p2 under the OWA.
Putting it another way, it is often the case that by "narrowing" the
external predicate one can turn an OWA into a CWA.
E.g. there may be a relvar for which the following gives an OWA:
p2(X) :- X is currently an employee of Acme Co
whereas the following gives a CWA:
p1(X) :- It is known to the HR department that X is currently
an employee of Acme Co.
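The frog example can be sketched as one relvar read against two external predicates. In this Python illustration (hypothetical names and data), absence from the relvar means false for p1, which the relvar records under the CWA, but only unknown for p2, which it records under the OWA; `None` stands in for unknown.

```python
# One relvar: the frogs currently on display (made-up contents).
frogs_on_display = {"Kermit", "Freddo"}


def p1(x):
    """X is a frog currently on display. Closed: absence means False."""
    return x in frogs_on_display


def p2(x):
    """X is an amphibian currently on display. Open: absence means unknown (None)."""
    return True if x in frogs_on_display else None


print(p1("Axolotl"), p2("Axolotl"))  # False None
```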
Not a good idea. p1 implies p2 but p2 does not imply p1. Consider:
p3(X) :- X is a salamander currently on display
in the San Diego zoo
It would be best to represent p1, p2 and p3 as separate relation
schemata. The implicative relationships from p1 to p2 and from p3 to
p2 would then be best represented as inclusion dependencies.
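The alternative of three separate relations with two inclusion dependencies can be sketched as follows. The Python below uses hypothetical names and invented contents. The constraints still allow amphibians that are neither recorded frogs nor recorded salamanders, since p2 implies neither p1 nor p3.

```python
def inclusion_dependencies_hold(db):
    """Frogs[X] IN Amphibians[X] and Salamanders[X] IN Amphibians[X]."""
    return (db["frogs"] <= db["amphibians"]
            and db["salamanders"] <= db["amphibians"])


db = {
    "amphibians": {"Kermit", "Freddo", "Sally", "Newt"},  # p2
    "frogs": {"Kermit", "Freddo"},                        # p1
    "salamanders": {"Sally"},                             # p3
}
print(inclusion_dependencies_hold(db))  # True

db["frogs"].add("Hypno")  # a frog that is not recorded as an amphibian
print(inclusion_dependencies_hold(db))  # False
```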
> Putting it another way, it is often the case that by "narrowing" the
> external predicate one can turn an OWA into a CWA.
>
> E.g. there may be a relvar for which the following gives an OWA:
>
> p2(X) :- X is currently an employee of Acme Co
>
> whereas the following gives a CWA:
>
> p1(X) :- It is known to the HR department that X is currently
> an employee of Acme Co.
In this example, p1 is closed with respect to what is known to be
true, not what is supposed to be true. The open world interpretation
therefore still applies with respect to whether X is currently an
employee of Acme Co. It is possible for there to be an employee that
the HR department doesn't yet know about. (Catbert may be taking a
day off.) As a consequence, it would be incorrect to assume that the
only employees that there are are those that are known to the HR
department. The limitations associated with applying the open world
interpretation do not disappear when a relation is closed with respect
to what is known.
That's only because the OWA sucks. There is no information whatsoever
about why data is missing. In this case under OWA we aren't told we
have only recorded the amphibians that happen to be frogs.
> p1 implies p2 but p2 does not imply p1.
That's exactly what lies behind the distinction between OWA and CWA.
> Consider:
>
> p3(X) :- X is a salamander currently on display
> in the San Diego zoo
>
> It would be best to represent p1, p2 and p3 as separate relation
> schemata. The implicative relationships from p1 to p2 and from p3 to
> p2 would then be best represented as inclusion dependencies.
What's best depends on the requirements.
> > Putting it another way, it is often the case that by "narrowing" the
> > external predicate one can turn an OWA into a CWA.
>
> > E.g. there may be a relvar for which the following gives an OWA:
>
> > p2(X) :- X is currently an employee of Acme Co
>
> > whereas the following gives a CWA:
>
> > p1(X) :- It is known to the HR department that X is currently
> > an employee of Acme Co.
>
> In this example, p1 is closed with respect to what is known to be
> true, not what is supposed to be true.
Actually p1 designates an *external* predicate. It is meaningless to
ask whether p1 is open/closed. External predicates are defined
independently of databases.
It is a *relvar* within a database which is open/closed with respect
to some external predicate.
When you say "what is known to be true" you make it sound like that
has some absolute, universal significance. But consider:
p0(X) :- It is known by Fred that it is known to the
HR department that X is currently an employee
of Acme Co.
Fred could create a database with a relvar that is closed with respect
to p0 and open with respect to p1.
>Under the closed world interpretation, every formula that can be
>represented in a table is assigned a truth value--positive for those
>that are actually represented in the table and negative for those that
>aren't, but under the open world interpretation, only those that are
>actually represented are assigned truth values. Let's put it another
>way: either it is supposed to be true or it is known to be true.
>Under the closed world interpretation, what is represented is supposed
>to be true, but under the open world interpretation, what is
>represented is known to be true. Bottom line: it would be pointless
>to suppose that what is represented is known to be true.
I can't link this to the notion of closed world assumption
I'm familiar with. It doesn't make sense to me.
--
Reinier
Maybe you should revisit it, then. For a given predicate and a finite
domain there is a finite set of valid propositions, but not every
valid proposition is a true proposition. It is only under an
interpretation that those propositions are assigned a truth value.
Under the closed world interpretation, only and all true propositions
are represented as tuples in the relation; under the open world
interpretation, only but not necessarily all true propositions are
represented as tuples in the relation. In other words, under the
closed world interpretation, what is represented is supposed to be
true, but under the open world interpretation, what is represented is
only what is known to be true.
>
> --
> Reinier
No. It isn't. A closed-world database consisting of two relations
with predicates p1 and p2 and an inclusion dependency from the
relation with predicate p1 to the relation with predicate p2 satisfies
the constraint that p1 implies p2 but p2 does not imply p1. The
constraints defined on a database determine which tuples or
combinations of tuples can appear in the database at any given time--
independent of whether the CWA or the OWA is applied. The distinction
between the CWA and the OWA involves just whether or not the absence
of one or more of those tuples is significant.
>
> > Consider:
>
> > p3(X) :- X is a salamander currently on display
> > in the San Diego zoo
>
> > It would be best to represent p1, p2 and p3 as separate relation
> > schemata. The implicative relationships from p1 to p2 and from p3 to
> > p2 would then be best represented as inclusion dependencies.
>
> What's best depends on the requirements.
Maybe I should have used 'it would be better...' instead of 'it would
be best....'
>
> > > Putting it another way, it is often the case that by "narrowing" the
> > > external predicate one can turn an OWA into a CWA.
>
> > > E.g. there may be a relvar for which the following gives an OWA:
>
> > > p2(X) :- X is currently an employee of Acme Co
>
> > > whereas the following gives a CWA:
>
> > > p1(X) :- It is known to the HR department that X is currently
> > > an employee of Acme Co.
>
> > In this example, p1 is closed with respect to what is known to be
> > true, not what is supposed to be true.
>
> Actually p1 designates an *external* predicate. It is meaningless to
> ask whether p1 is open/closed. External predicates are defined
> independently of databases.
>
> It is a *relvar* within a database which is open/closed with respect
> to some external predicate.
>
Not sure what you mean here.
> When you say "what is known to be true" you make it sound like that
> has some absolute, universal significance. But consider:
>
> p0(X) :- It is known by Fred that it is known to the
> HR department that X is currently an employee
> of Acme Co.
>
> Fred could create a database with a relvar that is closed with respect
> to p0 and open with respect to p1.
I find it hard to believe that you can't recognize the difference
between what is supposed to be true and what is known to be true.
That's not the situation I described. I said there was a *single*
relvar closed with respect to p1 and open with respect to p2.
> > > Consider:
>
> > > p3(X) :- X is a salamander currently on display
> > > in the San Diego zoo
>
> > > It would be best to represent p1, p2 and p3 as separate relation
> > > schemata. The implicative relationships from p1 to p2 and from p3 to
> > > p2 would then be best represented as inclusion dependencies.
>
> > What's best depends on the requirements.
>
> Maybe I should have used 'it would be better...' instead of 'it would
> be best....'
>
> > > > Putting it another way, it is often the case that by "narrowing" the
> > > > external predicate one can turn an OWA into a CWA.
>
> > > > E.g. there may be a relvar for which the following gives an OWA:
>
> > > > p2(X) :- X is currently an employee of Acme Co
>
> > > > whereas the following gives a CWA:
>
> > > > p1(X) :- It is known to the HR department that X is currently
> > > > an employee of Acme Co.
>
> > > In this example, p1 is closed with respect to what is known to be
> > > true, not what is supposed to be true.
>
> > Actually p1 designates an *external* predicate. It is meaningless to
> > ask whether p1 is open/closed. External predicates are defined
> > independently of databases.
>
> > It is a *relvar* within a database which is open/closed with respect
> > to some external predicate.
>
> Not sure what you mean here.
External predicates are informal and relate to some perceived notion
of reality. They are defined without reference to any recorded
relation values in any database. Therefore it is nonsensical to say
whether external predicates are open or closed.
It is only meaningful to ask whether a recorded relation value in a
database is assumed to be open/closed with respect to some external
predicate.
> > When you say "what is known to be true" you make it sound like that
> > has some absolute, universal significance. But consider:
>
> > p0(X) :- It is known by Fred that it is known to the
> > HR department that X is currently an employee
> > of Acme Co.
>
> > Fred could create a database with a relvar that is closed with respect
> > to p0 and open with respect to p1
>
>On Oct 20, 3:18 pm, rp...@pcwin518.campus.tue.nl (rpost) wrote:
>> Brian wrote:
>> >Under the closed world interpretation, every formula that can be
>> >represented in a table is assigned a truth value--positive for those
>> >that are actually represented in the table and negative for those that
>> >aren't, but under the open world interpretation, only those that are
>> >actually represented are assigned truth values.
Yes. Thus far I agree.
>> Let's put it another
>> >way: either it is supposed to be true or it is known to be true.
??
>> >Under the closed world interpretation, what is represented is supposed
>> >to be true, but under the open world interpretation, what is
>> >represented is known to be true.
??? No. Under both representations, the tuples in a relation represent
statements supposed to be true. The difference regards only the tuples
*not* in the relation: under the CWA, these correspond to statements
supposed to be false, while under the OWA they may just as well be false.
>> >Bottom line: it would be pointless
>> >to suppose that what is represented is known to be true.
I have no idea what you mean to say here.
>Under the closed world interpretation, only and all true propositions
>are represented as tuples in the relation; under the open world
>interpretation, only but not necessarily all true propositions are
>represented as tuples in the relation.
Exactly.
>In other words, under the
>closed world interpretation, what is represented is supposed to be
>true, but under the open world interpretation, what is represented is
>only what is known to be true.
There is a deep misunderstanding here. I can't figure out what it is.
--
Reinier
>??? No. Under both representations, the tuples in a relation represent
>statements supposed to be true. The difference regards only the tuples
>*not* in the relation: under the CWA, these correspond to statements
>supposed to be false, while under the OWA they may just as well be false.
true
Darn. Double negation :(
--
Reinier
Under the open world interpretation, tuples that can be in a relation
but aren't represent propositions that may or may not be true. In
other words, it is unknown whether those propositions are true or
false, but for the tuples that are in a relation, it is not unknown
whether the propositions represented are true because they are in fact
supposed to be true. It follows, therefore, since it is not unknown
whether the propositions represented are true, that they are known to
be true.
I think the surrogate key has no significance for semantics. If
surrogate values are never displayed to database users, then even upon
careful observation of the surrogates, I cannot notice an atom of
semantics there.
Regarding “blank nodes” in the RDF data model, they merely indicate
the existence of the thing, but not its name. But a name is really
very important in semantics. A name is the glue that binds a thing in
our mind to the corresponding thing in the real world.
In the further development of computers, probably in a future semantic
machine, solutions like surrogate keys will have no relevance.
Vladimir Odrljin
>Under the open world interpretation, tuples that can be in a relation
>but aren't represent propositions that may or may not be true. In
>other words, it is unknown whether those propositions are true or
>false, but for the tuples that are in a relation, it is not unknown
>whether the propositions represented are true because they are in fact
>supposed to be true.
They are supposed to be true, not known to be true.
A database relation cannot be guaranteed to express facts;
it expresses statements of fact, given an interpretation as a predicate.
Whether this interpretation follows CWA or not doesn't make a difference.
>It follows, therefore, since it is not unknown
>whether the propositions represented are true, that they are known to
>be true.
But this is no different from the tuples in a relation interpreted
under the closed world assumption.
>> >In other words, under the
>> >closed world interpretation, what is represented is supposed to be
>> >true, but under the open world interpretation, what is represented is
>> >only what is known to be true.
I still don't understand what difference between 'supposed' and 'known'
you have in mind here.
--
Reinier
Under the closed world interpretation, there are no unknown truth
values, but under the open world interpretation, only what has been
explicitly asserted is known to be true, and it is known to be true
even if the user that made the assertion is mistaken. One has to
assume--especially if the identity of the user is not also being
recorded--that what the users assert is true to the best of their
knowledge. (If the identity of the user /is/ being recorded, then
what is being recorded is not just known or supposed to be true: it
actually is true--even if the user is lying.) Just because something
is known to be true doesn't mean that it actually is true. Even under
the closed world interpretation, what is supposed to be true may not
actually be true.
>Under the closed world interpretation, there are no unknown truth
>values, but under the open world interpretation, only what has been
>explicitly asserted is known to be true, and it is known to be true
Yes ...
>even if the user that made the assertion is mistaken.
Huh?! So if I have a database relation 'X works at Y', with the open
world assumption, if someone updates the relation to say that Jane Doe
works at Acme Corp., then I must assume this to be true even if that
person is mistaken? How does that make any sense?
>[...] Just because something
>is known to be true doesn't mean that it actually is true. Even under
>the closed world interpretation, what is supposed to be true may not
>actually be true.
Exactly. There is no difference between OWA and CWA in this respect,
regarding the assertions that correspond to tuples in the relation.
The difference is in what they imply for assertions that correspond
to tuples not in the relation.
--
Reinier
Yes. If you don't also record who uttered the assertion, then the
supposition implicit in its representation in the database under the
closed world interpretation is that the assertion is true or under the
open world interpretation is that whoever uttered the assertion knows
that it is true. This makes sense because it is not known that the
person is mistaken. If a bank teller counts his drawer at the end of
the day and is neither over nor short, then the assumption must be
that every customer received correct change, but it could be that one
customer received a dollar more than he should have and another
received a dollar less.
>On Nov 8, 6:37 pm, r...@raampje.lan (Reinier Post) wrote:
>> Brian wrote:
>> >Under the closed world interpretation, there are no unknown truth
>> >values, but under the open world interpretation, only what has been
>> >explicitly asserted is known to be true, and it is known to be true
>>
>> Yes ...
>>
>> >even if the user that made the assertion is mistaken.
>>
>> Huh?! So if I have a database relation 'X works at Y', with the open
>> world assumption, if someone updates the relation to say that Jane Doe
>> works at Acme Corp., then I must assume this to be true even if that
>> person is mistaken? How does that make any sense?
>
>Yes. If you don't also record who uttered the assertion, then the
>supposition implicit in its representation in the database under the
>closed world interpretation is that the assertion is true or under the
>open world interpretation is that whoever uttered the assertion knows
>that it is true. This makes sense because it is not known that the
>person is mistaken. If a bank teller counts his drawer at the end of
>the day and is neither over nor short, then the assumption must be
>that every customer received correct change, but it could be that one
>customer received a dollar more than he should have and another
>received a dollar less.
This is all very clear. What puzzles me is your use of the word 'known'.
But I don't seem to make any progress trying to understand it,
so let's drop it.
--
Reinier