
Re: some information about anchor modeling

vldm10 Mar 14, 2013 6:21 AM
Posted in group: comp.databases.theory
On Monday, 11 February 2013 08:41:14 UTC+1, Derek Asirvadem wrote:

> (Again, the question begs: what is the exact difference between your "identifiers" and surrogates?)

1.  I would like to note that the part about the surrogate key is not the important one; it is perhaps 5% of the problem. What matters here is the decomposition of a structure into binary structures. Atomic structures have many important consequences, among them atomic semantics and atomic sentences.
E. F. Codd, and Date & Darwen, worked extensively, and unsuccessfully, on the problem of decomposing structures into binary structures.
The decomposition into binary structures shown in the RM/T paper is not correct, because it is valid only for Simple databases, i.e. databases that maintain only the current state. RM/T is not correct for databases that maintain a history of events, and it does not hold for General database theory either. (I outlined the basic ideas of General database theory in my post to this thread of 13 February 2013.)
My papers show how to construct this decomposition for the following kinds of databases: Simple databases, General databases, databases that implement the RM, databases that implement the ER model, and databases that implement file systems. They also show how to map between these data models.

Codd introduced surrogates for only one reason: by using surrogates he tried to obtain binary (atomic) structures.
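The kind of surrogate-based binary decomposition discussed in point 1 can be sketched roughly as follows, in the spirit of RM/T's E-relations (a set of surrogates) and binary P-relations (surrogate, property). This is a minimal illustration under stated assumptions: the function names, the integer surrogates drawn from a counter, and the supplier/part sample data are all hypothetical, and the sketch covers only the current-state (Simple database) case.

```python
from itertools import count

def decompose_binary(rows, attributes):
    """Split n-ary rows into a unary E-relation of surrogates plus one
    binary relation (surrogate, value) per attribute. Hypothetical helper."""
    next_surrogate = count(1)            # system-assigned, never user-controlled
    e_relation = set()
    p_relations = {attr: set() for attr in attributes}
    for row in rows:
        s = next(next_surrogate)
        e_relation.add(s)
        for attr in attributes:
            p_relations[attr].add((s, row[attr]))
    return e_relation, p_relations

def recompose(e_relation, p_relations):
    """Rebuild the rows by joining every binary relation on the surrogate."""
    lookups = {attr: dict(pairs) for attr, pairs in p_relations.items()}
    return [{attr: lookups[attr][s] for attr in lookups} for s in sorted(e_relation)]

shipments = [
    {"supplier": "S1", "part": "P1", "qty": 300},
    {"supplier": "S1", "part": "P2", "qty": 200},
]
e, p = decompose_binary(shipments, ["supplier", "part", "qty"])
print(sorted(p["qty"]))               # [(1, 300), (2, 200)]
print(recompose(e, p) == shipments)   # True
```

Every binary relation here has the simple key that point 5 refers to (the surrogate), which is exactly what makes the reconstruction by join possible.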

2.  The RM/T "solution" is not supported by theory. For example, the RM/T paper does not prove that the binary decomposition is valid at the level of the ER model. Codd introduced the E and P atomic structures directly into the ERM without any justification, because he needed binary structures. I would say he badly needed the binary decomposition, because this decomposition is the "A to Z" of RM/T. Codd then immediately declared the binary structures of the ERM to be binary relations; that is how he introduced the E- and P-relations. This is not science; moreover, in my opinion, it is not a "discipline", as C. Date presents RM/T.
If one wants to transfer a binary structure from the ERM to the RM (and also do the inverse mapping), then one needs a theory of the mapping between the two data models. As I have already written, Codd did not notice this at all.

3.  In RM/T, section 4, Codd wrote: "There are three difficulties in employing user-controlled keys as permanent surrogates for entities."
My comment: surrogates are, by definition, not user-controlled.

Also in RM/T, section 4, Codd wrote: "Introduction of the E-domain, E-attributes and surrogates does not make user-controlled keys obsolete. Users will often need entity identifiers (such as part serial numbers) that are totally under their control, although they are no longer compelled to invent a user-controlled key if they do not wish to."
My comment: surrogates are, by definition, not user-controlled.

4.  It is not clear what the surrogate key is. Is the surrogate key part of a theory, or is it a technical solution? Is the surrogate key part of an entity? Codd also needs to explain why he abandons the relational key.
It is not clear what Codd modeled by using a surrogate key. Note that RM/T is a data model. What does RM/T describe and explain? Does RM/T model real things?

In my solution, for example, the states of entities are modeled. The entities and relationships come from the real world.
I also introduced abstract objects, which I defined precisely. In working with abstract objects, the main tool is identification. An important part of my paper is how the database's memory operates with abstract objects and how it identifies them. So, in my theory of identification, identifiers (labels, tokens) play important roles, especially in memory.

5.  What did Codd do in his RM/T paper? He was aware that a binary relation must have a simple key: if the key is composite, there is no point in creating a binary relation whose key has many attributes. So Codd introduced a simple key. He understood that surrogate keys can cause problems, so he added something further: he proposed that the surrogate key should be invisible to the user.
He was also aware that if someone solves this decomposition into binary structures, those binary structures must have a simple key and one attribute. So any new solution would have to count as plagiarism. Obviously, this approach is very useful and cheap.

There is another approach to this problem: 6NF. However, 6NF has one big drawback: its authors do not give a procedure that puts a relvar into 6NF. Note that the authors of 6NF are in fact trying to solve the aforementioned binary (atomic) decomposition.

In this newsgroup I gave an example of a relation whose attributes are mutually independent. This example shows that 6NF is absurd, because there are relations that are in 6NF but have keys composed of a large number of attributes. Theoretically, the number of these attributes can be any finite number.
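A small all-key relation illustrates how a relation can resist binary decomposition while its key is the whole heading. The sketch below uses hypothetical data and helper names: it joins the three binary projections of a ternary relation R(A, B, C), in which no attribute is determined by the others, and gets a spurious tuple back. A single value only illustrates the point; whether a join dependency holds is a property of the relvar, not of one instance.

```python
def project(relation, cols):
    """Projection onto the given column positions (a set drops duplicates)."""
    return {tuple(t[i] for i in cols) for t in relation}

def join_binary_projections(ab, bc, ac):
    """Natural join of R[A,B], R[B,C] and R[A,C] back into (A, B, C) tuples."""
    return {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2 and (a, c) in ac}

# Hypothetical all-key relation R(A, B, C): the only key is {A, B, C}.
R = {("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")}

rejoined = join_binary_projections(project(R, (0, 1)), project(R, (1, 2)), project(R, (0, 2)))
print(sorted(rejoined - R))   # [('a1', 'b1', 'c1')]
```

Because the join of the binary projections yields a tuple that is not in R, these projections cannot replace R here, and with more such attributes the key grows with them.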

6.  Construction of the surrogate key.

An RM/T relation has a primary key, which is usually composite. This primary key is in the database and has corresponding attributes in the real world.
In addition to this primary key, Codd added another primary key, which exists only in the database: the surrogate key. In this way there are two primary keys at the database level, and the corresponding attributes are at a third level, that of the corresponding real entity.
As I have already written, Codd introduced surrogates for only one reason: by using surrogates he tried to obtain binary (atomic) structures.

Can you imagine how easily an experienced programmer could create chaos by intentionally changing the value of a key at one of these three levels?
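The two database-level keys can be made concrete with a hypothetical check: if a relation carries both a composite natural key and a surrogate, the two must stay in one-to-one correspondence, and a change made at only one level breaks that. The column names and the checking function below are illustrative assumptions, not part of RM/T.

```python
from collections import defaultdict

def key_correspondence_violations(rows, natural_key, surrogate="surrogate"):
    """Report where the natural key and the surrogate stop being one-to-one."""
    nat_to_sur = defaultdict(set)
    sur_to_nat = defaultdict(set)
    for row in rows:
        nk = tuple(row[col] for col in natural_key)
        nat_to_sur[nk].add(row[surrogate])
        sur_to_nat[row[surrogate]].add(nk)
    shared_natural = {nk for nk, sids in nat_to_sur.items() if len(sids) > 1}
    shared_surrogate = {sid for sid, nks in sur_to_nat.items() if len(nks) > 1}
    return shared_natural, shared_surrogate

rows = [
    {"surrogate": 101, "dept": "D1", "emp_no": 7},
    {"surrogate": 102, "dept": "D1", "emp_no": 8},
    {"surrogate": 103, "dept": "D1", "emp_no": 7},  # same natural key, new surrogate
]
print(key_correspondence_violations(rows, ["dept", "emp_no"]))
# ({('D1', 7)}, set())
```

Nothing in the database itself prevents the third row; only an explicit check like this one notices that two surrogates now stand for one real entity.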

To remedy this, surrogates are used only as part of binary structures.
So, to avoid working with two parallel relational databases, Codd uses only binary relations. But now new problems arise, caused by the surrogate key. I will mention two of them:
(i)   How can an entity in the real world be identified using a surrogate key from the database? Note that the surrogate key does not exist in the real world. Note also that in RM/T we work only with binary relations.
(ii)  Two distinct entities can have the same state. This implies the following problem: how do we find the two corresponding real-world objects? Note that a key in the RM is at the level of a relation, while a key in RM/T is at the level of the database. The same problem exists in OOA; note that the surrogate key is similar to an OO identifier.
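Problems (i) and (ii) can be seen in a few lines. In this hypothetical sketch (the attribute names and surrogate values are assumptions), the database keeps two distinct surrogates for two objects in the same state, but nothing stored lets a user go from surrogate 1 to a particular physical object, because the surrogate is not on the object.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Observable attribute values of an entity (hypothetical attributes)."""
    colour: str
    weight_kg: int

# Database keyed by surrogate; two different physical objects share one state.
db = {1: State("red", 5), 2: State("red", 5)}

def surrogates_in_state(db, state):
    """All surrogates whose stored state equals the given state."""
    return sorted(s for s, st in db.items() if st == state)

print(surrogates_in_state(db, State("red", 5)))  # [1, 2]
```

Inspecting a real object yields only State("red", 5), which matches both surrogates equally well.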

7.  In my solution, the real entity has a (real) identifier. This identifier also exists in the corresponding database structure. So this identifier is not a surrogate key; moreover, it is not a key at all in my solution.
Note that my structure has the identifier of a state as its primary key. The identifier of a state is different from the identifier of an entity.
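The distinction between the identifier of a state (the primary key) and the identifier of an entity (present in the structure but not a key) can be sketched as follows. The class and field names, and the choice of one attribute per state record, are assumptions for illustration only, not the structure defined in the author's papers.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class StateRecord:
    state_id: int    # primary key: identifies one state of an entity
    entity_id: str   # identifier of the real entity; repeats, so not a key
    attribute: str
    value: str

class StateStore:
    def __init__(self):
        self._next_id = count(1)
        self._records = {}               # state_id -> StateRecord

    def record(self, entity_id, attribute, value):
        """Append a new state; earlier states are kept, never overwritten."""
        sid = next(self._next_id)
        self._records[sid] = StateRecord(sid, entity_id, attribute, value)
        return sid

    def history(self, entity_id):
        """All recorded states of one entity, in insertion order."""
        return [r for r in self._records.values() if r.entity_id == entity_id]

store = StateStore()
store.record("EMP-7", "title", "clerk")
store.record("EMP-7", "title", "manager")
print([(r.state_id, r.value) for r in store.history("EMP-7")])
# [(1, 'clerk'), (2, 'manager')]
```

Since entity_id repeats across states, only state_id can serve as the key, and the history of events is kept rather than overwritten.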

8.  Note that Codd does not understand the difference between a surrogate key and an externally verifiable key (see 3, above). There are two types of externally verifiable keys. The first is the global, industry-standard externally verifiable key. The second is not a global industry standard; rather, it is a local-standard externally verifiable key.
The local standard is very important because it offers a great opportunity for good design.
One example is the employee number at a company. A company can keep a list of employee numbers on paper in Human Resources, maintain it in documentation, put it on the web with limited access, and so on. In this way all departments can use these numbers. What matters is that a user can find and verify the identity in the real world; it is not necessary for the employee number to be part of the real employee. Of course, a company can issue badges to employees if it wants to.
Another example of a local standard is addresses. Different countries have different address systems, but an address can be verified externally.

It is very important to understand that each company may have its own system of identification and technology, which can be public or private and can be externally verifiable. The local standard is an example of good design. Of course, neither general industry-standard keys nor local-standard keys are surrogates. Note that there is a profound difference between externally verifiable keys and surrogate keys: surrogates are at the level of the link "db designer–database", while externally verifiable keys are at the level of the link "users–database".
Surrogates are related to memory manipulation, while externally verifiable keys are related to the transfer of semantic and logical content. It seems to me that Codd did not understand the nature of this problem.
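The two links can be contrasted in a hypothetical sketch: the employee number is accepted only if it can be checked against a register that users can consult (the "users–database" link), while the surrogate is generated internally and means nothing outside the database (the "db designer–database" link). The register contents, class, and field names are assumptions for illustration.

```python
from itertools import count

# Hypothetical local-standard register kept by Human Resources; users can check it.
HR_REGISTER = {"E-1001", "E-1002"}

class EmployeeTable:
    def __init__(self):
        self._surrogates = count(1)   # internal; meaningful only inside the database
        self.rows = []

    def add(self, employee_number, name):
        """Accept a row only if its externally verifiable key is in the register."""
        if employee_number not in HR_REGISTER:
            raise ValueError(f"{employee_number} is not in the HR register")
        row = {"surrogate": next(self._surrogates),
               "employee_number": employee_number, "name": name}
        self.rows.append(row)
        return row

table = EmployeeTable()
print(table.add("E-1001", "Ana"))
# {'surrogate': 1, 'employee_number': 'E-1001', 'name': 'Ana'}
```

A user can confirm "E-1001" by looking at the HR list; the value 1 in the surrogate column cannot be confirmed anywhere outside the database.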

9.   The following paper was selected as the best paper of ER'09:
Anchor Modeling - An Agile Modeling Technique Using the Sixth Normal Form for Structurally and Temporally Evolving Data.
This paper has, among others, three important parts: the surrogate key from RM/T, 6NF, and the main results of my paper published in 2005. In this post I have shown that RM/T is not correct, i.e. that its binary decomposition is not correct. RM/T did not resolve the problems in the domain of the general theory of databases; in contrast to the RM/T paper, these problems are solved in my papers.
I have also shown that 6NF has no scientific value. The paper "Anchor Modeling" promotes RM/T and 6NF together with the most important results in database theory, although these works are incorrect and inaccurate with respect to important parts of the general theory of databases. So, of these three important parts of "Anchor Modeling", the only accurate one is the one whose results I published four years before the "Anchor Modeling" paper.
One of the main purposes of normalization in the RM is to avoid redundancy. In contrast to the RM, my model keeps all redundancy. This shows that my model and the RM are completely different. There are other differences between my model and the RM, which show that these two models are opposites. Therefore, the aforementioned "bridging" between the RM and my model is not correct.

10. I wrote such a long post primarily to protect my work, but also because people have been debating this subject for years and, in my opinion, spend a lot of time on things that are not clearly presented.
Of course, everyone is entitled to their opinion.

Vladimir Odrljin