In this thread I will write about the major steps in the design of a database. As I
wrote, the authors of "Anchor modeling" claim: "In this paper we propose a
modeling technique for data warehousing, called anchor modeling, ..."
The main anchor modeling concept is introduced in the following two definitions (see
page 3):
Def 1 (Identities). Let ID be an infinite set of symbols, which are used as
identities.
Def 2 (Anchor). An anchor A (C) is a table with one column. The domain of C is ID. The
primary key for A is C.
--
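To make Def 2 concrete, here is a minimal sketch of an anchor as a one-column table whose only column is also its primary key. The table name person_anchor, the column name person_id and the identity values "id-1", "id-2", "id-3" are assumptions made for illustration; they do not come from the anchor modeling paper.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical anchor: exactly one column, and that column is the primary key.
conn.execute("CREATE TABLE person_anchor (person_id TEXT NOT NULL PRIMARY KEY)")
conn.executemany(
    "INSERT INTO person_anchor (person_id) VALUES (?)",
    [("id-1",), ("id-2",)],
)


def try_insert(identity):
    """Return True if the identity was added, False if it already exists."""
    try:
        conn.execute("INSERT INTO person_anchor (person_id) VALUES (?)", (identity,))
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects a second row with the same identity.
        return False


added = try_insert("id-3")      # a new identity
duplicate = try_insert("id-1")  # an identity that is already stored
row_count = conn.execute("SELECT COUNT(*) FROM person_anchor").fetchone()[0]
```

The table carries no information other than the identities themselves, which is the point discussed in the remarks that follow.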
Note that the term "Identities" is not defined. Note that this term is undefined in
philosophy as well. I want to emphasize that they start from an undefined term.
In this post I will try to present a real database design. Note that I am using the
following terms: "identification" and "identifier". In my opinion, these differ
greatly from "identities".
1. Step
I adopt the following view of the world from Gödel: „By the theory of simple types I
mean the doctrine which says that the objects of thought (or, in another
interpretation, the symbolic expressions) are divided into types, namely: individuals,
properties of individuals, relations between individuals, properties of such
relations, etc...“ (Kurt Gödel, 1944)
Regarding Gödel's view, I would like to say the following: I found this text of
Gödel's about 2-3 years ago. Until then, I used the Entity / Relationship Model, that
is, Peter Chen's data model. Obviously Gödel's definition has priority of ideas. I
also believe that many philosophers and mathematicians worked on such a view of the
world before Gödel, and that some of them made significant contributions to this
theory.
The term "object of thought" is important here, and it seems to me that in this short
text we can see Gödel's strong mindset. In my paper from 2008, I introduced the
concepts of m-attributes, m-entities, m-relationships and m-states. I used the prefix
"m" because these objects are in memory. These terms of mine are only in part "objects
of thought". I also do not believe in the term "type". In my paper „Semantic databases
and semantic machines“ (at http://www.dbdesign11.com), I defined abstract objects in
section 1 and facts in section 2. Facts represent elementary (or atomic) thoughts that
correspond to atomic data structures. In section 3, I introduced factual sentences.
Factual sentences express facts. Section 4 is about awareness.
I will not explain these ideas in detail now, because it is a very broad topic; I will
just mention some of my thoughts and solutions related to them.
I build facts from the atomic structures. Here you can immediately see a great
advantage of the atomic structures that are obtained by applying my theory of states.
If you try to use "normal forms" for an entity, then you have to go through all the
normal forms to reach 6NF, for which there is no procedure and which does not work in
a number of cases that I described in this thread. One more thing is important here.
If you want to apply "normal forms", then you must start with a wrong data structure,
because "normal forms" repair only those data structures that are wrong. In order to
formalize the work with atomic structures, we need a tool that directly constructs
these atomic structures.
So, the basic idea here, related to facts, is to try to formalize work with thoughts.
I think this is a fine place to quote G. Frege: „I am not here in the happy position
of a mineralogist who shows his audience a rock-crystal: I cannot put a thought in
the hands of my readers with the request that they should examine it from all sides.
Something in itself not perceptible by sense, the thought is presented to the reader
– and I must be content with that – wrapped up in a perceptible linguistic form. The
pictorial aspect of language presents difficulties. The sensible always breaks in and
makes expressions pictorial and so improper. So one fights against language, and I am
compelled to occupy myself with language although it is not my proper concern here. I
hope I have succeeded in making clear to my readers what I want to call 'thought'
...“
So, in fact, Frege studied thoughts. He found a tool with which one can work with
thoughts. This tool is language. Note that G. Frege discovered a major part of
propositional logic. He discovered predicate calculus and semantics in full, starting
from scratch, 120 years ago.
It seems to me that Frege's thinking here has a lot to do with Gödel's „objects of
thought“ and „the symbolic expressions“ in the above-mentioned sentence: „I mean the
doctrine which says that the objects of thought (or, in another interpretation, the
symbolic expressions).“ So, in fact, this is about entities (or individuals, in
Gödel's notation).
2. Step
We identify entities by using Leibniz's Law of the identity of indiscernibles (the
indiscernibility of identicals).
Another important thing here is the following: Leibniz's Law makes the identification
of an object a mathematical discipline. So Leibniz's Law is a mathematical tool.
Things become precise because we realize Leibniz's Law in the ERM. So we can only
identify entities and relationships. Many system analysts think that each entity is
determined by its attributes, i.e. that it is determined by its intrinsic attributes.
This is not true.
I have divided Leibniz's Law into two laws: Leibniz's Law, which uses intrinsic
attributes, and the General Law, which uses intrinsic + extrinsic attributes (see my
paper „Semantic databases and semantic machines“, sections 5.5 and 5.6). This division
is realized within the Entity / Relationship model of the world.
I will repeat the following example once again, because it is important in this part
of the theory. It argues against the claim that intrinsic attributes determine the
corresponding entity.
Example 1: A Honda dealer received 200 new Honda Civic cars, which all have the same
attributes. Imagine now that someone has wiped out the VINs from all these Honda
Civics. Then we get 200 cars whose attributes are all the same. If we apply surrogates
in this situation, we get a disaster. If we keep the industry-standard identifiers,
then we do not need surrogates. Note that this problem with surrogate keys exists for
all industrial products of this type.
So, we have to say on which basis we give the VIN to each of these cars. We affirm
the uniqueness of an entity by using the General Law. Note that when we apply the
General Law in this case, the newly introduced identifier of the entity becomes an
intrinsic attribute of the corresponding entity.
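The example can be sketched in a few lines. The sketch assumes a hypothetical extrinsic attribute (the position of the car on the production line) and placeholder identifiers of the form "VIN-0001", which are not real VINs; the concrete attribute values are also invented for illustration.

```python
# Hypothetical intrinsic attributes; all 200 cars share exactly the same values.
INTRINSIC = ("Honda", "Civic", 2015, "white")
cars = [INTRINSIC for _ in range(200)]

# Using intrinsic attributes only, the 200 cars cannot be told apart.
distinct_by_intrinsic = len(set(cars))

# Assumed extrinsic attribute: position on the production line (1..200).
cars_general = [INTRINSIC + (position,) for position in range(1, 201)]

# Intrinsic + extrinsic attributes together discern every car.
distinct_by_general = len(set(cars_general))

# An identifier assigned on this basis can then be kept as an attribute
# of the car itself (placeholder values, not real VINs).
vin_by_car = {car: f"VIN-{car[-1]:04d}" for car in cars_general}
```

Here distinct_by_intrinsic is 1 and distinct_by_general is 200, which mirrors the difference between the two laws described above.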
3. Step
This step is about identification.
Change of identity is allowed in some countries. Note that "anchor modeling" bans
changes in the identities of entities.
1.
In my model, identification is defined recursively. In my first paper from 2005, I am
able to write all the keys in the form of simple (not composite) identifiers. I have
identifiers of entities, relationships and states. See my website
http://www.dbdesign10.com , sections 1 and 2.
Section 4 introduces Simple Form. This form gives the conditions for the
decomposition of data structures into atomic structures, for databases that maintain
current states. In this case, a binary structure consists of the simple key and one
attribute. Simple Form fully describes the identifiers of entities, i.e. surrogate
keys, locally defined keys and internationally defined keys. To my knowledge, this is
the first work that fully describes the surrogate keys, locally defined keys and
internationally defined keys, and the conditions under which they may be designed as
simple keys.
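The shape of the result (binary structures made of the simple key and one attribute) can be illustrated with a short sketch. It does not implement the conditions of Simple Form itself, only the decomposition of one record once a simple key is known; the function name decompose and the sample record are assumptions made for illustration.

```python
def decompose(key_name, record):
    """Split one wide record into binary structures: (simple key, one attribute)."""
    key = record[key_name]
    return {
        attribute: (key, value)
        for attribute, value in record.items()
        if attribute != key_name
    }


# Hypothetical record of a database that maintains current states only.
car = {"vin": "VIN-0001", "model": "Civic", "color": "white"}

binary_structures = decompose("vin", car)
# One binary structure per attribute, each carrying the same simple key.
```

Each attribute ends up in its own two-column structure, so a change to one attribute touches only that structure.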
2.
In section 1 of my paper from 2005 there is a passage that I have mentioned several
times. This passage contains several things that are important for my data model.
Here is the text:
„We determine the Conceptual Model so that every entity and every relationship has
only one attribute, all of whose values are distinct. So this attribute doesn’t have
two of the same values. We will call this attribute the Identifier of the state of an
entity or relationship. We will denote this attribute by the symbol Ack. All other
attributes can have values which are the same for some different members of an entity
set or a relationship set. Besides Ack, every entity has an attribute which is the
Identifier of the entity or can provide identification of the entity. This identifier
has one value for all the states of one entity or relationship...“
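The two identifiers in the quoted text can be sketched as follows: every state gets its own distinct Ack value, while the identifier of the entity stays the same across all states of that entity. The class name State, the helper functions and the sample data are assumptions made for illustration, not the notation of the 2005 paper.

```python
from dataclasses import dataclass
from itertools import count

_next_ack = count(1)  # source of distinct values for the state identifier


@dataclass(frozen=True)
class State:
    ack: int         # identifier of the state: never repeated
    entity_id: str   # identifier of the entity: the same for all its states
    attribute: str
    value: str


history = []


def record_state(entity_id, attribute, value):
    """Store a new state; earlier states are kept, not overwritten."""
    state = State(next(_next_ack), entity_id, attribute, value)
    history.append(state)
    return state


def states_of(entity_id):
    """All stored states that belong to one entity."""
    return [s for s in history if s.entity_id == entity_id]


record_state("VIN-0001", "color", "white")
record_state("VIN-0001", "color", "red")    # same entity, new state
record_state("VIN-0002", "color", "white")  # attribute values may repeat
```

Only the ack column has all values distinct; the entity identifier and the attribute values repeat, as the quoted text describes.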
In this article I would like to analyze the following phrase: "or can provide
identification of the entity." Thus, anything that can provide identification of an
entity, "directly" or "indirectly", qualifies. Let's call this rule
"IdentificationOfEntity". To highlight the importance of this rule, I will now mention
three important examples from practice in which "IdentificationOfEntity" is realized
"indirectly".
(i) Surrogate key. Here, using the "surrogate key", we identify the corresponding
attributes that are in the database (i.e. in the memory). Based on these m-attributes,
we find the corresponding real world attributes on the corresponding real world
object.
(ii) Application of the General Leibniz's Law. Here I use the fact that "intrinsic +
extrinsic" properties "can provide identification of the entity".
(iii) I am most interested in this case. In my opinion, human memory does not operate
like a db memory, that is, it does not use keys. For example, someone can recall a
certain person based on one date: he notices a date and recalls that a certain person
died on that date. Thus, in this example, we identify an entity based on the date (but
not by using a key). In my post of 5 May, 2015, in this thread, I presented an example
where I showed how to find a person if we know his date of birth. This is realized by
using good organization of the data, i.e. by using the atomic data structures.
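Case (iii) can be sketched with atomic structures held as one small relation per attribute. This is not the example from the post of 5 May, 2015; the relation names, the internal identifiers and the sample people are assumptions made for illustration. The internal identifiers are used only to connect the structures; the search itself starts from the date.

```python
# Hypothetical atomic structures: one binary relation per attribute.
name_of = {1: "person A", 2: "person B", 3: "person C"}
born_on = {1: "1970-03-14", 2: "1985-11-02", 3: "1970-03-14"}


def identify_by(relation, value):
    """Find the entities whose attribute in this relation has the given value."""
    return sorted(entity for entity, stored in relation.items() if stored == value)


# Start from a date, not from a key.
matches = identify_by(born_on, "1970-03-14")
names = [name_of[entity] for entity in matches]
```

Because every attribute lives in its own structure, any attribute can serve as the starting point of a search in the same way.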
In the above-mentioned article, there is another important sentence: „This
identifier has one value for all the states of one entity or relationship...“ I named
it „Procedure A“. I wrote about it in my thread „The original version“ in my post
of 26 May, 2010, and in that post I marked it as (a).
This very important part of the design of General databases, which for the first time
solves a set of important problems, was plagiarized by the authors of "Anchor
Modeling", who named it anchor and immutable key. Now there are names such as
"immutable objects", which is an example of not understanding the essence at the
level of db design. In this thread, I wrote that the term "immutable key" began to
appear in the theory of object-oriented languages, and that this term is wrong.
In my thread "some information about anchor modeling," in my post of 18 July,
2012, I wrote that the surrogate key is a weak point in OOP and OO db. I wrote
that these problems in OOP and OODB can be solved just by using the mentioned
"procedure (a)."
3.
In my paper from 2008, "Database design and data model founded on the concept and
knowledge constructs", in section 3, I defined an important constraint on the
subject, with the following title:
Limitation of Interpretation. Our assumption related to real world objects is that
we can recognize or match those objects for which we have perceptual, inferential or
rational abilities.
Therefore, I have defined that attributes are identifiers. We can identify attributes
by using our capacity in terms of the above-introduced "Limitation of
Interpretation".
Attributes are determined by formula (3.3.3); see my paper from 2008. So, these
attributes are determined by the subject's ability to identify them. Formula (3.3.3)
provides a link between conceptual thinking and identification. In my opinion it is
not enough just to work with concepts. In addition to formula (3.3.3), general
knowledge is also associated with data structures.
The amount of knowledge associated with an attribute can be as large as the project
leader decides. Knowledge in my data model is determined by factual sentences. I
defined facts and factual sentences in sections 2 and 3 of my paper „Semantic
databases and semantic machines“.
4.
All other structures, which are not attributes and which are complex, we construct in
the following way:
(i) Each complex structure is constructed from previous simpler structures.
(ii) We build the identifier of a complex structure using the identifiers of the
previous structures.
For example:
Entities are built from attributes by using Leibniz's Law (or the General Law). We
build the identifier of an entity using the attributes of this entity. Note that
attributes are atomic identifiers in my data model.
Identifiers of relationships are built from the identifiers of the entities
participating in the relationships. Identifiers of states are constructed by using the
identifiers of entities and the general knowledge that is related to the corresponding
entity.
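The recursive construction in (i) and (ii) can be sketched as three small functions, each of which builds its identifier only from identifiers of simpler structures. The function names, the sample attributes and the use of a "valid from" date as the general knowledge attached to a state are assumptions made for illustration; formula (3.3.3) itself is not reproduced here.

```python
def entity_identifier(identifying_attributes):
    """Entity identifier built from attribute values (the atomic identifiers)."""
    return tuple(sorted(identifying_attributes.items()))


def relationship_identifier(*entity_ids):
    """Relationship identifier built from the identifiers of its entities."""
    return tuple(entity_ids)


def state_identifier(entity_id, knowledge):
    """State identifier built from an entity identifier plus related knowledge."""
    return (entity_id, knowledge)


# Hypothetical data.
car_id = entity_identifier({"vin": "VIN-0001"})
dealer_id = entity_identifier({"dealer_no": "D-17"})

# A relationship "dealer holds car" identified by its participants.
holds_id = relationship_identifier(dealer_id, car_id)

# A state of the car; the date stands in for the associated general knowledge.
car_state_id = state_identifier(car_id, "valid from 2015-05-05")
```

No identifier at a higher level is invented independently; each one is derived from the level below it.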
Thus, for the attributes in my db design, there are two important structures:
a) the associated general knowledge that is related to the corresponding attribute
b) the formula that is designated (3.3.3)
In addition to the attributes, I also apply formula (3.3.3) to the m-entities,
m-relationships, and m-states. The procedures for the identification of relationships
and states are similar. So, constructions of complex objects are derived (recursively)
from simpler objects. For example, identifiers of states are determined by the
identifiers of the corresponding entities and by the general knowledge related to the
corresponding entity.
As stated above, the identifiers for attributes are not derived. Attributes are
identifiers. They are given (they depend on the subject's abilities).
I have already explained that the identifiers of entities are related to the
subject's operations with a memory: how to store an identifier of an m-entity in a
memory and how to recall it from the memory. When talking about thoughts, I explained
that surrogates are related to one subject and one memory (the memory where the
surrogates and the corresponding m-attributes are stored).
I also explained that industry-standard identifiers can be used to explain how
thoughts and semantic content are conveyed between two (or more) subjects, that is to
say, between two (or more) memories.
4. Step
1.
Now I will briefly describe the history of my work. When I talk about "General
databases", I mean that I solved the "decomposition" into atomic structures and
introduced the theory of states of entities and relationships.
In April 2006 I introduced Simple Form for Simple databases, i.e. databases that
maintain the current state. General and Simple Form enable the decomposition of a
database's structures into atomic structures.
I submitted my paper "Database design and data model founded on the concept and
knowledge constructs" to the „Journal of Computing and Information Technology“ (from
Croatia) on 21 August 2008. Croatia is my country of origin.
Much time passed after I submitted my paper without any information about whether
it had been accepted or rejected. I realized that the paper could not be published
even after a year, so I contacted the Editor-in-Chief, Sven Loncaric, and informed him
that I would publish the paper on my website and that, if it was accepted, I would
give all the rights to his journal. I quickly received a message from D. Mladenic
(from Slovenia), the Associate Editor, that my paper was rejected. I posted my paper
on my website and on the user group comp.databases.theory on 7 March 2009.
I was aware that my work is good, so I carefully examined D. Mladenic's reviews over a
few days. I was really astonished by her reviews. I found that she does not know
elementary things about databases. Then I found that S. Loncaric does not know
databases either; his field is "imaging". D. Mladenic is also not a database
professional; her specialty is "machine learning". I was put in a position of having
to correct my work; this was requested of me. I refused to do it, because I am sure
that my paper is correct. I presented my paper on my website exactly as it was
submitted to the Croatian journal. I also put D. Mladenic's reviews on the Web,
because this work is important, and I spent years on it. The review can be found in
my thread "The original version," in my post of 30 January, 2011.
Then I accidentally discovered that Anchor Modeling plagiarized my work. I informed
S. Loncaric about it; he did not respond. After a long correspondence between me, S.
Loncaric and D. Mladenic, which took one year, I realized that they had actually
blocked the printing of my paper.
By the way, there are no world-famous, influential names in the Croatian "Journal of
Computing and Information Technology". Of the well-known names, there is only Yuri
Gurevich (as far as I know this area). Gurevich is a famous mathematician and has
published papers with world-famous mathematicians. Here is the web address of the
Croatian journal's editorial team:
http://cit.srce.unizg.hr/index.php/CIT/about/editorialTeam
When I saw that Microsoft had started making software for what I published in 2005,
when I saw the "Anchor modeling" plagiarism of my work, and when I saw that in 2010
Tine Borovnik from the University of Ljubljana analyzed "Anchor Modeling" in his
master's thesis (by the way, Dunja Mladenic is from Ljubljana), the question was: what
should I do now?
On May 26, 2010 I decided to start the thread "The original version", which would
present the truth about "Anchor Modeling". Now I see that it was the only chance to
save my work to some extent. Otherwise, there would be only "Anchor modeling", and my
work would be dead.
In the fall of 2010, in just a few months, the authors of "Anchor modeling" released a
number of new papers. Their main work was published in the journal Data & Knowledge
Engineering, Editor Peter Chen. In this paper, the authors of "Anchor modeling"
corrected the mistakes that I presented in the thread "The original version". This
correction was done in such a way that this time they plagiarized my theory of states.
Some of the papers they published on September 15, 2010 on their website. All these
papers are connected, and the main thing in this work is mapping between data models.
2.
The most important part of my design is the "decomposition into atomic structures" and
the theory of the states of entities and relationships. So my data model design is
about the states of entities and relationships, not about the entities and
relationships themselves. In fact, these states are decomposed into atomic structures.
3.
In the second part of this paper I introduced databases that store programs.
Specifically, these databases keep the states of programs. One important idea here is
that processes and events can be realized by the execution of a program, or of one
state of the program. Of course, we can start a collection of programs, and then,
formally speaking, we have a history of the future that is implemented from the
database, which keeps the states of the programs and "knows" how to implement a set of
future events. In my post of April 18, 2015 I called this database a "small world"
that realizes the following two important things:
(i) the world can maintain its past, present and future.
(ii) the main control part of the world is a collection of information, i.e. data
from the appropriate database.
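A toy sketch of such a "small world" is given below. It assumes that each stored row is one state of a program (planned or done), that time is a simple integer, and that programs are looked up by name in a registry; the names PROGRAMS, schedule and advance are hypothetical and the sketch is not the design from the paper.

```python
# Hypothetical registry of programs that the database can refer to by name.
PROGRAMS = {
    "greet": lambda: "hello",
    "tally": lambda: 1 + 1,
}

# Each row is one state of a program: when it should run, and what happened.
schedule = [
    {"at": 1, "program": "greet", "status": "planned", "result": None},
    {"at": 2, "program": "tally", "status": "planned", "result": None},
    {"at": 5, "program": "greet", "status": "planned", "result": None},
]


def advance(now):
    """Execute every planned program whose time has come; keep the outcome."""
    for row in schedule:
        if row["status"] == "planned" and row["at"] <= now:
            row["result"] = PROGRAMS[row["program"]]()
            row["status"] = "done"


advance(2)

# The same data now holds the past (done rows) and the future (planned rows).
past = [row for row in schedule if row["status"] == "done"]
future = [row for row in schedule if row["status"] == "planned"]
```

The control of what happens next comes entirely from the stored rows, which corresponds to point (ii) above.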
---------------------------------------------------------------------------------------------------
For these databases that keep programs, I will start a short new thread, with maybe
three posts. For this current thread I have one more post.
I think it would be useful for this user group if someone started a thread on
intellectual property, plagiarism, etc.
------------------------------------------------------------------------------------------------------
Vladimir Odrljin