
cdt glossary 0.1.2


mAsterdam

Jan 1, 2008, 11:38:27 AM
---------------
Glossary 0.1.2
January 2008

"You keep using that word.
I do not think it means
what you think it means"
-- Inigo Montoya
---------------

Maintainer: mAsterdam

Preamble:
---------------
This glossary seeks to limit lengthy misunderstandings
in comp.database.theory. This newsgroup uses terms from
database modeling, design, implementation, operations,
change management, cost sharing, productivity research,
and/or basic database research.

People tend to assume that words mean what they are
accustomed to, and take for granted that the other
posters have about the same connotations.
They don't always.

It consists of signposts: Watch out! You may think the OP
means A but she might mean B. Alternative names and views
of the same concept are introduced when the danger
of mutual misunderstandings is apparent.
When context matters, it is provided. The glossary
is a highly biased list of problematic concepts.

Some words are particularly suspect:
data (!), database, object, normalisation.
Some just cause minor annoyances; the misunderstanding
is cleared up and the discussion goes on:
domain, type, transaction.

We don't know well-accepted, formal or comprehensive
definitions for everything. If you do have a useful
reference, please provide it.
If an informal description is all we have, so be it.

What the glossary is not:
---------------
The glossary is not a dictionary or encyclopedia, such as
FOLDOC, Wikipedia, or the Web Dictionary of Cybernetics and Systems,
nor a standardized vocabulary, such as ISO/IEC 2382.
Except for the last one, these are freely available and easily found.

Specific links to serve the glossary's purpose are welcome,
of course. Also, it does not try to be a FAQ for "all
things database".

Credits:
---------------
The glossary is built from contributions.
Contributions from within this group are
not credited, quotes are.
If you want your name stated please say so.

If you want to contribute, please read the
notes at the end first.
==============

[Address]
A value, used to identify a location.
What is to be found there is up to the rest of the system.

An address is a value used to locate ...
A reference is a value used to refer ...

The difference between *locate* and *refer* is crucial here.

[Change management]
Many organizations have a CM process in place in
order to make their evolution more manageable.
The organization of data within a database can
and will change with these changing circumstances.
A DBMS should provide facilities to support this.
Changing the underlying structure should be
possible without affecting what is already stored.
For example, you can add a column to a table without
losing what is already there.

Related adjectives: maintainable, agile, flexible, adaptive.
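
A minimal sketch of the "add a column" example, using Python's
sqlite3 module as a stand-in for a DBMS. The table and column names
are made up for illustration.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

# Change the structure: add a column to a table that already holds rows.
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# The existing row is still there; the new column is simply NULL for it.
rows_after_change = conn.execute(
    "SELECT id, name, city FROM customer"
).fetchall()
print(rows_after_change)  # [(1, 'Ada', None)]
```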

[Class]
A class is what provides a name and a place for
the abstract behavior of a set of objects
said to belong to the class. (Larry Wall, Apocalypse 12)

note:
Other definitions welcome, this goes for the rest as well,
of course.

Some use 'class' to mean something that has exposed data.
Please be explicit about this if you do so.

[Data]
"Known facts that can be recorded and have implicit meaning."
-- Fundamentals of Database Systems, Elmasri & Navathe.

When people discuss data in the context of databases,
they are usually talking about something with meaning.
There are people who think that data doesn't need
to mean anything.

In "Is Semantic Information Meaningful Data?"
( http://www.philosophyofinformation.net/pdf/iimd.pdf )
Luciano Floridi discusses several meanings of 'information'
and 'data'.

Somehow the "data has no meaning" idea has caught on.

1a. facts
1b. a record on a medium of some fact in the real world.
2. encoded information
3. a combination of sign and meaning

Warning: tongue in cheek definition
Information is what you want. Data is what you are given.

[Database]
"A logically coherent collection of related real-world data
assembled for a specific purpose." -- rephrased from
"Fundamentals of Database Systems", Elmasri & Navathe.

1. Deluxe filesystem
2. Shared databank (E. Codd)

[Data model]
"an abstract, self-contained, logical definition
of the objects, operators, and so forth, that
together constitute the abstract machine with
which users interact. The objects allow us to
model the structure of data. The operators allow
us to model its behavior."
(C. J. Date, An Introduction to Database Systems,
8e, 2003, p 15-16)

Data models are artificial constructs and may not
completely represent the true nature of information
and categorization. These categories already
exist, to some degree, in the way information
is handled outside the database.

Databases don't exist in vacuo; they're fed
(and consulted) by users who would have some system
of mental categorization even if they were shuffling
everything around with paper and pencil.

[Dimension]
1) A synonym for degree.
A relation R is of degree n if each tuple in R is an n-tuple.

2) An n-dimensional data structure, S, is one where each
element of S can be uniquely addressed as S[i1][i2]...[in]

Note: Because a table in a SQL-DBMS can be seen as a
conventional visualization of a mathematical relation
where the dimension is as in 1) above, and can also be
manipulated using a general purpose programming language
with the dimension using 2) above being equal to 2, there
can be confusion when using this term.

In this forum, use definition 1) freely and try to either
avoid 2) or be very clear, such as "2D array," when employing
definition 2).
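
To make the two senses concrete, here is a small Python sketch.
The relation and its contents are invented for illustration; the
point is only that the same data has degree 3 under definition 1)
and is a 2D structure under definition 2).

```python
# Definition 1: degree. Each tuple of this relation is a 3-tuple,
# so the relation is of degree 3.
relation = {
    ("p1", "bolt", 12),
    ("p2", "nut", 7),
}
degree = len(next(iter(relation)))

# Definition 2: addressing. The same data held as a list of rows is
# addressed as table[i][j], with two indexes, so it is 2-dimensional.
table = [list(t) for t in sorted(relation)]
cell = table[1][0]

print(degree, cell)  # 3 p2
```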

[Domain]
1. Given a relation R, a domain is a set Sn such that
for each tuple (A1, A2, ...An, ...Am) in R,
An is an element of Sn.

2. A domain is a set of values: for example
"integers between 0 and 255",
"character strings less than 10 characters long",
"dates".
Sometimes used synonymously with type.

[Entity]
Thing of interest. (ISO)

"An entity is a 'thing' which can be distinctly identified. A specific
person, company, or event is an example of an entity. "
("The Entity-Relationship Model - Toward a Unified View of Data", 1976,
P. Chen, http://www2.cis.gsu.edu/dmcdonald/cis8140/Chen.pdf )

Edward Yourdon, who describes E/R in his work Modern Structured
Analysis, (Prentice Hall 1989) defines the concept of Entity
as having three properties:

1. Each representation of an entity can be uniquely identified
2. Each representation of an entity is playing an important role in
the system it lives in. (it has to have a reason to be there)
3. Each representation of an entity can be described by one or more
attributes (data-elements, like name, age, quantity)

This term is often used when doing conceptual data modeling.
When it is used with a particular product, technique, or technology,
such as XML, refer to the use of the term within that "namespace" using
an adjective, such as "XML entity" to distinguish it from the more
generic use of the term.

For subtleties (e.g. strong and weak entity) -
please search the web.

[Fact]
1. A piece of information about circumstances that exist or
events that have occurred
2. A concept whose truth can be proved.
3. A statement or assertion of verified information.
4. An event known to have happened or something known to have existed.

[Flat]
1) An object which by any definition could be considered as 2
dimensional might informally be called flat.

2) (controversial:)
The absence of hierarchy (multiple levels of details).

Note: Any use of the term flat tends to be seen as inflammatory by
someone, so take care to use it only when intending to inflame ;-)

[Function]
For now we have to live with different meanings
of _function_ when talking about databases:
"The function of this function is to get the tuples from B
that are functionally dependent on A."

Three different contexts, but just about the same meaning:

General
A purpose or use.
Math
A binary mathematical relation with at most
one b for each a in (a,b).
Software
A subroutine, procedure, or method.

notes:
every operator is a function
every function is a relation

Please be specific.
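
The Math sense can be checked mechanically. The sketch below is in
Python, so it also happens to use the Software sense; the helper name
is_function and the sample relations are hypothetical.

```python
def is_function(pairs):
    """Return True if the binary relation has at most one b for each a."""
    seen = {}
    for a, b in pairs:
        if a in seen and seen[a] != b:
            return False
        seen[a] = b
    return True


# Made-up binary relations for illustration.
square = {(1, 1), (2, 4), (-2, 4)}          # several a's may share one b
sibling = {("ann", "bob"), ("ann", "cy")}   # one a with two different b's

print(is_function(square), is_function(sibling))  # True False
```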

[Information]
0. data in context, data with meaning.
(This implies a definition of data as being without context,
without meaning - see data)
1. new data to the receptor.
2. available data, relevant to some decision or action.
Also see [Data].

[Information principle] (RM)
Date/Codd:
Chris Date in "EDGAR F. CODD 08/23/1923 – 04/18/2003 A TRIBUTE":
The entire information content of a relational database
is represented in one and only one way: namely, as
attribute values within tuples within relations.

[Key]
A value, used to identify something.
See also primary key, and (TODO:) foreign key.

[Meaning]
(meaning vs use)
Say we currently have a validated statement
about the exchange rate of some stock at some
recent time.

1. It does not matter to the meaning
where/how this statement is represented. We have it.
2. To the use of it, it is important where/how
it is represented and made available to relevant actors.
3. Twenty years later the meaning of this statement
is still the same.
4. Twenty years later most of its usefulness will
probably have gone.

It may be -- in some instances -- not appropriate to make this
distinction. The meaning of data is always contextual.
The same bit of data means different things to different
structured viewpoints within the organization, for example,
and at different times (epochs). One grain of sand does not
form a beach. One bit of data itself has little meaning.
It is rather the collective of all data that possesses
a greater notion of meaning.

[MultiValue, MV]
1. One name for the industry surrounding the Nelson-Pick data model.
In this context:
FILE: a real-world collective noun.
RECORD: a real-world object.
FIELD: a real-world adjective.

2. A data field (or attribute) defined to permit a variable number of
values as a list (array).

[NULL]
Roughly: a special marker that can be put in a place
inside a data structure where an actual value is expected.
Precisely what that marker means varies and there are at
least three possibilities that are sometimes assumed:

(1) "Unknown value" This means that in the place of the marker
there should actually be a value but this value is not known
at the present time. For example, if a 'name' field in a tuple
describing a person is 'null' then this person will have a
name but we don't know it.

(2) "Absent value" This means that the property that is
described by the value in question is simply not defined.
For example, if the 'shipping-date' field in a tuple
describing an order is 'null' then the order was
not shipped yet.

(3) "Whatever SQL says it means" The exact meaning is hard to
summarize briefly, but is a mixture of the previous two
interpretations and involves a logic with three truth-values
('true', 'false' and 'unknown').
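
A small sketch of interpretation (3), using Python's sqlite3 module.
SQLite stands in for "an SQL-DBMS" here, and other products may differ
in detail; the table t and its contents are made up. In this interface
the truth-value 'unknown' comes back as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with anything, even NULL, yields unknown (None here).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# unknown AND false is false; unknown OR true is true.
and_false = conn.execute("SELECT NULL AND 0").fetchone()[0]
or_true = conn.execute("SELECT NULL OR 1").fetchone()[0]

# WHERE keeps only rows for which the condition is true, so the row
# holding NULL is returned neither for x = 1 nor for x <> 1.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (None,)])
eq = conn.execute("SELECT COUNT(*) FROM t WHERE x = 1").fetchone()[0]
ne = conn.execute("SELECT COUNT(*) FROM t WHERE x <> 1").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(null_equals_null, and_false, or_true)  # None 0 1
print(eq, ne, total)                         # 1 1 3
```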


Common usage:

- Confusion arises when people use terms like "null value",
a paradox to some, a contradictio in terminis to others.

- Confusion arises due to the fact that nullness (the absence
of value) is often represented on computers by the number 0.
(Obviously, 0 is not null.)

- In some contexts, 'null' and 'nil' mean the same thing;
in others, they do not.

In databases traditionally NULL is used and opposed.
If you want to go into this, please first search for
mu NIL void NULL undef, 2VL 3VL.

"It isn't the things we don't know that give us trouble.
It's the things we know that ain't so." - Will Rogers

Note: Several better proposals have been made for this
entry. Unfortunately they all led to huge threads where
the maintainer couldn't decide which texts to quote here.

[Object]
1. Model of an entity, characterised by behaviour and state. (ISO)
2. Something intelligible or perceptible by the mind.

[Table/Row/Column] (SQL-DBMS)
Table: A collection of columns (the table header) and rows (the body).
Row: A collection of values, conforming to the table header columns.

One table may contain data about one entity,
about several entities, about one or several
relationships or any combination.
A column can be seen as the attribute of the
entity/one of the entities/relationships
about which the table is concerned.

[Primary key] (SQL, not RM)
A key of a table, composed of one or more
named columns, uniquely identifies a row in a table.
A table can have only one primary key.
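
Both statements can be tried out with Python's sqlite3 module. This is
only a sketch against one SQL product (SQLite); the tables part and bad
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (part_no TEXT, colour TEXT, PRIMARY KEY (part_no))"
)
conn.execute("INSERT INTO part VALUES ('p1', 'red')")

# A second row with the same key value would not be uniquely identified,
# so the DBMS refuses it.
try:
    conn.execute("INSERT INTO part VALUES ('p1', 'blue')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Declaring two primary keys on one table is refused as well.
try:
    conn.execute(
        "CREATE TABLE bad (a INTEGER PRIMARY KEY, b INTEGER PRIMARY KEY)"
    )
    second_pk_rejected = False
except sqlite3.Error:
    second_pk_rejected = True

print(duplicate_rejected, second_pk_rejected)
```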

[Theory]
"Database management is a practical matter.
Any so-called theory of database management that doesn't
facilitate the practice would be nothing but
self-indulgent conjecture. Theorists (should)
want to know about what goes on in practice just as much as
practitioners should want to know what theory has to say."
-- Roy Hann, cdt, Dec 11, 2007

[Type]
" TYPES are sets of things we can talk about;
RELATIONS are (true) statements about those things."
-- Chris Date, Feb 2004

1. Set of possible values (i.e. IT equivalent of math 'domain').
2. Set of possible values plus
all possible operators defined on them. (i.e. synonymous to Class
if 'class' is meant to include a possible set of values).

This is a highly misunderstanding-prone area, so please
take some care to be specific.

[Type - 3rdM]
In The Third Manifesto a type is:
- a pattern (possible representation)
- a domain for some operators (THE_xxx operators)
- a codomain for some operators (the "constructors")

There is a requirement for the 'domain' and the 'codomain'
to be the same set.

[Pointer]
See address(*).

[Reference]
A reference is a value, used to refer to something.
A program can get the current value of that something
(without ever knowing where it resides) by dereferencing,
even if that something has been relocated between
the time of first reference and the dereferencing.
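
A toy sketch of the difference between locating and referring, in
Python. The Store class and all of its method names are invented for
this illustration: addresses are list positions, references are opaque
ids that stay valid when the value is moved.

```python
class Store:
    """Toy storage: addresses are list positions, references are ids."""

    def __init__(self):
        self._slots = []     # address -> stored value
        self._where = {}     # reference -> current address
        self._next_ref = 0

    def put(self, value):
        ref = self._next_ref
        self._next_ref += 1
        self._slots.append(value)
        self._where[ref] = len(self._slots) - 1
        return ref

    def address_of(self, ref):
        return self._where[ref]

    def dereference(self, ref):
        return self._slots[self._where[ref]]

    def relocate(self, ref):
        """Move the value to a fresh slot; the reference is unchanged."""
        old = self._where[ref]
        self._slots.append(self._slots[old])
        self._slots[old] = None
        self._where[ref] = len(self._slots) - 1


store = Store()
ref = store.put("exchange rate")
old_address = store.address_of(ref)
store.relocate(ref)
new_address = store.address_of(ref)

value = store.dereference(ref)      # still finds the value
stale = store._slots[old_address]   # the old address now locates nothing
print(old_address, new_address, value, stale)
```

Holding on to old_address would have located an empty slot after the
move; holding on to ref keeps working.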

[References, pointers, keys]
While references may be implemented as pointers,
the programmer prefers not to know (if he prefers
to know he should have used pointers).

In some programming languages one can declare
variables of a pointer type - these variables
can have pointer values.
m.m. (mutatis mutandis) reference.

Two operations are supported:
referencing and dereferencing.
On references only these operations are possible.
On pointers other operations are possible.

The dereferencing operation takes a pointer
*value* and returns a *variable* of
the type the pointer refers to.
The referencing operation is the inverse operation.
It takes a *variable* and returns a pointer *value*.
m.m. reference.

In Java the term pointer was avoided
because pointer is often used to mean
physical memory addresses.

Foreign keys are not links.
Links point, foreign keys constrain.
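
"Foreign keys constrain" can be seen in a short sketch with Python's
sqlite3 module. The tables dept and emp are hypothetical, and the
PRAGMA is specific to SQLite, which enforces foreign keys only when
asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE dept (dept_no INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE emp ("
    " emp_no INTEGER PRIMARY KEY,"
    " dept_no INTEGER REFERENCES dept (dept_no))"
)
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")

# The foreign key does not point anywhere; it restricts which values
# emp.dept_no may hold. There is no department 99, so this is refused.
try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

print(violation_rejected)  # True
```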

[Relation]
1. A relation is a subset of the set of ordered
tuples (A1, A2, ... Am) formed by the Cartesian
cross-product of sets S1 x ... x Sm where each
An is an element of Sn.

Note: A set, Sx, is not restricted from participating
as a member of a relation more than once.
Distinction between identical sets in math is possible
through ordinal numbering such that given sets Sx and Sy,
x <> y AND Sx is a subset of Sy and Sy is a subset of Sx;
in relational theory, in contrast, it is by attribute name.

2. ...
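
Definition 1 can be sketched in Python with itertools.product.
The sets S1 and S2 and the sample relations are made up for
illustration.

```python
from itertools import product

# Made-up sets for illustration.
S1 = {"ann", "bob"}
S2 = {1, 2, 3}

cartesian = set(product(S1, S2))       # all ordered pairs over S1 x S2
relation = {("ann", 1), ("bob", 3)}    # one possible relation over them

is_relation = relation <= cartesian    # a relation is a subset

# The same set may take part more than once: a relation over S2 x S2.
# Here the two uses of S2 are told apart by position, as in math.
less_than = {(a, b) for a, b in product(S2, S2) if a < b}

print(len(cartesian), is_relation, sorted(less_than))
```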

[Transaction]
A set of database operations constituting a logical unit of work.
Most DBMS include the ability to roll back complete transactions
when an error is detected.
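
A minimal sketch of a logical unit of work being rolled back, using
Python's sqlite3 module. The account table, its CHECK constraint and
the amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    # One logical unit of work: move 150 from a to b.
    conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'b'")
    # This update violates the CHECK constraint and raises an error.
    conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'a'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # undoes the first update as well

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'a': 100, 'b': 0}
```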

=============

[[Issues]]

RELATIONs vs. RELATIONSHIPs
Can namespaces help to make some distance? In this case:
RM.RELATION vs. ER.RELATIONSHIP

represented vs. described

RELATION(SHIP)s vs RELATION(SHIP)s SET

fact vs. thing (ENTITY).

First Order Logic vs. Higher Order Logic.

What, if any, is the equivalent of an ENTITY(SET) in the RM ?


Does it make sense to talk about attributes of a fact ?
How are those different from ATTRIBUTES of an ENTITY ?
Traditionally there can be Multivalued ATTRIBUTES
in ER, RM has atomic ATTRIBUTES.
So: RM.ATTRIBUTE and ER.ATTRIBUTE ?

In ER modeling, a RELATIONSHIP is defined over ENTITIES:
"A relationship is an association between several entities."
In RM, a RELATION is defined over VALUEs.
What is the difference between ENTITIES and VALUEs ?


=============

[[ToDo]]:

(please feel invited to write entries for these)

Application
Architecture
Attribute
Concept
Dynamic vs. static
Hierarchy
Identity
Normalize
Location
Persistence
Operator
ORM
Orthogonal
Relation vs. Relationship
Scalar
Schema

Feel free to post suggestions to add or remove.

How to contribute
-----------------

Content:
Please keep in mind that the focus of the glossary
is on /real/ c.d.t. misunderstandings.

Some discussions, after many sidetracks, are reducible
to /just/ different meanings and connotations of a word.
The differences could be resolved with just:
"Ah, now I see what you meant by that; next time I'll
be a little more careful in my choice of words".
Such words are nice glossary candidates.

Examples from the past: Address, Domain.

Sometimes, though, it's not just a different connotation
or meaning that leads to long-winded talks
without communication. These differences go down to
deeply held strong opinions.
Some differences in the use of words run much deeper than
we can hope to clear up with just some definitions and
warning signposts. They might help a little anyway, so
these nastier entries are welcome, too.

Examples from the past: NULL, Flat.


Form:
Please post your proposal as copy & pastable text,
with a subject line like this:

subject: cdt glossary [Identity]

Please also check spelling and grammar mistaeks.

Thank you for contributing.

----
Milestones? For the glossary I prefer inch-pebbles.
