
Standard SQL vs PusGrossNONsql


Derek Ignatius Asirvadem

Jan 6, 2020, 4:30:18 AM
> On Monday, 6 January 2020 04:33:04 UTC+11, Lifepillar wrote:
>
> You have asked to keep the content of these posts to the subject, but
> this "qualifying test" is a digression. Please move it to a different
> thread if you want to continue discussing it.

Yes. Some small but important items that I need to close. I've used a title such that anything that falls in this category can be collected here.

> Not sure you've appreciated the irony in my responses.

Probably quite true. I have been told, I am hopeless at sarcasm.

> But you make good
> points, with which I agree entirely.

Thank you.

> I'd add that too often too much
> emphasis is put on making the life of developers easier, regardless
> of the consequences for the users, and this trend, unfortunately,
> encompasses the whole computer industry.

In general, agreed. And it is getting worse every year, with the leftist, unicorn multiverse types, who think anyone can "code", no education required.

At the high-end of the market, that is not true: the ownership of data lies with the user, not the developer. Standards are better known and understood, and demanded of developers. It is more that one gets what one pays for: you pay peanuts, you get monkeys. High-end customers tend to pay more, and demand more.

> >> > So, a qualifying question ...
> >> > Do you understand that Object_V.Description is the concatenated
> >> > list of Keywords, that describes a PartCode ? That is derived by
> >> > calling a recursive Function, eg. Part_Description_fn( PartCode ).
>
> To get the keywords directly associated with a given PartCode X, I would
> not use recursion:
>
> select PartCode, string_agg(Keyword, ', ' order by Keyword)
> from PartDescription
> where PartCode = X
> group by PartCode;

Two responses and one question.

1. I am sure that that works, because I trust you.

1.q. Does that work without the WHERE clause limiting it to a single PartCode ? Ie. Can that code be placed in the definition for the View Object_V ?

2. Separate point. That is anti-SQL. The concept is to get the list of Keywords (Directories in the other example) into a single column, for use wherever any column can be used. An SQL Function with recursion is the standard way to do that. Therefore, since I am now informed that PooGross has recursion, it should be done that way.

Second, the reason it is anti-SQL is that it contradicts the way it is done in standard SQL, and that it hacks SQL syntax. String_Agg() is an established Oracle function. It was required because Oracle cacks itself when recursion is attempted, the same way it hangs up for hours when a subquery is attempted. So they (and now PooGroos) have to do things in weird, non-logical and indirect ways: enter a function with the same name as Oracle's. No surprise, guess where the PusGrew developers come from.

You might call that /making it easy for the developer/, which means guaranteeing a re-write when the app justifies migration to an SQL platform, and guarantees a headache for any maintenance developer up to that point.

> > This will probably be the subject of questions, so let me take it from
> > the top, if you do not mind. This is the second of the three types of
> > Hierarchies in the RM. The example is the Unix Node (file & directory
> > hierarchy), the data is:
> > http://www.softwaregems.com.au/Documents/Tutorial/Recursion/Hierarchy%20Inline.pdf
> >
> > The data model is:
> > http://www.softwaregems.com.au/Documents/Tutorial/Recursion/Directory%20DM%20Inline.pdf
> >
> > We need to produce the Path, as a column. Perfect candidate for
> > a Function, that is recursive.
> > Node_Path_fn (
> >     NodeNo,         -- Starting point
> >     ReturnNodeNo    -- Boolean: if set, return CSV NodeNos,
> >                     --   else return "/"-separated list of Node.Names
> >     )
> > Returns Path ( CHAR(255) not TEXT )  -- list of NodeNos/Node.Names
>
> Yes, for that I'd use a recursive function,

So why a recursive Function for this, but a String_Agg() for Object_V.Description (further above) ?

> which in PostgreSQL I'd code
> as follows, modulo some omitted details:
>
> create function Node_Path_fn(_NodeNo integer)
> returns char(255)
> language sql as
> $$
>   select case
>            when ParentNodeNo = _NodeNo then '/'
>            else Node_Path_fn(ParentNodeNo) || Name || '/'
>          end
>     from Node
>    where NodeNo = _NodeNo;
> $$;

Noting the modulo, looks good to me, logical SQL.

> > The same Function is used in the CONSTRAINT to CHECK that the NodeNo
> > attempted on INSERT is not in the list of ancestors (circular
> > reference).
>
> Mmh, no. A newly inserted node cannot be in the list of its ancestors.
> Such constraint would be useful to avoid circularity on UPDATEs.

That raises two issues from my perspective, treating it as such.

1. First, can you implement a CHECK CONSTRAINT that calls a Function (non-recursive or recursive or StringAgg() ) or not ?

In SQL we can, since 2007, when the facility was declared as part of the SQL Standard. This provides Data Integrity beyond Domain and Key. Whereas CHECK in its ordinary form checks the attempted new row (INSERT or UPDATE), CHECK ... Function checks:
- against some other row or rows in the same table
- against some row or rows in another table

For me, recursive or not, since I need the same code segment for both (a) constructing Path, and (b) preventing a circular reference, I have one Function, not two.
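To make the CHECK ... Function form concrete, here is a hedged sketch only (syntax varies by platform; Node_IsAncestor_fn is a hypothetical helper, not code from this thread, assumed to return 1 when its first argument appears in the ancestor list of its second, else 0):

alter table Node
    add constraint Node_Circular_check
    -- reject a row whose NodeNo is already among the ancestors of its
    -- attempted ParentNodeNo (a circular reference)
    check ( Node_IsAncestor_fn( NodeNo, ParentNodeNo ) = 0 );

The same recursive code segment that constructs Path can back such a function.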

2.a. We don't know that an attempted INSERT or UPDATE /inserts/ a circular reference in the tree, unless we check for it and prevent it (the function).

2.b. In high-end systems, due to OLTP Standards (you may wish to discuss that in the new ACID Transactions thread), we do not allow UPDATE of a Key. The platforms do not have CASCADE; the notion is mickey mouse. The "pure theory" notion that you guys need not concern yourselves with implementation considerations proves, in and of itself, that that "pure theory" has zero value: it must not be used in the real world. A "transaction" that updates a million rows, which means two million-plus BeforeImages and AfterImages (that is 2PL; it is much worse under MVCC), will strangle any system.

One must (a) DELETE the old tree, and (b) INSERT the new tree, (c) in physically feasible batches, in order to both (d) reduce contention while in progress, and (e) avoid filling up the transaction log. Hence:

> > The same Function is used in the CONSTRAINT to CHECK that the NodeNo
> > attempted on INSERT is not in the list of ancestors (circular
> > reference).

----

> If you want to prevent multiple roots (i.e., more than one record with
> NodeNo = ParentNodeNo), I would define a separate constraint.

I would not call that a multiple root, that is the smallest, tightest circular reference ! Even one row (not record, notice the slip) is not allowed to do that. Yes, that is a separate simple CHECK CONSTRAINT, placed before the CONSTRAINT that checks for circular references outside the row, in the ancestor list.
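If the data model does keep a single self-referencing root (as Node_Path_fn above implies, terminating when ParentNodeNo = NodeNo), one row-local sketch, assuming the root is a known fixed NodeNo (0 here, an assumption):

alter table Node
    add constraint Node_SelfRef_check
    -- any other row that references itself is the tightest circular reference
    check ( NodeNo = 0 or NodeNo <> ParentNodeNo );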

> > If PusGrossNONsql has no recursion,
>
> It does. Both recursive functions

Great.

> and recursive queries.

Anti-SQL, as detailed above.

What about recursive stored procs, which came first historically (1985 for RDBMS, 1976 for pre-Relational DBMS) ?

> > In SQL, the latter (either the Assembly tree, or the Component tree)
> > is serviced by a Stored Proc, not a Function. Last time I looked at
> > the bastard grandson of Stonebraker, it did not have Stored Procs.
>
> They were introduced very recently:
>
> https://www.postgresql.org/docs/12/sql-createprocedure.html

Thanks.

> > In normal SQL, the Assembly or Component tree would be produced by
> > a recursive Stored Proc that contains a single SELECT. With full
> > indentation to show the levels in the tree, or a LevelNo which is
> > derived, etc. Executes itself until it reaches the leaf level across
> > all branches.
>
> You'd achieve that in PostgreSQL with a function returning a table.
>
> > Second option: just code a single pure SELECT, in a WHILE (forever)
> > loop that traverses the tree.
>
> I prefer to avoid this kind of coding, except when it is needed because
> it provably improves performance.

(It was a Second Option, not required because you do have recursion.)

Responding to your point. In high-end systems, we try to code direct, logical SQL, not convoluted. We do not worry about performance, because the Query Optimiser has 40 years of maturity; it produces a brilliant Parse Tree; Query Tree; Query Plan. We only look at something if it does not perform well, if it begs our attention.

In mickey mouse suites, the situation is reversed. They worry about performance because they have to. The Query Optimiser is none of that, if it exists at all. Performance is not predictable. Often a developer will try two or three methods just to evaluate the performance difference.

Cheers
Derek

Nicola

Jan 6, 2020, 4:36:05 PM
On 2020-01-06, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> To get the keywords directly associated with a given PartCode X, I would
>> not use recursion:
>>
>> select PartCode, string_agg(Keyword, ', ' order by Keyword)
>> from PartDescription
>> where PartCode = X
>> group by PartCode;
>
> Two responses and one question.
>
> 1. I am sure that that works, because I trust you.
>
> 1.q. Does that work without the WHERE clause limiting it to a single
> PartCode ? Ie. Can that code be placed in the definition for the
> View Object_V ?

Yes. For instance:

select O.ObjectCode,
       O.ObjectTypeCode,
       O.Name,
       string_agg(D.Keyword, ', ' order by D.Keyword) as Description
  from Object O
  join Part P
    on O.ObjectCode = P.PartCode
  join PartDescription D
    on D.PartCode = P.PartCode
 group by O.ObjectCode, O.ObjectTypeCode, O.Name;

Or:

with Description(PartCode, Description) as (
  select PartCode, string_agg(Keyword, ', ' order by Keyword)
    from PartDescription
   group by PartCode
)
select O.ObjectCode,
       O.ObjectTypeCode,
       O.Name,
       D.Description
  from Object O
  join Part P
    on O.ObjectCode = P.PartCode
  join Description D
    on D.PartCode = P.PartCode;

> 2. Separate point. That is anti-SQL.

I don't see how: string_agg() is just an aggregate function operating on
strings. Unless anything except count(), sum(), avg(), min() and max()
is, for you, "anti-SQL".

Not SQL-standard compliant, I agree. AFAIK, there's no string_agg() in
the standard.

> The concept is to get the list of Keywords (Directories in the other
> example) into a single column, for use where any column can be used.
> SQL Function with recursion is the standard way to do that.
> Therefore, since I am now informed the PooGross has recursion, it
> should be done that way.

Do you mean something along these lines (PostgreSQL syntax, does not
deal with some edge cases)?

-- Returns the list of keywords of _PartCode
-- lexicographically greater than _Keyword
create function Next_Keywords(_PartCode integer, _Keyword char(30))
returns char(255)
language sql as
$$
select case
         when min(Keyword) = max(Keyword)
           then min(Keyword)
         else min(Keyword) || ', ' || Next_Keywords(_PartCode, min(Keyword))
       end
  from PartDescription
 where PartCode = _PartCode
   and Keyword > _Keyword;
$$;

create function Description_fn(_PartCode integer)
returns char(255)
language sql as
$$
select Next_Keywords(_PartCode, '')
from PartDescription
where PartCode = _PartCode;
$$;

> You might call that /making it easy for the developer/, which means
> guaranteeing a re-write when the app justifies migration to an SQL
> platform, and guarantees a headache for any maintenance developer up
> to that point.

Sure, portability-wise it's always better to stick to standards.
Unfortunately, migration from one DBMS to another is unlikely to be
hassle-free even if one sticks to strict ISO SQL.

>> > This will probably be the subject of questions, so let me take it from
>> > the top, if you do not mind. This is the second of the three types of
>> > Hierarchies in the RM. The example is the Unix Node (file & directory
>> > hierarchy), the data is:
>> > http://www.softwaregems.com.au/Documents/Tutorial/Recursion/Hierarchy%20Inline.pdf
>> >
>> > The data model is:
>> > http://www.softwaregems.com.au/Documents/Tutorial/Recursion/Directory%20DM%20Inline.pdf
>> >
>> > We need to produce the Path, as a column. Perfect candidate for
>> > a Function, that is recursive.
>> > Node_Path_fn (
>> > NodeNo, -- Starting point
>> > ReturnNodeNo -- Boolean: if set, return CSV NodeNos,
>> > else return "/" separated list of Node.Names
>> > )
>> > Returns Path ( CHAR(255) not TEXT) -- list of NodeNos/Node.Names
>>
>> Yes, for that I'd use a recursive function,
>
> So why a recursive Function for this, but a String_Agg() for Object_V.Description (further above) ?

Just convenience (some systems make that function available), as you
said. Another argument might be performance, but I have not tested that.

>> > The same Function is used in the CONSTRAINT to CHECK that the NodeNo
>> > attempted on INSERT is not in the list of ancestors (circular
>> > reference).
>>
>> Mmh, no. A newly inserted node cannot be in the list of its ancestors.
>> Such constraint would be useful to avoid circularity on UPDATEs.
>
> That raises two issues from my perspective, treating it as such.
>
> 1. First, can you implement a CHECK CONSTRAINT that calls a Function
> (non-recursive or recursive or StringAgg() ) or not ?

Well, yes.

> In SQL we can, since 2007, when the facility was declared as part of
> the SQL Standard. This obtains to Data Integrity that is beyond
> Domain and Key. Whereas CHECK in the ordinary form checks the
> attempted new row (INSERT or UPDATE), CHECK ... Function checks:
> - against some other row or rows in the same table
> - against some row or rows in another table

Sure, here in anti-SQL world we have discovered that, too. Making
progress.

> For me, recursive or not, since I need the same code segment for both
> (a) constructing Path, and (b) preventing a circular reference, I have
> one Function, not two.

Ok, agreed.

> 2.a. We don't know that an attempted INSERT or UPDATE /inserts/
> a circular reference in the tree, unless we check for it and prevent
> it (the function).
>
> 2.b. In high-end systems, due to OLTP Standards (you may wish to
> discuss that in the new ACID Transactions thread) we do not allow
> UPDATE of a Key. The platforms do not have CASCADE, the notion is
> mickey mouse. The "pure theory" notion that you guys need not concern
> yourselves with implementation considerations proves, in and of
> itself, that that "pure theory" has zero value, it must not be used
> in the real world. A "transaction" that updates a million rows, which
> means two million plus plus BeforeImages and AfterImages (that is 2PL,
> it is much worse under MVCC), will strangle any system.

> One must (a) DELETE the old tree (b) INSERT the new tree, (c) in
> physically possible batches, in order to both (d) reduce contention
> while in progress, and (e) prevent filling up the transaction log.

That would be a nice exercise to try. I'll keep it in mind for my next
course.

> Hence:
>
>> > The same Function is used in the CONSTRAINT to CHECK that the NodeNo
>> > attempted on INSERT is not in the list of ancestors (circular
>> > reference).
>
> ----
>
>> If you want to prevent multiple roots (i.e., more than one record with
>> NodeNo = ParentNodeNo), I would define a separate constraint.
>
> I would not call that a multiple root, that is the smallest, tightest
> circular reference ! Even one row (not record, notice the slip) is
> not allowed to do that. Yes, that is a separate simple CHECK
> CONSTRAINT, placed before the CONSTRAINT that checks for circular
> references outside the row, in the ancestor list.

Ok. How do you express the condition used to check for self-loops (e.g.,
rows with value (n,n))? It can't be simply CHECK (NodeNo <>
ParentNodeNo), because you must allow at least one (or exactly one) such
row. I am curious to know what, in your experience, is an efficient way
to perform that check.

>> > If PusGrossNONsql has no recursion,
>>
>> It does. Both recursive functions
>
> Great.
>
>> and recursive queries.
>
> Anti-SQL, as detailed above.

Ah, why did I mention Datalog in the other thread :D

> What about recursive stored procs, which came first historically (1985
> for RDBMS, 1976 for pre-Relation DBMS) ?

Why not? SQL, like any language, allows you to express the same thing in
different ways. IMO, choose what looks simpler (and probably most
elegant) in the first place; in many cases, that is also the most
efficient solution. Reformulate in a different way if you find that you
need to improve X and the reformulation improves X—where X may be
portability, speed, concurrency, etc.

> Responding to your point. In high-end systems, we try to code direct,
> logical SQL, not convoluted.

Exactly.

> We do not worry about performance
> because the Query Optimiser has 40 years maturity, it produces
> brilliant a Parse Tree; Query Tree; Query Plan. We only look at
> something if it does not perform well, if it begs our attention.

Exactly.

> In mickey mouse suites, the situation is reversed. They worry about
> performance because they have to.

That sounds more like... than PostgreSQL. I don't want to start flame
wars :)

> The Query Optimiser is none of
> that, if it exists at all. Performance is not predictable. Often
> a developer will try two or three methods just to evaluate the
> performance difference.

I would love to see some comparative evaluation.

Btw, Sybase was acquired by SAP. Does it still exist as a product?

Nicola

Derek Ignatius Asirvadem

Jan 7, 2020, 4:33:55 AM
Ok. The one thing left, that I am hung up about, in both those code segments, is this. Why the GROUP BY ? Is it required to get the string_agg() to work ? The string_agg() being one anti-SQL trick, and the technique required to get it to work being part of the trick.

And separately, do you agree, all things considered (as per discussion, eg. coding while observing the SQL Standard; portability; directness), that coding the above using a recursive Function would be simpler, more straightforward, etc ?
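For reference, an illustration of the two cases, assuming ordinary aggregate semantics for string_agg() (not a quote from the thread):

-- without GROUP BY: the whole table collapses to a single list
select string_agg(Keyword, ', ' order by Keyword)
  from PartDescription;

-- with GROUP BY: one list per PartCode, as required by the View
select PartCode, string_agg(Keyword, ', ' order by Keyword)
  from PartDescription
 group by PartCode;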

> > 2. Separate point. That is anti-SQL.
>
> I don't see how: string_agg() is just an aggregate function operating on
> strings. Unless anything except count(), sum(), avg(), min() and max()
> is, for you, "anti-SQL".
>
> Not SQL-standard compliant, I agree. AFAIK, there's no string_agg() in
> the standard.

Your second comment destroys your first comment.

If there is an SQL way to perform some function, and your mickey mouse suite performs said function a different way, it is anti-SQL. If that anti-SQL way blocks portability to an SQL platform, then it is purposely making sure that you invest in the suite *and* that you can't move out of it. Like catching syphilis at the brothel: the girls say, la la la la la la, now you can't go anywhere else. PusGross has many such features, eg. ENUM; the idiotic and anti-Relational Deferred Constraint Checking. It is entrapment on a grand scale.

I will respond to this next item as a generic technique for anti-logical argumentation, because if you were conscious of what you have done, I trust you would not do it.
> Unless anything except count(), sum(), avg(), min() and max()
> is, for you, "anti-SQL".

The item at issue is whether the string_agg() function is anti-SQL (my charge), or not. The scientific, logical argument is to argue that point squarely. If not, then whatever argument is thrown up, is not square, not logical, perhaps manipulative, but in any case, does not lead to resolution of the issue. The actual result is, it re-frames the argument as something that it is not, and makes the person sound stupid.

The item at issue is not that since it is a function, therefore it lies in the category of all functions, and since it is not one of the core SQL functions, therefore I must be (through this confected chain of anti-logical propositions) opposing all functions that are not core SQL functions.

I did not say that. That is your interpretation. Not mine. I said that anything that provides a user-requirement in a way that is different to the way that SQL does provide, is anti-SQL.

> string_agg() is just an aggregate function operating on
> strings.

It is not an aggregate. It manufactures a list of items, with a given separator. It is the normal result set from a SELECT, which is multiple-row, single-column, converted into CSV and delivered as single-row, single-column.

An example of an aggregate string function is HashBytes( <long_string> ), where <long_string> is usually the whole row, all columns converted to CHAR(), for the purpose of determining, when compared to a previously stored image, whether it has changed.

Further, it is anti-logical, because the logical way (FOPC -> RA -> SQL) is to append each increment to a string, which is a recursive function.

As a result of this exchange, I think the rule to give my followers who are stuck in PooGross land, all things considered, is:
- do not use recursive queries ( string_agg() )
- use recursive functions (don't be lazy)

Yes, that is what I mean. Except that I would do it in one function, not two, the same way you did Node_Path_fn.

Btw, PartCode is CHAR, not INT. *Code is a naming convention meaning CHAR, short name; *No or *Num for INT.

> > You might call that /making it easy for the developer/, which means
> > guaranteeing a re-write when the app justifies migration to an SQL
> > platform, and guarantees a headache for any maintenance developer up
> > to that point.
>
> Sure, portability-wise it's always better to stick to standards.
> Unfortunately, migration from one DBMS to another is unlikely to be
> hassle-free even if one sticks to strict ISO SQL.

That is not logical. No one is saying that porting from one platform to another is hassle-free. Everyone understands that portability is a Good Thing to strive for, and Standards are predicated on that. If you use the 2-char ISO CountryCode, you will be far better off, for many reasons, than using a roll-your-own CountryCode. In any given platform, which necessarily has Extensions to SQL, if you stick to ISO SQL in preference over some Extension, you will be far better off than if you prefer the Extension over ISO SQL.

In the high end of the market we have a different problem. Eg. IBM and Sybase have had, from the earliest days of SQL, true SQL platforms. With the full gamut of capability required for a DBMS (note that is not RDBMS), because what a DBMS was, was already strongly established. But since many of those capabilities were best delivered /inside/ SQL, such that the developer was NOT forced to exit an SQL code segment to execute that function and then jump back in again, to continue in SQL, they were delivered as Extensions to SQL. Transact-SQL is the best example of that, it was so good that MS stole it, and Informix and even IBM implemented some of its functions.

Years later, sometimes decades later, the Standards caught up (first ANSI, now ISO) and declared a particular SQL syntax for a particular function that we had implemented earlier in a proprietary syntax. And we were not about to change, or rewrite code. A simple example would be regex on strings, anywhere in the SQL code: we had that since 1985; ISO SQL declared it in 1999. Or row pattern matching: we had it in 2005, ISO declared it in 2017 under a different name. Or COALESCE(), when they finally figured out how to deal with NULL, to provide what we had had for 30 years as ISNULL(). Sybase just provided the new syntax to comply with the Standard.

In the low end of the market, Oracle; MyNONsql; PooGooNONsql, it is the other way around. The suites not only did not have SQL, they did not have core features of a DBMS. We have had ACID Transactions from 1985 (IBM had it earlier via System/R, proprietary SQL, etc), PostGross had nothing in its initial versions. Some 20 years after the initial version, some thirty five years after we had it, PostGross delivered the pathetic "functions are now transactional" over the fragile MVCC philosophy.

It may be another ten or twenty years before you guys get genuine OLTP, genuine ACID Transactions. Whenever that happens, you will have to rewrite great slabs of code, because the structure of the code will have to change. Even worse for people who were clueless re Transactions, who delivered CRUD or some other filth, who counted on CASCADE. Instead of doing it right from the outset.

But academics love their toys, they will not admit that they are 30 or 40 years behind the commercial market. They rely on the promise: "it is improving, one day it will ..."

The point is, we are worlds apart, SQL vs Anti-SQL. As evidenced in the examples in this and the other thread, which is closing the gap wrt understanding only.

> > In SQL we can, since 2007, when the facility was declared as part of
> > the SQL Standard. This obtains to Data Integrity that is beyond
> > Domain and Key. Whereas CHECK in the ordinary form checks the
> > attempted new row (INSERT or UPDATE), CHECK ... Function checks:
> > - against some other row or rows in the same table
> > - against some row or rows in another table
>
> Sure, here in anti-SQL world we have discovered that, too. Making
> progress.

Maybe in 2030.

> >> If you want to prevent multiple roots (i.e., more than one record with
> >> NodeNo = ParentNodeNo), I would define a separate constraint.
> >
> > I would not call that a multiple root, that is the smallest, tightest
> > circular reference ! Even one row (not record, notice the slip) is
> > not allowed to do that. Yes, that is a separate simple CHECK
> > CONSTRAINT, placed before the CONSTRAINT that checks for circular
> > references outside the row, in the ancestor list.
>
> Ok. How do you express the condition used to check for self-loops (e.g.,
> rows with value (n,n))? It can't be simply CHECK (NodeNo <>
> ParentNodeNo), because you must allow at least one (or exactly one) such
> row.

Hang on. Before I answer the next question. Why precisely do we need such a row ?

> I am curious to know what, in your experience, is an efficient way
> to perform that check.

On hold.

> > Anti-SQL, as detailed above.
>
> Ah, why did I mention Datalog in the other thread :D

;p

> > What about recursive stored procs, which came first historically (1985
> > for RDBMS, 1976 for pre-Relation DBMS) ?
>
> Why not? SQL, like any language, allows you to express the same thing in
> different ways. IMO, choose what looks simpler (and probably most
> elegant) in the first place; in many cases, that is also the most
> efficient solution. Reformulate in a different way if you find that you
> need to improve X and the reformulation improves X—where X may be
> portability, speed, concurrency, etc.

For all categories listed except concurrency, agreed. I would add Logical in front.

For concurrency (low contention, high concurrency), strongly rejected. The whole collection of related items is called OLTP Standards, ACID Transactions being just one item. It is a conceptual technology, in two parts. It is total prevention of preventable problems, rather than a cure:
1. the required structures must be implemented in the database (data model, at the stage where it progresses to Physical)
2.0. all Transactions have to be designed with OLTP considerations in mind, the smallest possible Logical Unit of Work, for the given data model.
2.1 all Transactions must be written using the OLTP Code Template. Means 4 mandatory code blocks and a demanded compilation which means a Stored Proc per Xact.
(This is a quick definition, if you have questions, please open a separate thread.)
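Since the four mandatory blocks are not spelled out here, the following is only a generic sketch of the shape such a Transaction stored proc commonly takes (Transact-SQL-flavoured; the proc name, parameters, and block layout are assumptions, not the actual OLTP Code Template):

create procedure Part_Add_tr
    @PartCode char(16),
    @Name     char(30)
as
    -- validate before opening the transaction: fail early, hold no locks
    if exists ( select 1 from Part where PartCode = @PartCode )
        return 1
    -- the smallest possible Logical Unit of Work
    begin transaction
    insert Part ( PartCode, Name )
        values ( @PartCode, @Name )
    if @@error != 0
    begin
        rollback transaction   -- all or nothing
        return 2
    end
    commit transaction
    return 0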

Since all Transaction sp's have the required structure (Template), there is nothing to do, nothing to improve, in the area of concurrency. That the Template has been faithfully implemented, and that it matches intent per the data model, is a matter of Code Reviews by Peers. Sure, at load testing, and not before, contention problems will be exposed, but always (100% of my experience over 44 years fixing problems in this specific area) it is due to failure of [1][2.0][2.1], and always eliminated by proper implementation thereof.

> > In mickey mouse suites, the situation is reversed. They worry about
> > performance because they have to.
>
> That sounds more like... than PostgreSQL. I don't want to start flame
> wars :)

It is definitely the Oracle mindset. Often the MS/SQL mindset. Always the newbie (to anything) mindset.

> > The Query Optimiser is none of
> > that, if it exists at all. Performance is not predictable. Often
> > a developer will try two or three methods just to evaluate the
> > performance difference.
>
> I would love to see some comparative evaluation.

I can't give you exactly what is called for here, but we do that in advanced courses, to kill an argument along those lines. Easy to produce from Sybase. I do have many true benchmarks that address one or more issues in a particular database, or cover the issues at a higher level, such that over the decades I have accumulated proof, so that I can declare (eg):
//in Sybase, a Subquery is marginally (inconsequentially) faster than a Derived Table, therefore use whatever is more logical in the particular code segment.//
Or
//In Oracle, never use Subquery, always use a Derived Table instead.//

I can't give you those because they are customer-confidential. What I can give you is a public benchmark I ran for Oracle vs Sybase, testing Subqueries and "Inline Views" (Oracle terminology for a Derived Table). The guy I ran it for was a famous Oracle consultant, but he turned out to be ignorant of basic SQL; benchmark methods; Query Plans; etc. I did have a full write-up on a well-known forum (including the back-and-forth, which can be ignored), but he has had it removed, because Oracle showed up very badly. Therefore all that is left is the files on my server that were the reference material for the write-up.

The quick desc is, I stated somewhere that Oracle cannot handle subqueries; he attacked and said that was false; he said there was a "better way" to do subqueries (Inline Views) but that subqueries were no problem; I challenged him to a benchmark; being friendly at the time, I had to teach him how to load benchmark tables, etc; we ran the benchmarks on the comparable equipment we had; he was limited to a PC/Windows/OracleNONserver, so I ran mine on MacBookPro/WindowsEmulator/Windows/SybaseServer.

He was dishonest, after Oracle cacked itself, he did not finish the rest of the benchmark or submit his QPs, etc.

This is the one-page summary:
https://www.softwaregems.com.au/Documents/Student_Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
For your purposes (scientific comparative eval), please look at the other files in that directory. Eg. the Query Plans: that is the text version via the server [isql], we normally use an IDE (no one codes SQL without one), which provides a pretty graphical version.

----

That was all client-side stats, fairly simple, but excellent for debugging; performance improvement; etc, for a developer. If you mean serious Statistics, taken from the server directly while a benchmark (or a problem situation) is running, we have a wealth, and several methods of capture and acquisition. Again, depending on the tool used, either text only, or sexy graphics. You can google to find such tools. Even the DBA tool provides basic Stats.

I will lead you through from what we get from the server, to what we can do with it, in increments.

1. This is the lowest level, directly from the server, no 3rd party tool required. I require this to be done for any server that I attend, 24 x 60 mins or 48 x 30 mins, daily. This is an ancient format, it cannot be changed for backward compatibility reasons. Look, do not read, get an idea of the stats the server spews.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/SEQUOIA_120806_sysmon_09

2. Most DBAs cannot read that. Guys like me can, but it is difficult to find things, and the organisation is what the server was since 1985 ... enhanced over decades, not logical from a resource perspective. In hindsight, metric names are horrible. So DBAs have a script to ETL that into something meaningful and readable. High end consultants have better scripts than DBAs, and theirs are better than newbies', etc. Mine is normalised, meaning highly organised, and the metric names are consolidated (while maintaining a relation to the server). Further, it captures all resources that I have set up in the server (ie. everything beyond the installation CD), to whatever degree those Resources are organised: hierarchies and names, not numbers. Here is one day. Very useful for looking for shapes and patterns (not reading figures), rather than via [1]. Again, just look, don't read. Capture means what we captured from the server.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Capture.pdf

3. Here is one day divided into CaptureGroups (server load categories by Time). This is the largest teaching hospital in the US, their gigantic Health Care system, I provided Level 3 support for their Sybase:
- daytime (online user activity, all hospital staff);
- night (batch jobs, app maintenance);
- DBCC & Backup (server level database maintenance).

Now it might be worthwhile reading
- At this level, I provide additional metrics (ie. computed from [1]), plus whatever the storage girls and unix boys give us (eg. vmstat)
- The [Div]ided column gives the metric (numerator) divided by the parent metric (denominator, easily identified by the indentation), as a %age or Rate (per sec)
- The [RUt] column is Resource Utilisation. The parent metric (denominator) here is the parent Resource (the set above, not via indentation in the immediate set). The application is vertical, the comparison is horizontal. KPI for load distribution.
- The [SUt] column is Schedule Utilisation. Schedule means whatever the main column sets are in the report (this DBA set up 3 fractions for the day and called the report CaptureGroup). The application is horizontal, expressed as a number, not %age (because we already have %ages, and a column full of %ages will carve up the report visually)
:
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20CaptureGroup.pdf

Or a Trend report. This one shows growth by week.
- The Delta column shows the change against the first date.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Trend.pdf

4. That's fine for technical people, but for management, or to provide proof in my consulting assignments (performance tuning; determining a problem; before vs after; etc), we need pretty charts. Once [3] is obtained, there is no limit to the type and number of charts; here are a few. Just a couple of keystrokes to get [3], which is in CSV format, into Excel or Numbers.
http://www.softwaregems.com.au/Documents/Documentary%20Examples/sequoia%20091019%20Server%20Public.pdf

Before/After I diagnosed and fixed a SAN problem (way beyond my responsibility, but the storage team could not find it)
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20Xact%20Lock.pdf

> Btw, Sybase was acquired by SAP. Does it still exist as a product?

Yes and yes. It is wholly owned by SAP, minus a small share to key engineers who had shares before the acquisition. It runs as a separate company. Sybase is not just one flagship product; there are Replication Server; PowerBuilder; and a few other things.

The purpose of SAP acquiring Sybase needs to be understood. SAP is a horrible RFS suite of mixed products that have poor interfaces (no central database, let alone a Relational one). Sybase was purchased because it is the fastest, and the easiest to configure [the server] for a particular purpose. The first set of changes was all to do with making RFS-type files run faster. The acquisition therefore meant a new captive customer base. So it is in a very secure position: 95% of the financial markets (before the acquisition) plus progressive colonisation of SAP customers. Unfortunately, SAP is not doing much marketing outside their customer base.

Cheers
Derek

Nicola

Jan 7, 2020, 1:29:38 PM
On 2020-01-07, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> Ok. The one thing left, that I am hung up about, in both those code
> segments, is this. Why the GROUP BY, is it required to get the
> string_agg() to work ?

Yes. See an example below.

> And separately, do you agree, all things considered (as per
> discussion, eg. coding while observing the SQL Standard; portability;
> directness), coding the above using a recursive Function would be
> simpler, more straight-forward, etc.

Agreed, let us stick to standard SQL. Then, I would use the recursive
function from my previous post.

> I said
> that anything that provides a user-requirement in a way that is
> different to the way that SQL does provide, is anti-SQL.
>
>> string_agg() is just an aggregate function operating on
>> strings.
>
> It is not an aggregate. It manufactures a list of items, with a given
> separator. It is the normal result set from a SELECT, which is
> multiple-row single-column, converted into CSV and delivered as
> single-row single-column.

An aggregate maps a set of values from different rows into a single
value, and string_agg() does exactly that:

PartDescription
PartCode Keyword
P1 dark
P1 smooth
P2 square
P2 glassy
P2 opaque

select PartCode, string_agg(Keyword, ', ') as Description
from PartDescription
group by PartCode;

PartCode Description
P1 dark, smooth
P2 square, glassy, opaque

is conceptually no different from a standard aggregate like sum(), e.g.:

Assembly
AssemblyCode ComponentCode Quantity
P1 P2 3
P1 P3 5

select AssemblyCode as PartCode, sum(Quantity) as Quantity
from Assembly
group by AssemblyCode;

PartCode Quantity
P1 8

> An example of an aggregate string function is HashBytes( <long_string>
> ), where <long_string> is usually the whole row, all columns converted
> to CHAR(), for the purpose of determining, when compared to as
> previously stored image, if it has changed.

That's a different class of operations, because it works on one row at
a time. It wouldn't be affected by GROUP BY.

> Further, it is anti-logical, because the logical way (FOPC -> RA ->
> SQL) is to append each increment to a string, which is a recursive
> function.

??? Also sum() could be computed with a recursive function, by adding an
increment to a number. That does not make sum() anti-logical.

It's curious that you mention FOPC and RA in this context, because they
are expressively equivalent (Codd's Theorem) and it is widely known that
they cannot express recursion.

> As a result of this exchange, I think the rule to give my followers
> who are stuck in PooGross land, all things considered, is:
> - do not use recursive queries ( string_agg() )
> - use recursive functions (don't be lazy)

Advice taken. But I think that recursive queries (by which I understand
that you are talking specifically of WITH RECURSIVE queries) in
a relational query language are a good idea. I don't particularly like
SQL's syntax, though. And the implementation in some systems is less
than ideal. Those may be enough reasons (not the only ones) to prefer
functions over WITH RECURSIVE.
A function taking only a PartCode as an argument (that is the signature
you showed us)? How would you make it recursive without keeping some
state?

> Btw, PartCode is CHAR, not INT. *Code is a naming convention meaning
> CHAR, short name. *No or Num* for INT.

Ok.

>> How do you express the condition used to check for self-loops (e.g.,
>> rows with value (n,n))? It can't be simply CHECK (NodeNo <>
>> ParentNodeNo), because you must allow at least one (or exactly one) such
>> row.
>
> Hang on. Before I answer the next question. Why precisely do we need
> such a row ?

In Node(NodeNo, ParentNo, Name)? I am assuming that you recognize the
root of the hierarchy as a tuple of the type (n,n,''), for some integer
n. Please clarify if that is not the case.

>> I am curious to know what, in your experience, is an efficient way
>> to perform that check.
>
> On hold.

>> > What about recursive stored procs, which came first historically (1985
>> > for RDBMS, 1976 for pre-Relation DBMS) ?
>>
>> Why not? [...] choose what looks simpler (and probably most elegant)
>> in the first place. [...] Reformulate in a different way if you find
>> that you need to improve X and the reformulation improves X—where
>> X may be portability, speed, concurrency, etc.
>
> For all categories listed except concurrency, agreed. I would add
> Logical in front.
>
> For concurrency (low contention, high concurrency), strongly rejected.
> The whole collection of related items is called OLTP Standards, ACID
> Transactions being just one item. It is a conceptual technology, that
> is in two parts. It is a total prevention of preventable problems,
> rather than curative:
> 1. the required structures must be implemented in the database (data
> model, at the stage where it progresses to Physical)

Yes.

> 2.0. all Transactions have to be designed with OLTP considerations in
> mind, the smallest possible Logical Unit of Work, for the given data
> model.

Yes.

> 2.1 all Transactions must be written using the OLTP Code Template.
> Means 4 mandatory code blocks and a demanded compilation which means
> a Stored Proc per Xact.
> (This is a quick definition, if you have questions, please open
> a separate thread.)

Ok.

> Since all Transaction sp's have the required structure (Template),
> there is nothing to do, nothing to improve, in the area of
> concurrency. That the Template has been faithfully implemented; that
> it matches intent per the data model, is a matter of Code Reviews by
> Peers. Sure, at load testing only, and not before, contention
> problems will be exposed, but always (100% of my experience over 44
> years fixing problems in this specific area) it is due to failure of
> [1][2.0][2.1], and always eliminated by proper implementation thereof.

Ok, I have to trust your experience here.

>> > The Query Optimiser is none of
>> > that, if it exists at all. Performance is not predictable. Often
>> > a developer will try two or three methods just to evaluate the
>> > performance difference.
>>
>> I would love to see some comparative evaluation.
>
> I can't give you exactly what is called for here, but we do that in
> advanced courses, to kill an argument along those lines.

I meant publicly available resources. There doesn't seem to be many.
Anyway, I have downloaded your linked documents; I will read them (it'll
take a while!) and come back to this group with further questions, if
any.

>> Btw, Sybase was acquired by SAP. Does it still exist as a product?
>
> Yes and yes.

Just to be clear, is it the product currently branded as "Sybase ASE"?

https://www.sap.com/products/sybase-ase.html

Am I correct that "Sybase IQ" is a totally different product?

Nicola

Derek Ignatius Asirvadem

Jan 9, 2020, 4:43:02 AM
> On Wednesday, 8 January 2020 05:29:38 UTC+11, Nicola wrote:
> > On 2020-01-07, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> >
> > I said
> > that anything that provides a user-requirement in a way that is
> > different to the way that SQL does provide, is anti-SQL.
> >
> >> string_agg() is just an aggregate function operating on
> >> strings.
> >
> > It is not an aggregate. It manufactures a list of items, with a given
> > separator. It is the normal result set from a SELECT, which is
> > multiple-row single-column, converted into CSV and delivered as
> > single-row single-column.
>
> An aggregate maps a set of values from different rows into a single
> value, and string_agg() does exactly that [...]

(I know all that, I know exactly what string_agg() does; I was chasing only the GROUP BY, and the incorrect name.)

Misunderstanding. I am saying the name, the word /aggregate/ is wrong, because it already has an established (a) English and (b) technical meaning. What it is doing is not /aggregating/ a list of values from multiple rows, but /appending/. If it were named string_app(), I would have no problem.

Btw, the name (and the exact same function) comes from Oracle, which has had it for about 30 years. One can guess the lineage of the PooGross developers. Eg. all the extensions in Sybase ASE have the name of the Unix function it performs. And no, we don't have a string_agg() equivalent, we have recursion.

> is conceptually no different from a standard aggregate like sum(), e.g.:
>
> Assembly
> AssemblyCode ComponentCode Quantity
> P1 P2 3
> P1 P3 5
>
> select AssemblyCode as PartCode, sum(Quantity) as Quantity
> from Assembly
> group by AssemblyCode;
>
> PartCode Quantity
> P1 8

I don't know what your definition of /concept/ is, and I would rather not argue about it.

In precise English & technical terms, the above is a SUM(): numeric values can be SUMmed, strings cannot be SUMmed. What string_agg() does is APPEND, not SUM. Other string [not numeric] functions include soundex( <str> ); sub(); gsub(); etc.
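In fold terms, SUM and APPEND differ only in the operator and the identity element. A small Python illustration (not from the thread, just to make the parallel explicit):

```python
from functools import reduce

quantities = [3, 5]
keywords = ['dark', 'smooth']

# SUM: fold with +, identity 0
total = reduce(lambda acc, q: acc + q, quantities, 0)

# APPEND: fold with string concatenation plus a separator, identity ""
description = reduce(
    lambda acc, k: k if acc == "" else acc + ", " + k,
    keywords, "")

print(total)        # 8
print(description)  # dark, smooth
```
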

> > An example of an aggregate string function is HashBytes( <long_string>
> > ), where <long_string> is usually the whole row, all columns converted
> > to CHAR(), for the purpose of determining, when compared to as
> > previously stored image, if it has changed.
>
> That's a different class of operations, because it works on one row at
> a time.

Yes. (I was offering an operation that works on somewhat more than a single column.)

> It wouldn't be affected by GROUP BY.

That is a different thing altogether. The GROUP BY appears to be demanded for string_agg() to work. Although the function itself, and the name, are non-logical:
- it is not a genuine GROUP BY
- it is saying: use only this PartCode
- it is more of a WHERE
- I would prefer FOR PartCode
I understand why the GROUP BY is required for the function to work. Pity the code segments into which it is deployed:

SELECT  PartCode,
        [...]
        Description = (
            SELECT string_agg( Keyword, ', ' )
                FROM PartDescription INNER
                WHERE INNER.PartCode = OUTER.PartCode
                GROUP BY PartCode
            ),
        [...]
    FROM Part OUTER
    WHERE ...
    [...]

> > Further, it is anti-logical, because the logical way (FOPC -> RA ->
> > SQL) is to append each increment to a string, which is a recursive
> > function.
>
> ??? Also sum() could be computed with a recursive function, by adding an
> increment to a number.

Yes. (Actually we can do precisely that, and any other sort of intent that is either APPEND or INCREMENT, in Sybase, without weird functions or code.)

> That does not make sum() anti-logical.

Yes. It doesn't. I did not say that because SUM() can be obtained via increments, that it is non-logical.

> It's curious that you mention FOPC and RA in this context, because they
> are expressively equivalent (Codd's Theorem) and it is widely known that
> they cannot express recursion.

Of course. That is why I stated, from the beginning, recursion is NOT IN SQL, it is a program-execution function, which can only be in the platform, the implementation of SQL. Technically that means:
Platform{ SetProcessing | ProceduralProcessing | Recursion } :: FOPC :: RA :: SQL
wherein SQL proper has only { SetProcessing } and no recursion.

We have had recursion since the first versions of Sybase & DB2. Recursion has recently been added to the SQL Standard, which is pretty stupid, given the above.
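For readers who have not seen it, the Standard's WITH RECURSIVE looks like the following. A sketch against a toy Node table in SQLite; the table and column names are taken from this thread, everything else is an assumption for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Node (NodeNo INTEGER, ParentNodeNo INTEGER, Name TEXT);
    INSERT INTO Node VALUES
        (0, 0, ''),          -- root row (0, 0, ...), per this thread's convention
        (1, 0, 'Part'),
        (2, 1, 'Assembly'),
        (3, 2, 'Component');
""")

# WITH RECURSIVE: an anchor member (the root) UNION ALL a recursive member
# (each child appends its Name to its parent's path).
rows = conn.execute("""
    WITH RECURSIVE NodePath (NodeNo, Path) AS (
        SELECT NodeNo, Name FROM Node WHERE NodeNo = ParentNodeNo
        UNION ALL
        SELECT n.NodeNo, p.Path || '/' || n.Name
            FROM Node n
            JOIN NodePath p ON n.ParentNodeNo = p.NodeNo
            WHERE n.NodeNo <> n.ParentNodeNo
    )
    SELECT NodeNo, Path FROM NodePath ORDER BY NodeNo
""").fetchall()

for node, path in rows:
    print(node, path)
```
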

> > As a result of this exchange, I think the rule to give my followers
> > who are stuck in PooGross land, all things considered, is:
> > - do not use recursive queries ( string_agg() )
> > - use recursive functions (don't be lazy)
>
> Advice taken. But I think that recursive queries (by which I understand
> that you are talking specifically of WITH RECURSIVE queries)

Yes.

> in a relational query language are a good idea.

I reject the idea entirely. YOU have given the reason why, a few lines above, and I agree.

> I don't particularly like
> SQL's syntax, though. And the implementation in some systems is less
> than ideal.

We won't name names.

> Those may be enough reasons (not the only ones) to prefer
> functions over WITH RECURSIVE.

Agreed. And ... both Oracle & PooGross excrete bricks through a kiln when column-level functions are used. Just wait until they add column-level and row-level security; encryption; etc. Watch a slow horse to market turn into an elephant on a foot-stool.
> A function taking only a PartCode as an argument (that is the signature
> you showed us)? How would you make it recursive without keeping some
> state?

Of course I maintain state (progress in the list). I checked; sorry, my mistake, the signature I gave earlier was incomplete:
CREATE FUNCTION Part_Description_fn (
        @PartCode        PartCode,       -- PK for which Description is sought
        @PartDescription _DescMax = ""   -- current built Description
        )
    RETURNS CHAR(255)
    -- Asterisk in the 1st char indicates error
    -- - PartDescription exceeds 255
AS
...
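The shape of that function can be sketched in Python: the accumulated Description rides along as a parameter, exactly as @PartDescription does above. The keyword and parent lookups are stand-in dicts, pure assumptions for illustration:

```python
# Toy stand-ins for the stored data: each PartCode maps to its direct
# Keyword and to its parent PartCode (None at the top of the hierarchy).
KEYWORD = {'P3': 'opaque', 'P2': 'glassy', 'P1': 'dark'}
PARENT  = {'P3': 'P2', 'P2': 'P1', 'P1': None}

def part_description_fn(part_code, part_description=""):
    # Mirror of Part_Description_fn: the state (the built Description)
    # is threaded through each recursive call as the second argument.
    keyword = KEYWORD[part_code]
    part_description = (keyword if part_description == ""
                        else keyword + ", " + part_description)
    if len(part_description) > 255:
        return "*" + part_description[:254]   # asterisk in 1st char flags error
    parent = PARENT[part_code]
    if parent is None:
        return part_description
    return part_description_fn(parent, part_description)

print(part_description_fn('P3'))
```
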

> >> How do you express the condition used to check for self-loops (e.g.,
> >> rows with value (n,n))? It can't be simply CHECK (NodeNo <>
> >> ParentNodeNo), because you must allow at least one (or exactly one) such
> >> row.
> >
> > Hang on. Before I answer the next question. Why precisely do we need
> > such a row ?
>
> In Node(NodeNo, ParentNo, Name)? I am assuming that you recognize the
> root of the hierarchy as a tuple of the type (n,n,''), for some integer
> n. Please clarify if that is not the case.

No. As a rule, I have no fudges. Ever.

There is just one root for the entire Node tree. The RM does not permit, and we do not have circular references and therefore we do not need "deferred constraint checking". (Note that the TTM Gulag teaches you to place a sabotaging circular reference practically everywhere. To justify their "need" for "deferred constraint checking".)

During the database build, which is a rather large script that is sequenced, right after the
CREATE TABLE Node
command, before any of the
ALTER TABLE Node ADD CONSTRAINT ...
do:
INSERT Node VALUES ( 0, 0, CHAR(0) )
now proceed with the
ALTER TABLE Node ADD CONSTRAINT ...
one of which is:
ALTER TABLE Node
ADD CONSTRAINT Node_NE_Parent_ck CHECK (
NodeNo <> ParentNodeNo
)
another of which is:
ALTER TABLE Node
ADD CONSTRAINT Node_NotCircular_ck CHECK (
CHARINDEX( dbo.Node_Path_fn( NodeNo, 1 ), CONVERT(VARCHAR,NodeNo) ) = 0 AND
LEFT( dbo.Node_Path_fn( NodeNo, 1 ), 1 ) != "*"
)
dbo.Node_Path_fn returns asterisk in the first byte position if the Path exceeds 255.
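dbo.Node_Path_fn itself is not shown in the thread; as a hedged sketch of what such a path-builder must do, here is the mechanic in Python: climb the ParentNodeNo chain to the root (the (n, n) row), and flag a circular reference or an over-255 path with a leading asterisk. The names and the 255 limit come from the posts above; the rest, including the omission of the second argument, is an assumption:

```python
# NodeNo -> ParentNodeNo; the root is the (n, n) row, here (0, 0).
PARENT = {0: 0, 1: 0, 2: 1, 3: 2}

def node_path_fn(node_no, max_len=255):
    # Climb towards the root, accumulating the path. Revisiting a node
    # before reaching the root means a circular reference, which is what
    # Node_NotCircular_ck is guarding against.
    path = str(node_no)
    seen = {node_no}
    while node_no != PARENT[node_no]:          # (n, n) row terminates the climb
        node_no = PARENT[node_no]
        if node_no in seen:
            return "*" + path                  # circular: flag with asterisk
        seen.add(node_no)
        path = str(node_no) + "/" + path
        if len(path) > max_len:
            return "*" + path[:max_len - 1]    # overflow: flag with asterisk
    return path

print(node_path_fn(3))
```
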

> >> I am curious to know what, in your experience, is an efficient way
> >> to perform that check.
> >
> > On hold.

Answered, I think. Except for a possible discussion re the hoops you have to jump through when you are used to the prohibited circular references, and the [syphilis at the brothel] "deferred constraint checking".

> > Since all Transaction sp's have the required structure (Template),
> > there is nothing to do, nothing to improve, in the area of
> > concurrency. That the Template has been faithfully implemented; that
> > it matches intent per the data model, is a matter of Code Reviews by
> > Peers. Sure, at load testing only, and not before, contention
> > problems will be exposed, but always (100% of my experience over 44
> > years fixing problems in this specific area) it is due to failure of
> > [1][2.0][2.1], and always eliminated by proper implementation thereof.
>
> Ok, I have to trust your experience here.

That is not very scientific. As well as guaranteeing minimum contention, I guarantee zero deadlocks. New developers always need to see the proof. But before we get to the Template; why the Template; why the four code blocks, we have to know that the circus master can /create/ a deadlock, in order to know how to prevent one. Separate thread if interested.

People who do not understand Two Phase Locking (we have had that since the 1960's); its efficiency; etc, and find out the hard way //ooh, ooh, ooh, I have a deadlock !!!//, implement monstrosities such as Oracle MVCC, now PusGrus MVCC. Without ever understanding that they themselves created the deadlock. There are no deadlocks on the CD, the distribution media, right ?

> >> > The Query Optimiser is none of
> >> > that, if it exists at all. Performance is not predictable. Often
> >> > a developer will try two or three methods just to evaluate the
> >> > performance difference.
> >>
> >> I would love to see some comparative evaluation.
> >
> > I can't give you exactly what is called for here, but we do that in
> > advanced courses, to kill an argument along those lines.
>
> I meant publicly available resources. There doesn't seem to be many.

No. But you can get that from SAP/Sybase/Sales, and from [ex-] Partners like me.

> >> Btw, Sybase was acquired by SAP. Does it still exist as a product?
> >
> > Yes and yes.
>
> Just to be clear, is it the product currently branded as "Sybase ASE"?
>
> https://www.sap.com/products/sybase-ase.html

Yes. That is the flagship product. When MS stole Sybase, its product too had Transact-SQL[TM] and dataserver[TM], etc. Following that, the two products grew in vastly different directions, so ours was re-branded: Sybase Adaptive Server Enterprise.

It comes in three enterprise flavours, which really means
- small dev (8 core limit, IIRC 50 connections)
- standard enterprise (no limits)
--- on the large unix boxes, price depends on the box
- 100% Non-Stop (survives any crash; change hardware while running)
and one package (std enterprise plus Replication [to/from any platform], blah, blah)

> Am I correct that "Sybase IQ" is a totally different product?

Yes. As I said, there is a whole suite of products.
- Sybase Intelligent Query is a heavy duty Data Warehouse server. Columnar indices. Acquisition of a finished product.
- Sybase SQL Anywhere (previously Watcom SQL) is a simple, client+server on a single PC solution. Not 100% SQL compliant, but has a devout following.
- many more. PowerBuilder. PowerDesigner (ex-Oracle; not as good as ERwin).

The only product we are interested in is Sybase ASE.

Cheers
Derek

Nicola

Jan 9, 2020, 6:04:56 AM
On 2020-01-09, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> In Node(NodeNo, ParentNo, Name)? I am assuming that you recognize the
>> root of the hierarchy as a tuple of the type (n,n,''), for some integer
>> n. Please clarify if that is not the case.
>
> No. As a rule, I have no fudges. Ever.
>
> There is just one root for the entire Node tree. The RM does not
> permit, and we do not have circular references and therefore we do not
> need "deferred constraint checking". (Note that the TTM Gulag teaches
> you to place a sabotaging circular reference practically everywhere.
> To justify their "need" for "deferred constraint checking".)
>
> During the database build, which is a rather large script that is
> sequenced, right after the
> CREATE TABLE Node
> command, before any of the
> ALTER TABLE Node ADD CONSTRAINT ...
> do:
> INSERT Node VALUES ( 0, 0, CHAR(0) )

So, you do have a row of the form (n,n,...) (with n=0) to denote the one
root of the tree.

> now proceed with the
> ALTER TABLE Node ADD CONSTRAINT ...
> one of which is:
> ALTER TABLE Node
> ADD CONSTRAINT Node_NE_Parent_ck CHECK (
> NodeNo <> ParentNodeNo
> )

Now, wait. Does Sybase really accept that? Regardless of the answer,
that *is* anti-logical:

- Declared constraint in table definition: NodeNo <> ParentNodeNo
- Current instance of the table:

Nodeno ParentNodeNo ...
0 0 ...

Tell me what you want, that CONSTRAINT CHECKs are enforced at update
time, that the constraint did not exist when the row was inserted, etc.
but what you show is a dangerous behavior. You cannot trust that
a declared constraint is indeed satisfied by your instances any more.

The system must reject the addition of a constraint that the current
instance violates, period.

Nicola

Derek Ignatius Asirvadem

Jan 9, 2020, 9:57:46 AM
> On Thursday, 9 January 2020 22:04:56 UTC+11, Nicola wrote:
> > On 2020-01-09, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> >
> > There is just one root for the entire Node tree. The RM does not
> > permit, and we do not have circular references and therefore we do not
> > need "deferred constraint checking". (Note that the TTM Gulag teaches
> > you to place a sabotaging circular reference practically everywhere.
> > To justify their "need" for "deferred constraint checking".)
> >
> > During the database build, which is a rather large script that is
> > sequenced, right after the
> > CREATE TABLE Node
> > command, before any of the
> > ALTER TABLE Node ADD CONSTRAINT ...
> > do:
> > INSERT Node VALUES ( 0, 0, CHAR(0) )
>
> So, you do have a row of the form (n,n,...) (with n=0) to denote the one
> root of the tree.

Said so. Nothing to hide.

> > now proceed with the
> > ALTER TABLE Node ADD CONSTRAINT ...
> > one of which is:
> > ALTER TABLE Node
> > ADD CONSTRAINT Node_NE_Parent_ck CHECK (
> > NodeNo <> ParentNodeNo
> > )
>
> Now, wait. Does Sybase really accept that? Regardless of the answer,
> that *is* anti-logical:
>
> - Declared constraint in table definition: NodeNo <> ParentNodeNo
> - Current instance of the table:
>
> Nodeno ParentNodeNo ...
> 0 0 ...
>
> Tell me what you want, that CONSTRAINT CHECKs are enforced at update
> time, that the constraint did not exist when the row was inserted, etc.
> but what you show is a dangerous behavior. You cannot trust that
> a declared constraint is indeed satisfied by your instances any more.
>
> The system must reject the addition of a constraint that the current
> instance violates, period.

Instead of telling me what the genuine SQL platforms SHOULD do, from your perspective, based on experience with an anti-SQL piece of dog's meat, why don't you find out (a) what SQL requires, and (b) why we might want the operation that the SQL platforms do.

What you don't realise is, you have an insane NONsql suite. It is anti-logical. It forces you to tear your hair out while doing ordinary pedestrian tasks.

You are also used to very small databases. Do you think when I am rebuilding a couple of tables overnight in PRODUCTION, say a few billion rows, that I should spend a week instead ? Just one eg. If I add a few columns to a table (which has no effect on the existing app code, it only affects the new app code that deals with the new columns), the table as it stands does not have the constraints on the new columns. It would be ROFLMAO hilarious if you asked me to make the existing billions of rows comply with a constraint they never had.

The principle is, the database was and is, compliant to all constraints. When you add a new one, like adding a new law to society, you cannot backdate it by default. That intent itself is criminal and non-logical.

No. You can walk up to a populated table and just add a new constraint. It affects the rows that are INSERTED or UPDATED. If you want the entire table to conform to the new constraint, that too is simple:
CREATE TABLE Foo_New ( ... )
ALTER TABLE Foo_New ADD CONSTRAINT ... -- all constraints
INSERT Foo_New
    SELECT [...]
        FROM Foo
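The point is easy to demonstrate: in the rebuild pattern, the INSERT..SELECT itself runs every migrated row through the new constraint, so nothing sneaks past it. A toy sketch in SQLite (the table names follow this thread; the column, constraint, and data are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER, Qty INTEGER)")
conn.executemany("INSERT INTO Foo VALUES (?, ?)", [(1, 5), (2, 7), (3, -1)])

# The new table carries the new constraint from the start.
conn.execute("""
    CREATE TABLE Foo_New (
        Id  INTEGER,
        Qty INTEGER,
        CONSTRAINT Foo_Qty_ck CHECK (Qty >= 0)
    )
""")

# The migration INSERT..SELECT is where every existing row meets the new
# constraint: the offending row (3, -1) makes the whole statement fail.
try:
    conn.execute("INSERT INTO Foo_New SELECT Id, Qty FROM Foo")
    migrated = True
except sqlite3.IntegrityError:
    migrated = False

print(migrated)
```
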

Probably the most common and straight-forward task for a DBA is to rebuild a table at the physical level (to de-frag it; to place it on another disk to reduce disk contention; to allow new inserts at a different physical location; etc; the list is at least 20 items long). For that we do a "fast" rebuild:
- BulkCopy rows OUT to file (multiple files for multiple partitions)
- disable all FK constraints to/from the table
- drop the table
- create table, giving new locations; new partition declarations; etc
- BulkCopy rows IN from files (multiple threads; etc)
- create indices
- add constraints

Now at this stage, given your consternation, you will scream //that is not right//. But wait: the data was compliant when I bulk-copied it out; the file was secure; and the data is still good when I bulk-copied it in. It would be ROFLMAO hilarious if you asked me to RE-CHECK the existing billions of rows to comply with the constraints that they already complied with.

Do you think PRODUCTION is going to wait for the CHECK constraints to execute over billions of rows ? Sure, because that is all you know. Not in the real world, they will have your termination cheque ready before the command finishes executing.

Sybase (DB2; MS) handle (eg) changes to datatypes; changes from say 16 partitions to 24; and that sort of thing beautifully, no extra work for the DBA. But let me assure you, as a consultant, I do an awful lot of clever things on the data file while outside the server.

----

It is not Sybase. It is SQL: DB2; Sybase; MS/SQL. Informix (though it does not declare compliance). Oracle is not honestly SQL compliant, I don't know how it handles this particular item. If after considering my response, you still think it is wrong, sure, go and fight ISO/ANSI/IEEE, not me, I am just the messenger.

Your mickey mouse anti-SQL suite does all sorts of things that are cute and cuddly for an academic with a tiny database. It does nothing at all for large production systems: none of what an SQL platform does; none of the utilities to do ordinary database maintenance; and a mindset that the creepy thing should execute the new constraint on the existing data //because that would be convenient for a tiny database//, which is utterly ridiculous for data of any quantity. This is the real world out here. AFAIC, PusGroop is totally inadequate for any large data size, or any large number of online users, this point alone being just one of the mickey mouse killer issues.

You think I have a bias against pig poop. No, it is a natural revulsion. And every single thing I have said is evidenced fact. And I have not enumerated the anti-SQL issues in the beast.

Wait until you find out just what a horror MVCC is, or what PusGruss does with "transactions", vs what we have been doing since the 60's, before Relational, and after Relational. You are having a hissy fit now; you will have epileptic seizures when you find out that stuff.

----

> The system must reject the addition of a constraint that the current
instance violates, period.

When all is said and done, the only thing that matters is the SQL Standard, what it says about a new constraint and its application. Not what either you or I think.

> but what you show is a dangerous behavior.

Everyday concerns and considerations for a production DBA in a big shop. We get paid according to the level of responsibility we bear, same as a steel worker or a risk analyst.

> You cannot trust that
> a declared constraint is indeed satisfied by your instances any more.

I purposely added the root row before the constraint that would prevent it. Using the known and SQL compliant behaviour of the server. I have to do that on every table that is an hierarchy.

Cheers
Derek

Derek Ignatius Asirvadem

Jan 9, 2020, 10:25:22 AM
> On Friday, 10 January 2020 01:57:46 UTC+11, Derek Ignatius Asirvadem wrote:

Actually, the problem, the difficulty you guys have in accepting the logical and painless way to perform ordinary database maintenance, is mental. You are so addicted to tearing your hair out, to having a problem that cannot be solved, etc, all due to the "deferred constraint checking" mindset.

First, you do the task the wrong way. Second, you have yourself in a quandary, a knot that cannot be undone. Ooooh the excitement. Ok, fine. But WE don't have to have the same series of errors, or the same consequence.

(
This kind of story happens to me all the time. A couple of years ago I was at a short assignment at a large bank in New Zealand. I got along well with the senior users, and my boss, the IT executive asked me to do any small tasks the users asked, no need to get approval from him. No worries. In-between server and database rebuild tasks, an Auditor asked me if I could get a certain report out of the server. I said, sure, give me 15 minutes. Two hours later, the boss came around, took me out to lunch and gave me an earful. Turned out the IT guys had said that that report was not possible. For five years. He told me not to do any more jobs for the users. When we got back from lunch, all the senior users were in the boardroom with all the IT guys. There was gnashing of teeth, wailing like widows. None of the IT guys spoke to me after that.
)

You guys have this quandary, this knot from hell inside your non-server from hell, on every single instance of "deferred constraint checking".

You guys do not know any other way. And when you find out, it makes you livid, at the hundreds of hours you have wasted in that quandary, so many times. But you direct that anger at the messenger, not at the academics who lied to you, who are the real cause of your quandary, both the technical one and the mental one.

Cheers
Derek

Nicola
Jan 9, 2020, 11:39:55 AM
On 2020-01-09, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> > now proceed with the
>> > ALTER TABLE Node ADD CONSTRAINT ...
>> > one of which is:
>> > ALTER TABLE Node
>> > ADD CONSTRAINT Node_NE_Parent_ck CHECK (
>> > NodeNo <> ParentNodeNo
>> > )
>>
>> Now, wait. Does Sybase really accept that? Regardless of the answer,
>> that *is* anti-logical:
>>
>> - Declared constraint in table definition: NodeNo <> ParentNodeNo
>> - Current instance of the table:
>>
>> Nodeno ParentNodeNo ...
>> 0 0 ...
>> [...]
>> The system must reject the addition of a constraint that the current
>> instance violates, period.
>
> Instead of telling me what the genuine SQL platforms SHOULD do, from
> your perspective, based on experience with an anti-SQL piece of dog's
> meat, why don't you find out (a) what SQL requires, and (b) why we
> might want the operation that the SQL platforms do.

ISO SQL:2011 (the draft I have at hand), §4.18.3.4:

"A table check constraint is satisfied if and only if the specified
<search condition> evaluates to True or Unknown *for every row* of the
table to which it applies." [emphasis mine]

FYI, you can do what you describe in PostgreSQL as well, but better:

alter table Node
add constraint Node_NE_Parent_ck
check (NodeNo <> ParentNodeNo) NOT VALID;

First, the default is sane (if you add a constraint, you usually want every row to conform to it), but you may skip validation with NOT VALID. No scan of the existing rows, hence no performance penalty, as you ask: the constraint is enforced only for subsequent inserts and updates.

Second, the constraint is marked as such in the table definition, e.g.:

[table definition here]
Check constraints:
"Node_NE_Parent_ck" CHECK (NodeNo <> ParentNodeNo) NOT VALID

so you know that it is not necessarily satisfied by each row.

Third, if you want you may validate it at any subsequent time with:

alter table Node validate constraint Node_NE_Parent_ck;

In your example, that would get you an error because of the first
insertion:

ERROR: check constraint "Node_NE_Parent_ck" is violated by some row

and the constraint will remain NOT VALID.
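Logically, that VALIDATE step is nothing more than a scan for rows that falsify the predicate. A minimal sketch in Python with the stdlib sqlite3 module (SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the validation query is run by hand; the Node table is a hypothetical stand-in for the one under discussion):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# No CHECK declared yet, mimicking a table populated before the
# constraint exists (or one whose constraint is marked NOT VALID).
con.execute(
    "CREATE TABLE Node (NodeNo INTEGER PRIMARY KEY, ParentNodeNo INTEGER NOT NULL)"
)
con.executemany("INSERT INTO Node VALUES (?, ?)", [(0, 0), (1, 0), (2, 1)])

# What validation boils down to: find every row that falsifies the predicate.
violations = con.execute(
    "SELECT NodeNo, ParentNodeNo FROM Node WHERE NOT (NodeNo <> ParentNodeNo)"
).fetchall()

print(violations)  # [(0, 0)] -- the self-referencing root row fails the check
```

If that query returns any rows, the validation fails and the constraint stays NOT VALID; if it returns none, the constraint can be promoted to fully valid.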

Granted, there are use cases where this feature comes in handy, such as adding a constraint to a huge table. But the idea is that you skip the validation because you *know* that the constraint already holds for the existing data. There are exceptions, of course ("hey, codes from now on will be only numeric; but the alphanumeric codes are still in use and should not be changed"). But if this feature can be avoided, IMO it's better to avoid it.

In your example, this would be a cleaner solution:

alter table Node
add constraint Node_NE_Parent_ck
check (NodeNo <> ParentNodeNo or NodeNo = 0);

This could be added before inserting any data and it *declares*
explicitly that we don't care about node 0.
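A quick demonstration of that cleaner constraint, again in Python with the stdlib sqlite3 module as a stand-in (the Node table is the same hypothetical reduction as before). Because the exception for the root is written into the predicate itself, the constraint can be in place before any row is inserted, and it still rejects every non-root self-reference:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The root's exemption lives in the constraint, so no ordering trick is needed.
con.execute("""
    CREATE TABLE Node (
        NodeNo       INTEGER PRIMARY KEY,
        ParentNodeNo INTEGER NOT NULL,
        CHECK (NodeNo <> ParentNodeNo OR NodeNo = 0)
    )
""")

con.execute("INSERT INTO Node VALUES (0, 0)")  # root: allowed by the OR clause
con.execute("INSERT INTO Node VALUES (1, 0)")  # ordinary child: allowed

# Any non-root self-reference is still rejected.
try:
    con.execute("INSERT INTO Node VALUES (2, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```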

Nicola
