Ok. The one thing left that I am hung up on, in both those code segments, is this: why the GROUP BY ? Is it required to get the string_agg() to work ? The string_agg() being one anti-SQL trick, and the technique required to get it to work being part of the trick.
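(For concreteness, the pattern at issue, sketched on a hypothetical Part table. As I understand the PostgreSQL behaviour, the GROUP BY is demanded only when a non-aggregated column appears in the SELECT:)

    -- no other column in the SELECT: no GROUP BY needed
    SELECT string_agg( PartCode, ',' )
      FROM Part ;

    -- a non-aggregated column (AssemblyCode, invented) in the SELECT:
    -- GROUP BY is required
    SELECT AssemblyCode, string_agg( PartCode, ',' )
      FROM Part
      GROUP BY AssemblyCode ;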
And separately, do you agree that, all things considered (as per the discussion, eg. coding while observing the SQL Standard; portability; directness), coding the above using a recursive Function would be simpler, more straight-forward, etc ?
> > 2. Separate point. That is anti-SQL.
>
> I don't see how: string_agg() is just an aggregate function operating on
> strings. Unless anything except count(), sum(), avg(), min() and max()
> falls, for you, into "anti-SQL".
>
> Not SQL-standard compliant, I agree. AFAIK, there's no string_agg() in
> the standard.
Your second comment destroys your first comment.
If there is an SQL way to perform some function, and your mickey mouse suite performs said function a different way, it is anti-SQL. If that anti-SQL way blocks portability to an SQL platform, then it is purposely making sure that you invest in the suite *and* that you can't move out of it. Like catching syphilis at the brothel: the girls say, la la la la la la, now you can't go anywhere else. PusGross has many such features, eg. ENUM; the idiotic and anti-Relational Deferred Constraint Checking. It is entrapment on a grand scale.
I will respond to this next item as an instance of a generic technique of anti-logical argumentation, because if you were conscious of what you have done, I trust you would not do it.
> Unless anything except count(), sum(), avg(), min() and max()
> falls, for you, into "anti-SQL".
The item at issue is whether the string_agg() function is anti-SQL (my charge), or not. The scientific, logical argument is to argue that point squarely. If not, then whatever argument is thrown up, is not square, not logical, perhaps manipulative, but in any case, does not lead to resolution of the issue. The actual result is, it re-frames the argument as something that it is not, and makes the person sound stupid.
The item at issue is not that since it is a function, therefore it lies in the category of all functions, and since it is not one of the core SQL functions, therefore I must be (through this confected chain of anti-logical propositions) opposing all functions that are not core SQL functions.
I did not say that. That is your interpretation, not mine. I said that anything that provides a user-requirement in a way that is different to the way that SQL provides it, is anti-SQL.
> string_agg() is just an aggregate function operating on
> strings.
It is not an aggregate. It manufactures a list of items, with a given separator. It is the normal result set from a SELECT, which is multiple-row single-column, converted into CSV and delivered as single-row single-column.
An example of an aggregate string function is HashBytes( <long_string> ), where <long_string> is usually the whole row, all columns converted to CHAR(), for the purpose of determining, when compared to a previously stored image, whether it has changed.
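A minimal sketch of that usage (MS/Sybase syntax; the table, columns, and algorithm are illustrative):

    -- hash the whole row image, all columns converted to CHAR, for later
    -- comparison against the previously stored image
    SELECT HashBytes( 'MD5',
               CONVERT( CHAR(8),  PartCode ) +
               CONVERT( CHAR(30), PartName ) +
               CONVERT( CHAR(12), Price    ) )
      FROM Part
      WHERE PartCode = 'P123456'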
Further, it is anti-logical, because the logical way (FOPC -> RA -> SQL) is to append each increment to a string, which is a recursive function.
As a result of this exchange, I think the rule to give my followers who are stuck in PooGross land, all things considered, is:
- do not use recursive queries ( string_agg() )
- use recursive functions (don't be lazy)
Yes. Except that I would do that in one function, not two. The same way you did Node_Path_fn.
Btw, PartCode is CHAR, not INT. *Code is a naming convention meaning CHAR, short name; *No or *Num for INT.
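For onlookers, a minimal sketch of that form, MS Transact-SQL flavour. The table and columns follow the thread (NodeNo; ParentNodeNo; PartCode), but the body is illustrative, not the actual Node_Path_fn:

    CREATE FUNCTION Node_Path_fn ( @NodeNo INT )
        RETURNS VARCHAR(255)
    AS
    BEGIN
        DECLARE @ParentNodeNo INT,
                @PartCode     CHAR(6)   -- *Code: CHAR, per the naming convention

        SELECT @ParentNodeNo = ParentNodeNo,
               @PartCode     = PartCode
          FROM Node
          WHERE NodeNo = @NodeNo

        IF @ParentNodeNo = @NodeNo      -- the root row: top of the ancestor list
            RETURN RTRIM( @PartCode )

        -- append this increment to the ancestors' path (the recursion;
        -- depth is platform-limited, eg. 32 levels in MS)
        RETURN dbo.Node_Path_fn( @ParentNodeNo ) + '/' + RTRIM( @PartCode )
    END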
> > You might call that /making it easy for the developer/, which means
> > guaranteeing a re-write when the app justifies migration to an SQL
> > platform, and guarantees a headache for any maintenance developer up
> > to that point.
>
> Sure, portability-wise it's always better to stick to standards.
> Unfortunately, migration from one DBMS to another is unlikely to be
> hassle-free even if one sticks to strict ISO SQL.
That is not logical. No one is saying that porting from one platform to another is hassle-free. Everyone understands that portability is a Good Thing to strive for, and Standards are predicated on that. If you use the 2-char ISO CountryCode, you will be far better off, for many reasons, than using a roll-your-own CountryCode. In any given platform, which necessarily has Extensions to SQL, if you stick to ISO SQL in preference over some Extension, you will be far better off than if you prefer the Extension over ISO SQL.
In the high end of the market we have a different problem. Eg. IBM and Sybase have had, from the earliest days of SQL, true SQL platforms. With the full gamut of capability required for a DBMS (note that is not RDBMS), because what a DBMS was, was already strongly established. But since many of those capabilities were best delivered /inside/ SQL, such that the developer was NOT forced to exit an SQL code segment to execute that function and then jump back in again, to continue in SQL, they were delivered as Extensions to SQL. Transact-SQL is the best example of that; it was so good that MS stole it, and Informix and even IBM implemented some of its functions.
Years later, sometimes decades later, the Standards caught up, first ANSI, now ISO, and declared a particular SQL syntax for a particular function that we had implemented earlier in a proprietary syntax. And we were not about to change, or rewrite code. A simple example would be regex on strings, anywhere in the SQL code: we had that since 1985, ISO SQL declared it in 1999. Or row pattern matching in 2005, in ISO in 2017 under a different name. Or COALESCE(), when they finally figured out how to deal with NULL, to provide what we had for 30 years as ISNULL(). Sybase just provided the new syntax to comply with the Standard.
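Eg. the two forms, equivalent for the two-argument case (Part and Price are illustrative):

    SELECT ISNULL(   Price, 0 ) FROM Part  -- Sybase/Transact-SQL, since the 1980s
    SELECT COALESCE( Price, 0 ) FROM Part  -- the ISO SQL form, declared later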
In the low end of the market, Oracle; MyNONsql; PooGooNONsql, it is the other way around. The suites not only did not have SQL, they did not have core features of a DBMS. We have had ACID Transactions from 1985 (IBM had it earlier via System/R, proprietary SQL, etc); PostGross had nothing in its initial versions. Some twenty years after the initial version, some thirty-five years after we had it, PostGross delivered the pathetic "functions are now transactional" over the fragile MVCC philosophy.
It may be another ten or twenty years before you guys get genuine OLTP, genuine ACID Transactions. Whenever that happens, you will have to rewrite great slabs of code, because the structure of the code will have to change. Even worse for people who were clueless re Transactions, who delivered CRUD or some other filth, who counted on CASCADE. Instead of doing it right from the outset.
But academics love their toys, they will not admit that they are 30 or 40 years behind the commercial market. They rely on the promise: "it is improving, one day it will ..."
The point is, we are worlds apart, SQL vs Anti-SQL. As evidenced in the examples in this and the other thread, which is closing the gap wrt understanding only.
> > In SQL we can, since 2007, when the facility was declared as part of
> > the SQL Standard. This obtains to Data Integrity that is beyond
> > Domain and Key. Whereas CHECK in the ordinary form checks the
> > attempted new row (INSERT or UPDATE), CHECK ... Function checks:
> > - against some other row or rows in the same table
> > - against some row or rows in another table
>
> Sure, here in anti-SQL world we have discovered that, too. Making
> progress.
Maybe in 2030.
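For onlookers, a minimal sketch of the CHECK ... Function form quoted above (MS syntax; the function name and table are illustrative):

    -- hypothetical: Node_CircleFree_fn walks the ancestor list and returns 1
    -- if no circular reference is found, else 0
    ALTER TABLE Node
        ADD CONSTRAINT Node_Circle_ck
            CHECK ( dbo.Node_CircleFree_fn( NodeNo, ParentNodeNo ) = 1 )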
> >> If you want to prevent multiple roots (i.e., more than one record with
> >> NodeNo = ParentNodeNo), I would define a separate constraint.
> >
> > I would not call that a multiple root, that is the smallest, tightest
> > circular reference ! Even one row (not record, notice the slip) is
> > not allowed to do that. Yes, that is a separate simple CHECK
> > CONSTRAINT, placed before the CONSTRAINT that checks for circular
> > references outside the row, in the ancestor list.
>
> Ok. How do you express the condition used to check for self-loops (e.g.,
> rows with value (n,n))? It can't be simply CHECK (NodeNo <>
> ParentNodeNo), because you must allow at least one (or exactly one) such
> row.
Hang on, before I answer the next question: why precisely do we need such a row ?
> I am curious to know what, in your experience, is an efficient way
> to perform that check.
On hold.
> > Anti-SQL, as detailed above.
>
> Ah, why did I mention Datalog in the other thread :D
;p
> > What about recursive stored procs, which came first historically (1985
> > for RDBMS, 1976 for pre-Relational DBMS) ?
>
> Why not? SQL, like any language, allows you to express the same thing in
> different ways. IMO, choose what looks simpler (and probably most
> elegant) in the first place; in many cases, that is also the most
> efficient solution. Reformulate in a different way if you find that you
> need to improve X and the reformulation improves X—where X may be
> portability, speed, concurrency, etc.
For all categories listed except concurrency, agreed. I would add Logical in front.
For concurrency (low contention, high concurrency), strongly rejected. The whole collection of related items is called OLTP Standards, ACID Transactions being just one item. It is a conceptual technology that is in two parts. It is total prevention of preventable problems, rather than a cure:
1. the required structures must be implemented in the database (data model, at the stage where it progresses to Physical)
2.0. all Transactions have to be designed with OLTP considerations in mind, the smallest possible Logical Unit of Work, for the given data model.
2.1 all Transactions must be written using the OLTP Code Template. That means 4 mandatory code blocks, and demanded compilation, which means a Stored Proc per Xact.
(This is a quick definition; a sketch of the Template shape is given below; if you have questions, please open a separate thread.)
Since all Transaction sp's have the required structure (Template), there is nothing to do, nothing to improve, in the area of concurrency. That the Template has been faithfully implemented, and that it matches intent per the data model, is a matter of Code Reviews by Peers. Sure, contention problems will be exposed at load testing, and not before, but always (100% of my experience over 44 years fixing problems in this specific area) they are due to failure of [1][2.0][2.1], and always eliminated by proper implementation thereof.
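To give [2.1] a concrete shape, one plausible skeleton (illustrative only, not the actual OLTP Standard Template; Transact-SQL style, names invented):

    CREATE PROC Part_Add_tr
        @PartCode CHAR(6),
        @PartName VARCHAR(30)
    AS
        -- [1] Validate: all checks BEFORE the Transaction is opened
        IF EXISTS ( SELECT 1 FROM Part WHERE PartCode = @PartCode )
            RETURN 1                   -- duplicate: no Transaction wasted

        -- [2] Begin
        BEGIN TRAN

        -- [3] Execute: the smallest Logical Unit of Work
        INSERT Part ( PartCode, PartName )
            VALUES ( @PartCode, @PartName )
        IF @@error != 0
        BEGIN
            ROLLBACK TRAN
            RETURN 2
        END

        -- [4] Commit
        COMMIT TRAN
        RETURN 0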
> > In mickey mouse suites, the situation is reversed. They worry about
> > performance because they have to.
>
> That sounds more like... than PostgreSQL. I don't want to start flame
> wars :)
It is definitely the Oracle mindset. Often the MS/SQL mindset. Always the newbie (to anything) mindset.
> > The Query Optimiser is none of
> > that, if it exists at all. Performance is not predictable. Often
> > a developer will try two or three methods just to evaluate the
> > performance difference.
>
> I would love to see some comparative evaluation.
I can't give you exactly what is called for here, but we do that in advanced courses, to kill an argument along those lines. Easy to produce from Sybase. I do have many true benchmarks that address one or more issues in a particular database, or cover the issues at a higher level, such that over the decades I have accumulated proof, so that I can declare (eg):
//In Sybase, a Subquery is marginally (inconsequentially) faster than a Derived Table, therefore use whatever is more logical in the particular code segment.//
Or
//In Oracle, never use Subquery, always use a Derived Table instead.//
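For clarity, the two forms being compared, on a hypothetical Part table:

    -- Correlated Subquery
    SELECT PartCode, Price
      FROM Part p
      WHERE Price = ( SELECT MAX( Price )
                        FROM Part
                        WHERE PartType = p.PartType )

    -- Derived Table ("Inline View" in Oracle terms)
    SELECT p.PartCode, p.Price
      FROM Part p
      JOIN ( SELECT PartType, MAX( Price ) AS MaxPrice
               FROM Part
               GROUP BY PartType ) m
        ON  p.PartType = m.PartType
       AND  p.Price    = m.MaxPrice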
I can't give you those because they are customer confidential. What I can give you is a public benchmark I ran for Oracle vs Sybase, testing Subqueries and "Inline Views" (Oracle terminology for a Derived Table). The guy I ran it for was a famous Oracle consultant, but he turned out to be ignorant of basic SQL; benchmark methods; Query Plans; etc. I did have a full write-up on a well-known forum (including the back-and-forth, which can be ignored), but he has had it removed, because Oracle showed up very badly. Therefore all that is left is the files on my server that were the reference material for the write-up.
The quick desc is, I stated somewhere that Oracle cannot handle subqueries; he attacked and said that was false; he said there was a "better way" to do subqueries (Inline Views) but that subqueries were no problem; I challenged him to a benchmark; being friendly at the time, I had to teach him how to load benchmark tables, etc; we ran the benchmarks on the comparable equipment we had; he was limited to a PC/Windows/OracleNONserver, so I ran mine on MacBookPro/WindowsEmulator/Windows/SybaseServer.
He was dishonest, after Oracle cacked itself, he did not finish the rest of the benchmark or submit his QPs, etc.
This is the one-page summary:
https://www.softwaregems.com.au/Documents/Student_Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
For your purposes (scientific comparative eval), please look at the other files in that directory. Eg. the Query Plans: that is the text version via the server [isql]; we normally use an IDE (no one codes SQL without one), which provides a pretty graphical version.
----
That was all client-side stats, fairly simple, but excellent for debugging; performance improvement; etc, for a developer. If you mean serious Statistics, taken from the server directly while a benchmark (or a problem situation) is running, we have a wealth of those, and several methods of capture and acquisition. Again, depending on the tool used, either text only, or sexy graphics. You can google to find such tools. Even the DBA tool provides basic Stats.
I will lead you through from what we get from the server, to what we can do with it, in increments.
1. This is the lowest level, directly from the server, no 3rd party tool required. I require this to be done for any server that I attend, 24 x 60 mins or 48 x 30 mins, daily. This is an ancient format; it cannot be changed, for backward compatibility reasons. Look, do not read, just get an idea of the stats the server spews.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/SEQUOIA_120806_sysmon_09
2. Most DBAs cannot read that. Guys like me can, but it is difficult to find things, and the organisation is what the server was since 1985 ... enhanced over decades, not logical from a resource perspective. In hindsight, the metric names are horrible. So DBAs have a script to ETL that into something meaningful and readable. High-end consultants have better scripts than DBAs, and theirs are better than the newbies', etc. Mine is normalised, meaning highly organised, and the metric names are consolidated (while maintaining a relation to the server). Further, it captures all resources that I have set up in the server (ie. everything beyond the installation CD), to whatever degree those Resources are organised: hierarchies and names, not numbers. Here is one day. Very useful for looking for shapes and patterns (not reading figures), rather than via [1]. Again, just look, don't read. Capture means what we captured from the server.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Capture.pdf
3. Here is one day divided into CaptureGroups (server load categories by Time). This is the largest teaching hospital in the US, with their gigantic Health Care system; I provided Level 3 support for their Sybase:
- daytime (online user activity, all hospital staff)
- night (batch jobs, app maintenance)
- DBCC & Backup (server level database maintenance)
Now it might be worthwhile reading:
- At this level, I provide additional metrics (ie. computed from [1]), plus whatever the storage girls and unix boys give us (eg. vmstat)
- The [Div]ided column gives the metric (numerator) divided by the parent metric (denominator, easily identified by the indentation), as a %age or Rate (per sec)
- The [RUt] column is Resource Utilisation. The parent metric (denominator) here is the parent Resource (the set above, not via indentation in the immediate set). The application is vertical, the comparison is horizontal. KPI for load distribution.
- The [SUt] column is Schedule Utilisation. Schedule means whatever the main column sets are in the report (this DBA set up 3 fractions for the day and called the report CaptureGroup). The application is horizontal, expressed as a number, not a %age (because we already have %ages, and a column full of %ages would carve up the report visually).
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20CaptureGroup.pdf
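To make [Div] concrete, a worked example (figures invented): if a parent metric runs at 1,000 per sec, and a child metric indented beneath it runs at 250 per sec, the child's [Div] column reads 25%.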
Or a Trend report. This one shows growth by week.
- The Delta column shows the change against the first date.
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Trend.pdf
4. That's fine for technical people, but for management, or to provide proof in my consulting assignments (performance tuning; determining a problem; before vs after; etc), we need pretty charts. Once [3] is obtained, there is no limit to the type and number of charts; here are a few. Just a couple of keystrokes to get [3], which is in CSV format, into Excel or Numbers.
http://www.softwaregems.com.au/Documents/Documentary%20Examples/sequoia%20091019%20Server%20Public.pdf
Before/After I diagnosed and fixed a SAN problem (way beyond my responsibility, but the storage team could not find it):
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20Xact%20Lock.pdf
> Btw, Sybase was acquired by SAP. Does it still exist as a product?
Yes and yes. It is wholly owned by SAP, minus a small share to key engineers who had shares before the acquisition. It runs as a separate company. Sybase is not just the one flagship product; there are Replication Server; PowerBuilder; and a few other things.
The purpose of SAP acquiring Sybase needs to be understood. SAP is a horrible RFS suite of mixed products that have poor interfaces (no central database, let alone a Relational one). Sybase was purchased because it is the fastest, and the easiest to configure [the server] for a particular purpose. The first set of changes were all to do with RFS-type files running faster. The acquisition therefore meant a new captive customer base. So it is in a very secure position: 95% of the financial markets (before the acquisition) and progressive colonisation of SAP customers. Unfortunately, SAP is not doing much marketing outside their customer base.
Cheers
Derek