
No CASCADE in Commercial SQL & Benchmark Considerations


Derek Ignatius Asirvadem

Jun 14, 2021, 12:35:13 AM
I have started a new thread for two issues:
-- CASCADE
-- Benchmark Considerations
that were raised in this thread/post:
https://groups.google.com/g/comp.databases.theory/c/pCxJKwKMsgc/m/ZPCw002wAQAJ

> On Monday, 14 June 2021 at 01:13:54 UTC+10, Nicola wrote:
>
> > Note that one should be used to that [DELETE+INSERT instead of UPDATE] because UPDATE any Key component
> > is not permitted, it must be DELETE+INSERT, in a Transaction of
> > course, that moves the entire hierarchy belonging to the Key.
>
> That is because in your experience, cascading updates (UPDATE ...
> CASCADE), are inefficient, right?

“Inefficient” is putting it mildly, it misses the relevant consideration.

CASCADE is simply not allowed, it isn’t even available on commercial SQL platforms.
Why not ? God help me, an explanation is demanded.
Because high-end SQL Platforms are heavily OLTP oriented.
And CASCADE would be a disaster in an OLTP context.

It is not about my experience, it is about knowledge of the server; server resources; maximising concurrency; minimising contention. Overall configuration and monitoring, first as a general task on every server, and then second, for the particular app and database.

>>
Theoreticians in this space, in their total abdication of responsibility, say that the server is a black box, and that they should not concern themselves with *ANY* implementation concerns. That is like saying the engineer of the car should not concern himself with the exercise of driving. It is the asylum, where total incapacity is allowed, propagated to “science”. It is acceptable only for classroom exercises, utterly bankrupt outside the classroom.

Implementation concerns are the everyday fare for a DBA, and certainly the concerns of a Data Modeller in the latter stages. The black box is a physical reality, it is grey and blue, it fits into a rack in the computer room. All requests to the server pass through it. It is not an abstraction in the classroom that can be ignored.

>>>>
From this post:
__ https://groups.google.com/g/comp.databases.theory/c/Uwc_w8HbBfw/m/trHkR1F8Mx8J

Take “business rule” in the referenced post as your “cascade” here. Cut-paste:


Ok, that means you do not understand the world of implementation.

1. On one side, where the business gives us "business rules", they are not to be taken as implementation imperatives. If taken as such, we would be merely clerks, implementing their requirements, without using the skills that they hired us for. Eg. we would implement a "business transaction" that updated six million rows, that hung the users up for 15 minutes in the middle of the day, and we would take no responsibility, because the business "told us to do it".

1.a Obviously, we do not do that. We exercise the skills we were hired for. Part of which is to implement OLTP Standard-compliant transactions. We do not view the business requirements as imperatives, we view them as initial requirement statements. We work back and forth, such that the requirements are modified, then accepted, and then implemented, such that they do not crash the system; such that the database does not have circular references; etc; etc; etc.

1.b So the example "business transaction" would be converted into a batch job that runs in a loop and executes six million OLTP Standard-compliant single-row transactions. The batch job keeps track of its position; is restartable; etc. So the business gets the requirement they want, but not in the METHOD that they initially stated it. Ie. Just tell me what you want, don't tell me how to do it.

1.c On this one side, in no case is a business rule to be taken as an imperative.

<<<<

>>>>
Also this post:
__ https://groups.google.com/g/comp.databases.theory/c/qqmnhu036FQ/m/RLh9D5Ue1kUJ
please read this section:
__ III - Batch Transaction
<<<<

The most contentious object in the database is the Transaction Log (or its equivalent on MVCC systems, and by any name, in any location).
__ On “2PL” systems, it is a small, hot object, and we try to keep it small (I am not giving you the entire science here, but there are clear, defined rules, such as OLTP Standards).
__ On MVCC systems, it is huge and spread across the entire file space, and the developers are unaware of it, thus trying to constrain its use is not even a remote possibility. Everyone is programmed to pretend that the database is single-user and that they are the single user (the Stonebraker insanity). MVCC does not have ACID.

Note, Functions do not have to be Transactional or NotTransactional, that is a stupid artefact of some pretend-sqls. In ACID, Transactions are declared by the caller, not by the called object.

By virtue of implementation Standards (Software Gems in particular, because we guarantee high concurrency, zero deadlocks), here OLTP Standards, the limit for the number of rows affected in an OLTP Transaction is 100, for Batch Transactions 500.

So think that out. Write an ACID Transaction (no I am not being silly, I realise you can’t on your non-SQL platform, so you have to think in SQL terms, for an SQL platform). A stored proc, that:
- navigates the *levels* of the Tree,
- and loops,
- executing max 100 INSERTs per BEGIN::COMMIT for the new Key,
- then executing max 100 DELETEs per BEGIN::COMMIT for the old Key.

Whereas CASCADE or the equivalent will update six million rows under the covers and hang up the database (prevent other updaters from progressing) for a week or two, this OLTP Transaction will execute in minutes, without hanging anyone else up (a sketch follows below).
<<
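For concreteness, here is a minimal sketch of such a Batch Transaction in Sybase/MS-style T-SQL, assuming a hypothetical two-level tree Parent( ParentKey ) and Child( ParentKey, ChildNo ); the table and column names are placeholders, not a real schema, and error handling is omitted. Each chunk of at most 100 rows is its own BEGIN::COMMIT, per the Standard described above.

CREATE PROC Parent_KeyMove_btr
    @OldKey  CHAR(10),
    @NewKey  CHAR(10)
AS
    DECLARE @Rows INT

    SET ROWCOUNT 100                        -- max 100 rows per verb

    BEGIN TRAN                              -- level 1: the single top row
    INSERT Parent ( ParentKey, Name )
        SELECT @NewKey, Name
            FROM  Parent
            WHERE ParentKey = @OldKey
    COMMIT TRAN

    SELECT @Rows = 1                        -- level 2: children, 100 rows per Transaction
    WHILE @Rows > 0
    BEGIN
        BEGIN TRAN
        INSERT Child ( ParentKey, ChildNo, Amount )
            SELECT @NewKey, C.ChildNo, C.Amount
                FROM  Child  C
                WHERE C.ParentKey = @OldKey
                AND   NOT EXISTS (          -- restartable: skip rows already moved
                        SELECT 1
                            FROM  Child  C2
                            WHERE C2.ParentKey = @NewKey
                            AND   C2.ChildNo   = C.ChildNo )
        SELECT @Rows = @@rowcount
        COMMIT TRAN
    END

    SELECT @Rows = 1                        -- now delete the old Key, bottom level first
    WHILE @Rows > 0
    BEGIN
        BEGIN TRAN
        DELETE Child
            WHERE ParentKey = @OldKey
        SELECT @Rows = @@rowcount
        COMMIT TRAN
    END

    BEGIN TRAN                              -- finally the old top row
    DELETE Parent
        WHERE ParentKey = @OldKey
    COMMIT TRAN

    SET ROWCOUNT 0                          -- clear the limit
RETURN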

> I'd like to do some benchmarks.

Great idea. Excellent for obtaining experience in real world issues. But Whoa, that means you have to acquire knowledge of the physical implementation, both what you want to do in your database and how that translates into physical objects, as well as how your pretend-platform implements it. Eg. Indices; types of indices; Transaction Log or additional rows on every touched page plus a garbage collector. Etc, etc, etc.

*Benchmark* generally implies not one but two platforms, and a comparison. So I would plead that you obtain a “2PL” platform for the second. The commercial SQL Platforms all provide a “developer version” which is free and limited (eg. number of simultaneous users or max table size; etc). There is no point in comparing one MVCC monster with yet another MVCC monster, you will learn nothing in the benchmark category (except differences in the internals of the two freeware suites).

*Benchmark* strongly implies an ability to monitor the server, all metrics that pertain to performance (in general), and the particular metrics that pertain to the particular benchmark. Freeware has no such thing, so you will be working in the dark, with little idea re what is happening under the covers.

For freeware & Oracle, due to not having a Server Architecture, and instead deploying hundreds or thousands of programs running on Unix, the first avenue of “performance monitoring” and even “problem diagnosis”, is via Unix monitoring:
-- top
-- vmstat

For contrast, take a glimpse of what is available in commercial SQLs. The server is a genuine Server Architecture, so the monitoring regards internal metrics, and very little of Unix (set up correctly once, and forgotten).

0. Monitoring products
Additional licence fee, not shown. Feel free to search the internet.

1. Raw Stats: text, various forms. Free.
Voluminous, not shown. Can be ascertained from the following.

2. Raw Stats Formatted, especially for various types of comparison.
Uses a simple script to produce [2] from [1].
Here, I fixed an error that the SAN configuration boys made. The comparison is Before::After the change: virtually the same as a benchmark, but this is monitoring the production server at the largest teaching hospital in America. With permission of course.

__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf

“Div” is the division of the metric by the relevant denominator.

“Dlt” is the delta, between that column-of-stats and the first column-of-stats, as a percentage, with {-|+|x}.

“RUt” is Resource Utilisation, shown for unit resources, as a percentage of the ResourceGroup (which is shown above the unit resources)

“Engine” is a single Unix Process. The Sybase ASE server comprises a grand total of FOUR Engines in this instance, and serves hundreds of active connections (doctors; nurses; medical images; client histories; etc). In contrast, freeware and Oracle would have hundreds of Unix Processes and no concept of Engine.

The entire server config is set up by me, not the local DBA, whom I support. That means the resource allocation strategy is proprietary, so all the resources are shown, but the names [eg. cache names] have been obfuscated.
- IIRC the freeware, and Oracle, have no concept of a cache, let alone control of it.
- Disks [SAN objects] are the slowest link in the chain, and the hardest to change, so they are set up correctly, once and for all.
--- Until some idiot in the SAN team made an undeclared change, that had a negative effect, that I had to diagnose, without knowledge of the change.

This is a “2PL” server, you may find the Lock Manager stats interesting.

Likewise the Transaction section.
The app is not a very good one, it is stupid, because Parallelism has to be turned off, in order for the app to work. It is not strict OLTP, but good enough in the sense that it does not cause problems. Its ACID profile is hopeless. Nevertheless, via resource management, I have managed to reduce problems and virtually eliminate deadlocks.

3. Executive reports
Pretty pictures for those with a short attention span.
Uses a similar simple script to produce a CSV file from [1], and then Excel/Numbers to erect the charts.

__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20Xact%20Lock.pdf

__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20DiskGroup.pdf

“Monitor” here is a database name, where a monitoring product [0] stores its info, which the execs use. As you can see, the overhead is huge, and they can’t diagnose anything in months. I do not use such filth, I use the raw stats, and I can diagnose any problem, in minutes. So ignore “Monitor” here, and compare the rest.

“Dbcc” is also a database name, it is storage for Database Consistency Check, which is an overnight batch job, that comes with the product. It performs its namesake, plus it collects table/index/column statistics (as distinct from server statistics).

“Data” means the user database.

“Log” means the Transaction Log.

“APF” means Asynch Pre-Fetch, a read-ahead facility of the server, in which either 8 pages [Extent] or 256 pages [Alloc Unit] is read during index or table scans. Integrated with the caches of course.

All disk access is Asynch, on Raw Partitions, not filesystem files.

> So,
> how do you code them?
>
> start transaction;
> update Grand1_Grand2...GrandN_Child;
> update Grand1_Grand2...Grand(N-1)_Child
> ...
> update Child;
> update Parent;
> commit;

I think I have given adequate directions above, you have to “move” the whole Tree. Please try again with those in mind.

Cheers
Derek

Nicola

Jun 14, 2021, 1:39:09 PM
On 2021-06-14, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> So think that out. Write an ACID Transaction (no I am not being
> silly, I realise you can’t on your non-SQL platform, so you have to
> think in SQL terms, for an SQL platform). A stored proc, that:
> - navigates the *levels* of the Tree,
> - and loops,
> - executing max 100 INSERTs per BEGIN::COMMIT for the new Key,

Let me call this T1...

> - then executing max 100 DELETEs per BEGIN::COMMIT for the old Key.

...and this T2.

Why is the state after T1 has committed and before T2 starts considered
valid? Wouldn't a query performed between T1 and T2 get an inconsistent
view of the data?

Another question: if T1 requires 200 INSERTs instead of 100, you would
split it in two. Again, how can you consider the intermediate state
(after the first 100 INSERTs, but before the remaining ones) valid?

Nicola

Derek Ignatius Asirvadem

Jun 14, 2021, 6:08:25 PM
Nicola

> On Tuesday, 15 June 2021 at 03:39:09 UTC+10, Nicola wrote:
> > On 2021-06-14, Derek Ignatius Asirvadem wrote:
> > So think that out. Write an ACID Transaction (no I am not being
> > silly, I realise you can’t on your non-SQL platform, so you have to
> > think in SQL terms, for an SQL platform). A stored proc, that:
> > - navigates the *levels* of the Tree,
> > - and loops,
> > - executing max 100 INSERTs per BEGIN::COMMIT for the new Key,
> Let me call this T1...
> > - then executing max 100 DELETEs per BEGIN::COMMIT for the old Key.
> ...and this T2.

We can call it T1; T2, but we don’t want to confuse that with a normal ACID Transaction T1; T2, because it is not, it is a Batch Transaction. Whereas in ACID, we are dealing with {Add|Drop|Mod} a logical Atom of data (RM-compliant db = coincident with Logical Atom; non-compliant db = does not have Logical Atom), here we are moving an entire tree, of many levels, therefore State in the ACID sense will not apply.

We may be better off calling it B1; B2.

It is [Relational] Set Processing (much faster), not row processing, with the Set Size limited to chunks of 100. You can’t do this with say a CURSOR.

The State that is relevant is the State of the whole tree, either the whole tree is in the OldKey xor the whole tree is in the NewKey.

Lock the OldKey tree for the duration. Based on your platform, that will require different verbs (if it has a verb for locking). Obviously, only the single row at the top needs to be locked. Here you are performing the ISOLATION manually. Unlocking OldKey is not necessary, because at the end all OldKey rows are deleted.

If you have implemented *Optimistic Locking* (a well-known component of the OLTP Standard, which affords high concurrency), it makes life much easier. Ie. I don’t have to use a LOCK (whatever variant) command, I am locking Logically (data level), not physically (server level).

Because it is a first cut, you don’t have to implement this capability, it is an FYI for understanding. We write such batch transactions as recoverable. That means we can track precisely where the *move* failed, or stated otherwise, how much the NewKey tree succeeded. You need to be able to code SQL IN and NOT IN on large sets, with their composite Keys, with confidence re speed (you can’t in Oracle), to be able to do this.
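To illustrate what /recoverable/ means in practice: the progress of the move, per level, is just a NOT EXISTS over the composite Key. Continuing the hypothetical Child table from the sketch in the OP (names are placeholders):

SELECT Remaining = COUNT(*)                 -- zero means this level is complete
    FROM  Child  O
    WHERE O.ParentKey = @OldKey
    AND   NOT EXISTS (
            SELECT 1
                FROM  Child  N
                WHERE N.ParentKey = @NewKey
                AND   N.ChildNo   = O.ChildNo )     -- the rest of the composite Key

The same predicate drives the restartable INSERT-SELECT, which is why the batch can simply be re-executed after a failure and pick up where it left off.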

One decision you have to make is, how much of the NewKey tree is visible, that is NOT-ISOLATED. Our guideline (within the standard) is:
- for financial or audit-required data, lock the single row at the top of the NewKey tree,
- for the rest, allow it (partial tree, in progress, level-at-a-time) to be visible

> Why is the state after T1 has committed and before T2 starts considered
> valid? Wouldn't a query performed between T1 and T2 get an inconsistent
> view of the data?

The [available] data is never inconsistent. A user gets whatever he queries, from either a whole OldKey tree xor a whole NewKey tree (partial levels for non-audit data ... which is changing by the millisecond, as accepted by the users/auditors).

> Another question: if T1 requires 200 INSERTs instead of 100, you would
> split it in two. Again, how can you consider the intermediate state
> (after the first 100 INSERTs, but before the remaining ones) valid?

I think that is mostly answered above.

The “split” is not an act per se. In Sybase or MS/SQL, the command to limit the rows-affected-by-verb is:
__ SET ROWCOUNT n
which operates until reset:
__ SET ROWCOUNT 0
Just exec that before your first WHILE loop, and clear it after.

=======================
== Standard: Two Options ==
=======================

Upon reading my OP, I noticed a bit of non-clarity, which I will clear here.

Without the Standard, let’s say for a “good” database (ignoring other errors), you would CASCADE updates to a Key (mickey mouse non-server), or UPDATE Key, and suffer horrendous contention and blow the Transaction Log.

--------------------------------------------------------------
-- OLTP Standard • Batch Transaction --
--------------------------------------------------------------

With the Standard, which is what we deliver as minimum, all such contention is eliminated. Described above. Very Fast, meaning visibility of the partially-built NewKey tree is not really an issue. But it requires thoughtful coding.

Once implemented, it can be enhanced even further (not described here).

----------------------------------
-- Transaction Loop --
----------------------------------

But many times, we do not have a /Replace the DB/RFS with a real RDB/ directive, we are only there to perform a small or large consulting assignment. In any case, that involves education. The result of such education is, the developers, with their new awareness of what happens under the covers, want to elevate their mess to less-mess. That is, they want to do anything that will reduce the contention problems they are suffering: they want to stop using UPDATE Key.

Where they have honest ACID Transactions for their normal operations (eg. OrderSaleItem_Add_tr), that is relatively easy. (Where they don’t, the mess is a Date/Darwen/Stonebraker pig sty, and has to be replaced, not fixed-in-place.)

Write a much simpler set of loops, and exec the existing Transaction sprocs at each level. This means the rows are moved from OldKey to NewKey one-Transaction-at-a-time. Eliminates one set of contention problems immediately. It is slow, but only when compared to genuine Batch Transactions (which they don’t have). It is not a half-way point in reaching compliance with the Standard, but it is easy to do, and relief is immediate.
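A minimal sketch of such a Transaction Loop, in Sybase/MS-style T-SQL; the OrderSaleItem table and the OrderSaleItem_Add_tr sproc are hypothetical stand-ins for the site’s existing ACID Transactions (each containing its own BEGIN::COMMIT), the parameter lists are simplified, and @OldKey / @NewKey are the batch job’s parameters:

DECLARE @Prev INT, @ItemNo INT
SELECT  @Prev = 0                           -- position tracker; persist it in a control table
                                            -- and the job is restartable, per [1.b]
WHILE 1 = 1
BEGIN
    SELECT @ItemNo = MIN( ItemNo )          -- next row after the current position
        FROM  OrderSaleItem
        WHERE OrderKey = @OldKey
        AND   ItemNo   > @Prev
    IF @ItemNo IS NULL BREAK                -- this level is done

    EXEC OrderSaleItem_Add_tr @NewKey, @ItemNo      -- one ACID Transaction per row
    SELECT @Prev = @ItemNo
END

One loop of this shape per level, top-down for the Adds; the Drops are a second set of loops of the same shape, run bottom-up, calling the existing Drop Transactions. Nothing is held for longer than a single row’s worth of work.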

That is what I meant in [1.b] in the OP, and in the linked post (which has a different context):

> > On 2021-06-14, Derek Ignatius Asirvadem wrote:
> >
> > 1.b So the example "business transaction" would be converted into a batch job that runs in a loop and executes six million OLTP Standard-compliant single-row transactions.

Let’s put it this way. Some developers are responsible, proud of their work. When they obtain education, they have a natural urge to fix-up their code to eliminate the causes of their contention problems (effects) that they are now-aware of. They want to be able to say /my code complies with Derek’s Standard, it is non-contentious/ or /my code isn’t the Standard, but it is the least contentious that we can manage right now/.

One cannot move from this category to the OLTP Standard-compliant category, without at least implementing Optimistic Locking; etc. If one does implement OLTP Standard, it is best to go the whole hog: rewrite the database for full /RM/ and ACID and Optimistic Locking; write a full set of ACID & Batch Transactions; etc. No half-measures.

Cheers
Derek

Derek Ignatius Asirvadem

Jun 14, 2021, 6:50:37 PM
Nicola

> On Tuesday, 15 June 2021 at 08:08:25 UTC+10, Derek Ignatius Asirvadem wrote:
>
> > On Tuesday, 15 June 2021 at 03:39:09 UTC+10, Nicola wrote:
> >
> > > On 2021-06-14, Derek Ignatius Asirvadem wrote:
> > >
> > > So think that out. Write an ACID Transaction (no I am not being
> > > silly, I realise you can’t on your non-SQL platform, so you have to
> > > think in SQL terms, for an SQL platform).

A Caveat that is really important. In my years of dealing with developers who are used to the Stonebraker/MVCC mindset, it is clear that they do not grasp ACID. They are so used to the Oracle/PusGres *redefinition* of ACID (fraud), to fit their MVCC mindset, for the purpose of making their MVCC somewhat palatable, that they cannot grasp that they do not know ACID. That state of ignorance allows them to argue that MVCC is comparable to “ACID”. MVCC is not comparable to ACID, it is the polar opposite.

Flat and absolute declaration: MVCC systems cannot provide ACID, in particular the ISOLATION LEVELs. The consequence is, MVCC systems cannot provide any level of genuine OLTP.

So, even though the current line of questioning is about Batch Transactions, it may be hindered by an incorrect understanding of ACID.

> > > Write an ACID Transaction

Sorry, no, it is a Batch Transaction with no regard to ACID.

> > > A stored proc, that:
> > > - navigates the *levels* of the Tree,
> > > - and loops,
> > > - executing max 100 INSERTs per BEGIN::COMMIT for the new Key,
> >
> > Let me call this T1...
> >
> > > - then executing max 100 DELETEs per BEGIN::COMMIT for the old Key.
> >
> > ...and this T2.
>
> We can call it T1; T2, but we don’t want to confuse that with a normal ACID Transaction T1; T2, because it is not, it is a Batch Transaction. Whereas in ACID, we are dealing with {Add|Drop|Mod} a logical Atom of data (RM-compliant db = coincident with Logical Atom; non-compliant db = does not have Logical Atom), here we are moving an entire tree, of many levels, therefore State in the ACID sense will not apply.
>
> We may be better off calling it B1; B2.

Whereas T1; T2 are ACID Transactions, and State means database state; Consistency per all Constraints, ie. Logically Atomic, B1: B2 are physical chunks (delimited by an arbitrary number), in a physical *move* operation. It is harmless ACID-wise because the OldKey tree is ACID-compliant, and we are not affecting that in any way, we are faithfully carrying that over into the NewKey tree.

Such operations are not foreign to a DBA or developer, because they are used to *moving* large chunks of data in their normal day-to-day work. Eg. moving a logical xor physical subset of a table from Production to Development for test purposes. For logical (Atomic) purposes, that means not one table but a set of tables (a branch of a tree). Such operations may be new to some readers.

Cheers
Derek

Derek Ignatius Asirvadem

Jun 15, 2021, 7:49:38 PM
Nicola

> On Monday, 14 June 2021 at 14:35:13 UTC+10, Derek Ignatius Asirvadem wrote:
>
> > On Monday, 14 June 2021 at 01:13:54 UTC+10, Nicola wrote:
> >
> > I'd like to do some benchmarks.
>
> Great idea. Excellent ...
>
> *Benchmark* strongly implies an ability to monitor the server, all metrics that pertain to performance (in general), and the particular metrics that pertain to the particular benchmark. Freeware has no such thing, so you will be working in the dark, with little idea re what is happening under the covers.

> For contrast, take a glimpse of what is available in commercial SQLs.
>
> 0. Monitoring products
> Additional licence fee, not shown. Feel free to search the internet.

There are quite a few.

0.a
Bradmark is probably the best of breed. Great graphical representation with drill-down, etc. Note that we have had these for thirty years, ten years before the open source anti-market produced you-get-what-you-pay-for “database systems”.

__ https://www.bradmark.com/products/survSybaseASE.html

AFAIK, none of the players provide support for freeware. This is not to say that there is not a market (there is, there are suckers born every minute). The reason is, there is nothing, no performance stats produced, to monitor. Worse, whatever does exist, keeps changing with every major version. Just think about how “transaction” and “transactional” issues have changed in the last ten years, and we are still nowhere near ACID.

0.b
Sybase used to have a full-blown Monitor Server, that sat on the same box as the DB server. Heavy duty like you would not believe. Perfect for benchmarks. Slowly made obsolete as 3P products gained market share.

> 1. Raw Stats: text, various forms. Free.
> Voluminous, not shown. Can be ascertained from the following [which are summaries].

Generally two types:
1.a
Internal counters, reported as requested (eg. 24 x 60 mins), zero overhead. This has been available from the beginning, and this is what I use to produce [2]. A minimal invocation is sketched after [1.b].

1.b
Monitoring & Diagnostic Access. A database in the server, that collects [1.a] and exposes them as Relational tables. Overhead is 5-15% depending on what is collected (configurable). Eg. collection of executing SQL can be heavy.
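For concreteness, on Sybase ASE the counters in [1.a] are typically captured with sp_sysmon; a minimal invocation, which samples for ten minutes and prints the plain-text report referred to above as Raw Stats, looks like this (an optional second argument restricts the report to a single section):

    exec sp_sysmon "00:10:00"
    go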

> 2. Raw Stats Formatted, especially for various types of comparison.
>
> __ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf

I have given mine for comparison. Do you have a link or reference for monitoring PusGres ?

> “Dlt” is the delta, between that column-of-stats and the first column-of-stats, as a percentage, with {-|+|x}.

For a quick overview, just scan that column: the existence of a double-digit value means the metric is worth examination; positive/negative is good/bad, depending on the metric of course.

“Selection”
At the top of the page. These are selected “Key Performance Indicators” or an executive summary.

Note that the load is identical (two Mondays chosen to ensure that). But the activity within the server is quite different. The first column shows the activity due to the SAN fault, the second shows the activity after the correction:
__ Read Locks reduced by 91%
__ Server/Context Switch reduced by 21%
__ Unix/Context Switch reduced by 24%
__ and of course far less CPU usage, at both levels

> For freeware & Oracle, due to not having a Server Architecture, and instead deploying hundreds or thousands of programs running on Unix, the first avenue of “performance monitoring” and even “problem diagnosis”, is via Unix monitoring:
> -- top
> -- vmstat

“Host System”
The unix/vmstat metrics are at the bottom of the page.

Sybase is a dedicated server, meaning that it is designed for tight integration (even binding) with the o/s and hardware, and nothing else should be run on the box. This box additionally hosts a small MySQL and a few other small things, and the degree to which they allow me (Level 3 support) to tightly integrate with the o/s is limited. Point being, performance is reasonable, but nowhere near the best possible for the box. Eg. I can improve throughput significantly, and of course I would cancel the Monitor db.


> “Engine” is a single Unix Process. The Sybase ASE server comprises a grand total of FOUR Engines in this instance, and serves hundreds of active connections (doctors; nurses; medical images; client histories; etc). In contrast, freeware and Oracle would have hundreds of Unix Processes and no concept of Engine.

For those who labour over performance tuning of Oracle or PusGres: because they lack a genuine Architecture diagram, here is one. Please don’t say that I only help the top end, that I don’t help the bottom-feeders. They are identical, just substitute the Oracle component names with the PusGres equivalents:

__ https://www.softwaregems.com.au/Documents/Article/Oracle%20Circus/Oracle%20vs%20Sybase.pdf

Cheers
Derek

Nicola

Jun 16, 2021, 4:57:23 AM
On 2021-06-14, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> Nicola
>
>> On Tuesday, 15 June 2021 at 03:39:09 UTC+10, Nicola wrote:
> The State that is relevant is the State of the whole tree, either the
> whole tree is in the OldKey xor the whole tree is in the NewKey.

Would you mind letting me understand with an example (I'd like to grasp
the overall idea, which I have not yet; I am not asking you to reveal
the secret recipes of your shop) from this document:

https://www.softwaregems.com.au/Documents/Student_Resolutions_2020/Giannis/RoomMonitor%20DM%20V0_51.pdf

Suppose that, for some reason, one needs to update one HotelChain,
changing the value of HotelChain from ABC to H-ABC. If I understand
correctly, you would start by (optimistically) locking the record with
HotelChain = ABC.

What would you do next? Insert the new record with key H-ABC into
HotelChain, then insert zero or more records referencing H-ABC into
Hotel, then into HotelNetwork, ... (up to 100 inserts), commit; more
inserts (up to 100), commit, etc.?

How would you delete the old records then? With transactions of up to
100 DELETEs each, starting from the bottom of the hierarchy and
navigating the hierarchy up?

> If you have implemented *Optimistic Locking*

It's interesting that you mention optimistic locking (have you ever
mentioned it before in this group?), because my understanding was that
you believe that the only correct way to control concurrency was strict
pessimistic locking (strict 2PL). This may have been a misunderstanding
on my part.

Nicola

Derek Ignatius Asirvadem

Jun 16, 2021, 9:30:55 AM
Nicola

> On Wednesday, 16 June 2021 at 18:57:23 UTC+10, Nicola wrote:
> > On 2021-06-14, Derek Ignatius Asirvadem wrote:
> >
> >> On Tuesday, 15 June 2021 at 03:39:09 UTC+10, Nicola wrote:
> > The State that is relevant is the State of the whole tree, either the
> > whole tree is in the OldKey xor the whole tree is in the NewKey.
>
> Would you mind letting me understand with an example (I'd like to grasp
> the overall idea, which I have not yet; I am not asking you to reveal
> the secret recipes of your shop) from this document:
>
> https://www.softwaregems.com.au/Documents/Student_Resolutions_2020/Giannis/RoomMonitor%20DM%20V0_51.pdf

They are not secrets in the sense that they are natural progressions of the /RM/, due to a deeper understanding of it. I am sure other serious boutique consulting houses have done at least some of those things. Perhaps not with proper labels and standardisation.

But sure, I answer questions. As deeply as the person is capable of receiving. Which on this forum is only you.

However, this subject is something that absolutely every single programmer should know, not just those developing code to run against an OLTP RDB or “OLTP” RFS. It is a programming construct, a Batch Job. Running against an OLTP database. The DayEnd or MonthEnd procedure at a bank. Move OldKey to NewKey. Some minor deployment requirements, but the construct is the same. I am happy to give full details in public. Particularly because I am shocked that SQL coders do not know this.

The OLTP database and particularly its API (set of ACID Transactions) are already designed for low contention; high concurrency. Think: cars running along the streets of a city. A transaction that affects 10K rows is like a train running along surface streets (no rail tracks, we have one TransactionLog file that cars and trains; boats and planes, all fight over), across the city. That is what contention is. It requires perceiving the whole, as well as the parts. (You guys are trained to think about parts only, and one-at-a-time.)

Cars are allowed max 1 trailer; trucks max 16 trailers; trains max 100 trailers.

> Suppose that, for some reason, one needs to update one HotelChain,
> changing the value of HotelChain from ABC to H-ABC. If I understand
> correctly, you would start by (optimistically) locking the record with
> HotelChain = ABC.

Yes.
And leave it locked for the duration, it will be the last deleted.

>>>>
> (optimistically)

Because I know that you guys do not understand Locking, let alone Optimistic Locking, note that this is not an act within that [generic, theoretical] framework. This is a data-level lock (if you have row locking, and a row locking command, you do not have ACID *and* you have broken a cardinal rule of OLTP).

If you have Soft Deletes, use that.
<<<<

> What would you do next? Insert the new record with key H-ABC into
> HotelChain, then insert zero or more records referencing H-ABC into
> Hotel, then into HotelNetwork, ... (up to 100 inserts), commit; more
> inserts (up to 100), commit, etc.?

Yes.
Code WHILE loops, one WHILE loop per level. SQL has INSERT-SELECT, it is Set Processing, very fast (a sketch follows below).

Limiting the affected set is simple, in MS or Sybase syntax:
__ SET ROWCOUNT { 0 | n }

Some people would code a cursor. It is pure laziness, same as CTEs. Massive overhead, and totally unnecessary. I don’t allow a cursor on my servers.
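To make that concrete against the Hotel example: one Set-Processed chunk of the Hotel level, moving old Key ABC to new Key H-ABC, might look like the following (a sketch only; columns other than HotelChain and NumStar are hypothetical stand-ins for the real ones). It runs inside the WHILE loop, with SET ROWCOUNT 100 in force and its own BEGIN::COMMIT TRAN around it:

INSERT Hotel ( HotelChain, HotelNo, NumStar )
    SELECT 'H-ABC', H.HotelNo, H.NumStar
        FROM  Hotel  H
        WHERE H.HotelChain = 'ABC'
        AND   NOT EXISTS (                  -- skip rows already carried over
                SELECT 1
                    FROM  Hotel  N
                    WHERE N.HotelChain = 'H-ABC'
                    AND   N.HotelNo    = H.HotelNo )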

> How would you delete the old records then? With transactions of up to
> 100 DELETEs each, starting from the bottom of the hierarchy and
> navigating the hierarchy up?

Yes.
OldKey rows.

> > If you have implemented *Optimistic Locking*
>
> It's interesting that you mention optimistic locking (have you ever
> mentioned in before in this group?),

I don’t keep track of such details, so I can’t say.

Optimistic Locking is a science that existed from the early 1960’s, it is an essential part of OLTP. I have just Relationalised and SQLised it, and made it the OLTP Standard. Which in turn is an essential part of Open Architecture. There are components that have to be in the tables, and of course code components in every ACID Transaction. Science is truth, and truth does not change, it has not changed since 1973 [when I came to know of it]. I openly credit IBM Mainframe CICS/TCP (their transaction server, that you guys think is “batch”). That is Transaction Control Process.

Academia do not know that. So the imbecile Stonebraker read the IBM manuals and thought, with his great charisma, he could do better. So I know, from the bits I have read in this forum, that you guys have a fantasy version of “Optimistic Locking”.

> because my understanding was that
> you believe that the only correct way to control concurrency was strict
> pessimistic locking (strict 2PL). This may have been a misunderstanding
> on my part.

Well, I don’t have a problem with what you have picked up. The thing that needs to be understood is, you have picked up whatever I stated, but from your pre-existing mindset. First, there is the above-noted ridiculous notion of “Optimistic Locking”, which consists of a few notions (not standards) taken from Optimistic Locking.

Second there is no such thing in reality as “pessimistic locking”. Or that “2PL” is “pessimistic locking”, it is absurd. I can only think that the idiots who said so, in those grand papers posed such a silly thing as a Straw Man argument, to make “2PL” systems look bad.

No. Optimistic locking has nothing to do with “2PL” locking. The former is a construct deployed in tables and ACID Transaction code. The latter is the resource contention resolution mechanism on the server, quite removed from the former.

MVCC is not optimistic. It may well be “optimistic” to those precious darlings who chant the mantra /Readers don't block writers, writers don't block readers/, which exists only in the imagination, not in reality, and certainly not in their precious MVCC database that keeps hanging up despite all the users, together now, chanting the mantra /Readers don't block writers, writers don't block readers/.

So no, coming from that mindset, due to the quite different notions attributed to terms (see how the pig poop eating freaks destroy science), reading my posts, which may not deal with the issue directly, you will have some incorrect understanding.

Yes, I confirm, the only valid resource contention resolution mechanism in an OLTP database, is Ordinary Locking (science unchanged since the 1960’s, steadily advanced in the chip design sector). Which is not pessimistic, no matter how much you call it that. Now you guys call that “2PL” or strict “2PL”. (I don’t know what strict has to do with it. Concerning locks and security, it is binary: either you have it or you don’t.) So for me, 1PL or “2PL” is strict, we can’t strictify it any further.

I may have been lazy here in the past, by not rejecting the term “2PL”. From the little I do know, it is yet another mad construct in the academic-only collective imagination. It is not a good idea to place that label onto me, because the definition is too loose and unscientific, and I do somewhat more than that. Properly explained, I would not use stupid terms such as “expanding” and “shrinking”, no, they are two code blocks. In this farcical “definition”:
__ https://en.wikipedia.org/wiki/Two-phase_locking
half those categories do not exist, even describing them is hysterical.

Eg. in “C2PL”, the imbecile acquires locks “before” the transaction, without realising that the only way to acquire locks is *INSIDE* a transaction. Cesspool.

Eg. 1PL [to the OLTP Standard] eliminates deadlocks, that is one declared guarantee. These freaks call it “2PL” and state that deadlocks can happen, and elsewhere that it causes deadlocks.

Eg. I will bet you, you guys can’t even define a deadlock, let alone guarantee preventing it.

Ok, according to the latrine, Sybase; DB2; and MS are “SS2PL”, not “S2PL”.

And of course 1PL is much more, more than the total collective imagination of “SS2PL” is among the academics.

On that wiki page, the particular odour of fæces is, that of a female pig, a sow with full teats. All 12 piglets wrote that one page together, right between those warm teats. It is just a fanciful write-up of various narrow views of Ordinary Locking, which are self-serving. “Scientifically” defined Straw Men, the purpose of which is to *AVOID* understanding Ordinary Locking.

Actually, it is an attack on ACID, because any description of Transactions that are not ACID is anti-ACID. They don’t even mention it, because the entire page; all descriptions, fail the [A]tomic requirement. And thus they can write 100 papers about how MVCC is sooooo much better than those Straw Men. And continue denying reality. Pffft.

The only valid papers on Locking were written by IBM and Britton-Lee/Sybase engineers. Stonebraker is a fantasist, a cult leader, not a scientific person by any measure. Kool Aid and mantras. Maybe some chakra balancing. Believe me, his downward dog never got up.

Yes, so all the high-end servers are Ordinary Locking or “1PL” or “SS2PL”, whether it contains a database or not. Separately, on the database and Transaction code side, we implement Optimistic Locking, for OLTP, not because the server is OL/1PL/SS2PL.

Even the label SS2PL is stupid. The piglet states that it is one phase, but hysterically goes on with the 2PL label.

For the sane, ACID requirements, as translated into SQL:
__ Locks are automatic, acquired after BEGIN TRAN, until COMMIT TRAN
__ Reads are locked for the duration of the Read operation
__ Writes are locked until COMMIT
__ Nothing can be locked outside a Transaction.

Cheers
Derek

Nicola

Jun 16, 2021, 3:45:09 PM
On 2021-06-16, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> Suppose that, for some reason, one needs to update one HotelChain,
>> changing the value of HotelChain from ABC to H-ABC. If I understand
>> correctly, you would start by (optimistically) locking the record with
>> HotelChain = ABC.
>
> Yes.
> And leave it locked for the duration, it will be the last deleted.

>>>>>
>> (optimistically)
>
> Because I know that you guys do not understand Locking, let alone
> Optimistic Locking, note this this is not an act within that [generic,
> theoretical] framework. This is a data-level lock (if you have row
> locking, and a row locking command, you do not have ACID *and* you
> have broken a cardinal rule of OLTP).

Let me see whether we are on the same page. While you are performing the
batch transaction we are talking about, another transaction may also
attempt to update the record with HotelChain = ABC (and specifically
update the HotelChain field). Is it correct that this second transaction
will be allowed to perform the update, and that the first (batch
transaction) will detect at the next commit that the "top" row has been
overwritten, and rollback (at least rollback the running ACID
transaction)?

Because that is my understanding of "optimistic". But the rest of your
post (and the remark above) makes me doubt that we agree on this
meaning.

Except for the point above, "batch delete+insert" is clear enough.

Nicola

Derek Ignatius Asirvadem

Jun 16, 2021, 6:36:12 PM
Nicola

> On Thursday, 17 June 2021 at 05:45:09 UTC+10, Nicola wrote:
No, that is not correct.

Yes, you do *not* have a correct understanding of Optimistic Locking, none of the academics do, they have only the Stonebraker MVCC Kool Aid, and they have been drinking it and circle-jerking while chanting the *Mantra* for decades. As I have tried to explain, Optimistic Locking is completely different to the Straw Man that academics erect as “optimistic locking”. This point (your question) has nothing to do with Optimistic Locking.

It has to do with how to construct a Batch Transaction (MonthEnd Procedure at a bank or Move OldKey to NewKey).

>>>>
1. Lock top-most <Table>[ <OldKey> ]
For the duration of the Batch Transaction.
<<<<

>>>>
You need some column that indicates a data-level lock. Standard columns include CreatedDtm; UpdatedDtm; etc, that can be used for this purpose [eg. set CreatedDtm to TODAY+1 or other fixed value (it is temporal, yes)]. You may have IsDeleted to support SoftDeletes. Let’s use the easy one, and say you have Soft Deletes. For clarity, you need a condition [ IsLocked ], and you implement that as indicated in one of the above columns, if not, you have a column [ IsLocked ].

The fact that there is such a column, and that contention is understood and consciously minimised, *is* part of Optimistic Locking, part of the OLTP Standard.
<<<<

But this is not an ACID Transaction, it is a Batch Transaction. Of course, we use the column that is there for OLTP purposes, for the now Batch purpose.

In this case, that means:
-- UPDATE HotelChain.IsDeleted[ ABC ] = 1
---- WHERE HotelChain.IsDeleted[ ABC ] = 0
or
-- UPDATE HotelChain.IsLocked[ ABC ] = 1
---- WHERE HotelChain.IsLocked[ ABC ] = 0
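Rendered as real SQL, and assuming HotelChain is the Key column and IsLocked the data-level lock column (per the pseudo-code above), the acquisition is a single guarded UPDATE; the @@rowcount test is what tells the batch whether it obtained the lock:

UPDATE HotelChain
    SET   IsLocked   = 1
    WHERE HotelChain = 'ABC'
    AND   IsLocked   = 0            -- only if no one else holds it

IF @@rowcount != 1
BEGIN
    PRINT 'HotelChain ABC is locked by another batch; nothing was started'
    RETURN
END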

That prevents both [a] and [b] in this:
> another transaction may also attempt to
> (a) update the record with HotelChain = ABC
> (b) (and specifically update the HotelChain field)

(Which row is meant by “record” is not clear, but it doesn’t matter for the answer: all rows from top to bottom that belong to [ ABC ] are locked.)

Separate point. That “another transaction” sounds suspiciously like another instance of the same transaction, using the same key. That doesn’t happen in the real world because the person who administers the first instance of changing ABC is one and the same person who administers the second instance of changing ABC. This sort of insanity is /also/ prevented by proper GUI design (a Tree with no circular references).

It also prevents:
-- UPDATE Hotel.NumStar[]
or
-- UPDATE HotelRoomType.Description[]

--------

To be clear, this is answering your question, which is you trying to understand Batch Transaction structure in an OLTP environment, (a) from the bottom up, and (b) with the known-to-be-straw-man concept of Optimistic Locking. This is *not* answering the question /what is Optimistic Locking/, which would be a top-down lecture, and include the considerations for genuine OLTP.

Cheers
Derek


Derek Ignatius Asirvadem

Jun 17, 2021, 1:19:39 AM
Nicola

> On Thursday, 17 June 2021 at 08:36:12 UTC+10, Derek Ignatius Asirvadem wrote:

--------------------------
-- 1 Correction --
--------------------------

> In this case, that means:
> -- UPDATE HotelChain.IsDeleted[ ABC ] = 1
> ---- WHERE HotelChain.IsDeleted[ ABC ] = 0
> or
> -- UPDATE HotelChain.IsLocked[ ABC ] = 1
> ---- WHERE HotelChain.IsLocked[ ABC ] = 0
>
> ...
>
> It also prevents:
> -- UPDATE Hotel.NumStar[]
> or
> -- UPDATE HotelRoomType.Description[]

Mistake, sorry, that should be:
It also prevents any Transaction containing:
-- UPDATE Hotel.NumStar[ ABC ]
or
-- UPDATE HotelRoomType.Description[ ABC ]
or
-- INSERT <AnyTableInHierarchy>[ ABC ]
or
-- DELETE <AnyTableInHierarchy>[ ABC ]

----------------------------
-- 2 Impediment --
----------------------------

> Yes, you do *not* have a correct understanding of Optimistic Locking, none of the academics do, they have only the Stonebraker MVCC Kool Aid, and they have been drinking it and circle-jerking while chanting the *Mantra* for decades. As I have tried to explain, Optimistic Locking is completely different to the Straw Man that academics erect as “optimistic locking”. This point (your question) has nothing to do with Optimistic Locking.

The term Optimistic Locking and the method (Standard) that it defines, is IBM [360] CICS/TCP from the early 1960’s. That is science. That is truth that does not change.

Any use of the term to mean anything other than that, is fraudulent; dishonest conduct; anti-science.

Which filth Stonebraker; Ellison; Date; Darwen; Fagin; etc, all engage in, in their suppression of the /RM/ and their promotion of RFS as “Relational”, their suppression of SQL and their promotion of non-sql as “SQL”.

You can’t understand
__ Optimistic Locking
(the OLTP requirement; two non-server deployments) while holding on to any shred of the “optimistic locking” Straw Man that is heavily established in academia over the decades.

You can’t understand
__ server-level Locking
or
__ data-level Locking
while holding on to any shred of the “2PL”; “S2PL”; “C2PL”; “SS2PL”; etc, fantasies that are heavily established in academia over the decades. Even the “definitions” are unscientific nonsense, and ever-changing. You would not accept such unscientific nonsense from me, but somehow you accept that filth from Stonebraker and his cult.

You can’t understand
__ server-level Locking
or
__ data-level Locking
while holding on to any shred of the MVCC fantasy that is heavily established in academia over the decades.

MVCC is anti-ACID, it will never work. The evidenced fact that every single Orable and PusGres “database” suffers massive problems due to the MVCC mindset is denied, an act that is enabled by repeating the totally fictitious *mantra* /Readers don't block writers, writers don't block readers/. The declaration is stupid, ridiculous, but hey, everyone is doing it; doing it; doing it.

(If you wish to get into this issue seriously, there is a separate thread open, please engage over there.)

This thread is:
> It has to do with how to construct a Batch Transaction (MonthEnd Procedure at a bank or Move OldKey to NewKey).
But of course, we can’t help touching OLTP structures, because it is an OLTP environment, and that is the over-arching requirement.

-----------------------------
-- 3 Three Shells --
-----------------------------

See if you can step back and understand this.

A. Ten years ago, when you started interacting with me, your notion of a database, as programmed by academia, was:
__ the result of a collection of SQL commands, and SQL is broken
__ literally, the physical only
__ ERD

A.2 As a consequence of that interaction, slowly, over time, it has progressed to:
__ the result of constraints, specified in SQL, and SQL may not be broken
__ introduction of some Logic
__ IDEF1X (ERD is anti-Relational)

B. And now recently:
__ the result of Predicates, specified in SQL, and SQL is not broken
__ introduction of essential Logic

Whereas [A] is one shell or core, with horrendous problems, [B] is an outer shell, that secures [A] in terms of data integrity, somewhat.

Likewise, as programmed by academia, you were stuck in
__ Suppress ACID
____ Erect Straw Men to make ACID and 1PL to look and smell bad
__ Elevate the fantasy of MVCC
____ thereby entirely denying the normal considerations for OLTP; ACID

We are just now starting to deal with a third shell [C], that further secures [A] “transactionally”, in terms of data integrity, completely.

This is totally new to academia, to you because you are the first and only academic trying to cross the chasm of ignorance of the industry, that the industry is, and has been since 1984 [as I know it]:

C.
__ OLTP Mindset (since 1960’s)
____ ACID Transactions only (since 1960’s, implemented in all commercial SQL Platforms)
______ ACID in the server
______ ACID in every Transaction
____ Optimistic Locking
______ Optimistic Locking in each table
______ Optimistic Locking in every Transaction

You need to appreciate that as long as you harbour [2 Impediment] the Straw Men for “ACID” instead of ACID, there is no way that you can understand the [3 Three Shells] of protection, the third shell, the ACID Transaction context that has protected industry databases since the 1960’s.

Cheers
Derek

Nicola

Jun 17, 2021, 5:27:16 AM
On 2021-06-16, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> No, that is not correct.

As expected.

>>>>>
> You need some column that indicates a data-level lock. Standard
> columns include CreatedDtm; UpdatedDtm; etc, that can be used for this
> purpose [eg. set CreatedDtm to TODAY+1 or other fixed value [it is
> temporal, yes] ). You may have IsDeleted to support SoftDeletes.
> Let’s use the easy one, and say you have Soft Deletes. For clarity,
> you need a condition [ IsLocked ], and you implement that as indicated
> in one of the above columns, if not, you have a column [ IsLocked ].
>
> The fact that there is such a column, and that contention is
> understood and consciously minimised, *is* part of Optimistic Locking,
> part of the OLTP Standard.

That starts to sound vaguely familiar.

> But this is not an ACID Transaction, it is a Batch Transaction. Of
> course, we use the column that is there for OLTP purposes, for the now
> Batch purpose.
>
> In this case, that means:
> -- UPDATE HotelChain.IsDeleted[ ABC ] = 1
> ---- WHERE HotelChain.IsDeleted[ ABC ] = 0
> or
> -- UPDATE HotelChain.IsLocked[ ABC ] = 1
> ---- WHERE HotelChain.IsLocked[ ABC ] = 0
>
> That prevents both [a] and [b] in this:
>> another transaction may also attempt to
>> (a) update the record with HotelChain = ABC
>> (b) (and specifically update the HotelChain field)

Is it because each transaction checks UpdatedDtm and finds that it is
equal to TODAY+1, hence it immediately aborts?

>It also prevents any Transaction containing:
>-- UPDATE Hotel.NumStar[ ABC ]
>or
>-- UPDATE HotelRoomType.Description[ ABC ]
>or
>-- INSERT <AnyTableInHierarchy>[ ABC ]
>or
>-- DELETE <AnyTableInHierarchy>[ ABC ]

Such transactions perform the same check, right? And they abort
immediately, I assume, when they find that the top row has been marked
as deleted by the transaction that set UpdatedDtm. Is that correct?

> Separate point. That “another transaction” sounds suspiciously like
> another instance of the same transaction, using the same key. That
> doesn’t happen in the real world because the person who administers
> the first instance of changing ABC is one and the same person who
> administers the second instance of changing ABC. This sort of
> insanity is /also/ prevented by proper GUI design (a Tree with no
> circular references).

It may not happen under normal conditions. But I'd be very worried if
"it cannot happen" relies on correct (human or machine) behaviour.
A system should be resilient to improper and malicious behaviour as
well.

Nicola

Derek Ignatius Asirvadem

Jun 17, 2021, 6:54:52 AM
Nicola

> On Thursday, 17 June 2021 at 19:27:16 UTC+10, Nicola wrote:
The /Today+1/ is for temporal tables, wherein a current SELECT does WHERE UpdatedDtm = MAX( UpdatedDtm) less-than-or-equal to today. It is just the standard method or construct to grab only the current row (exclude historic rows). (Whereas a temporal SELECT would grab the particular historic row.) I expect everyone who implements a temporal table [using Codd’s temporal definition, not the massive TwiddleDee & TwidleDumb monstrosity] to know that.
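For readers unfamiliar with the construct, a current-row SELECT on a temporal table looks roughly like this (a sketch only; the Reading table and its columns here are hypothetical stand-ins, and GETDATE() stands in for /today/). A row stamped Today+1 simply never qualifies, which gives the isolation described below:

SELECT R.ReadingKey, R.UpdatedDtm, R.Value
    FROM  Reading  R
    WHERE R.UpdatedDtm = (
            SELECT MAX( R2.UpdatedDtm )     -- the current row: latest, but not in the future
                FROM  Reading  R2
                WHERE R2.ReadingKey  = R.ReadingKey
                AND   R2.UpdatedDtm <= GETDATE() )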

The /Today+1/ just makes the affected row (and the hierarchy below it) invisible to other users, a form of ISOLATION. Heck, we are on a planet named ACID, we are performing an [I]solation. Don’t let that confuse you. If the column is there, use it.

If IsDeleted is there, use it.

Else implement IsLocked, and use it.

Since God carved onto the OLTP stone tablet:
__ THOU SHALT NOT ATTEMPT ANYTHING THAT IS IMPOSSIBLE
yes, every Transaction (both ACID and this one) validates everything it will attempt, before attempting it, before the BEGIN TRAN. In our case, HotelChain is first. Because it is caught in the VALIDATE block, before the BEGIN TRAN, there is no “abort” or ROLLBACK TRAN.

The corollary to that Commandment is of course:
__ Thou Shalt Not Begin Anything That Will Fail, That Thou Canst Not Complete.

I am scared of fire and brimstone. The CICS/TCP guys taught me this when I was still loading COBOL programs into a mainframe, in the form of punch-card decks.

What you guys call “2 phase” is anti-ACID, it starts a transaction and then fiddles and farts while rambling, meaning time spans between requests. As per the wiki cesspool article, in the “SS2PL” that we have, there is only one “phase”. But we don’t start that “phase” until the previous “phase” of VALIDATE completes successfully. And we have no time spans in-between operations.

I expect that you understand the following:
__ in order to comply with [A]tomic, the BEGIN and COMMIT/ROLLBACK must be in one code block in a stored proc, never in the client
__ the caller of the Transaction must execute the exact same VALIDATE block, before calling the Transaction
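A skeletal rendering of those two points, in Sybase/MS-style T-SQL (the Part table, columns and return codes are hypothetical placeholders; the point is the shape: the VALIDATE block sits wholly before BEGIN TRAN, and BEGIN/COMMIT/ROLLBACK live in the one code block, never in the client):

CREATE PROC Part_Add_tr
    @PartCode  CHAR(12),
    @Name      VARCHAR(30)
AS
    -- VALIDATE block: test everything the Transaction will attempt, before BEGIN TRAN
    IF EXISTS ( SELECT 1 FROM Part WHERE PartCode = @PartCode )
    BEGIN
        PRINT 'Part exists; nothing was started'
        RETURN 1
    END

    -- EXECUTE block: the ACID Transaction proper
    BEGIN TRAN
    INSERT Part ( PartCode, Name )
        VALUES ( @PartCode, @Name )
    IF @@error != 0
    BEGIN
        ROLLBACK TRAN
        RETURN 2
    END
    COMMIT TRAN

    RETURN 0

The caller executes the same existence test before calling Part_Add_tr, per the second point above.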

--------

> >It also prevents any Transaction containing:
> >-- UPDATE Hotel.NumStar[ ABC ]
> >or
> >-- UPDATE HotelRoomType.Description[ ABC ]
> >or
> >-- INSERT <AnyTableInHierarchy>[ ABC ]
> >or
> >-- DELETE <AnyTableInHierarchy>[ ABC ]
>
> Such transactions perform the same check, right? And they abort
> immediately, I assume, when they find that the top row has been marked
> as deleted by the transaction that set UpdatedDtm. Is that correct?

Yes.
Every Transaction that touches the HotelChain hierarchy, any level.
As detailed above, not “abort”, but never start.

Not necessarily “deleted” (which depends on the column that is available to be used for this purpose), but ISOLATED for the duration of the Transaction. When it has completed, the entire NewKey hierarchy is available.

> > Separate point. That “another transaction” sounds suspiciously like
> > another instance of the same transaction, using the same key. That
> > doesn’t happen in the real world because the person who administers
> > the first instance of changing ABC is one and the same person who
> > administers the second instance of changing ABC. This sort of
> > insanity is /also/ prevented by proper GUI design (a Tree with no
> > circular references).
>
> It may not happen under normal conditions. But I'd be very worried if
> "it cannot happen" relies on correct (human or machine) behaviour.
> A system should be resilient to improper and malicious behaviour as
> well.

Absolutely. I did not say “it cannot happen”. It is prevented from happening, by the Transaction standard. That was an explanation, that it is actually prevented at a higher level (in any proper GUI), before the Transaction is called. Therefore “/also/“.

Cheers
Derek

Nicola

Jun 18, 2021, 10:22:26 AM
On 2021-06-17, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> The /Today+1/ is for temporal tables, wherein a current SELECT does
> WHERE UpdatedDtm = MAX( UpdatedDtm) less-than-or-equal to today. It
> is just the standard method or construct to grab only the current row
> (exclude historic rows). (Whereas a temporal SELECT would grab the
> particular historic row.) I expect everyone who implements a temporal
> table [using Codd’s temporal definition, not the massive TwiddleDee
> & TwidleDumb monstrosity] to know that.
>
> The /Today+1/ just makes the affected row (and the hierarchy below
> it) invisible to other users, a form of ISOLATION. Heck, we are on
> a planet named ACID, we are performing an [I]solation. Don’t let that
> confuse you. If the column is there, use it.
>
> If IsDeleted is there, use it.
>
> Else implement IsLocked, and use it.
>
> Since God carved onto the OLTP stone tablet:
> __ THOU SHALT NOT ATTEMPT ANYTHING THAT IS IMPOSSIBLE
> yes, every Transaction (both ACID and this one) validates everything
> it will attempt, before attempting it, before the BEGIN TRAN. In our
> case, HotelChain is first. Because it is caught in the VALIDATE
> block, before the BEGIN TRAN, there is no “abort” or ROLLBACK TRAN.

Wait, what is a VALIDATE block? If an atomic processing unit is
delimited by BEGIN TRAN...COMMIT, how can something before BEGIN TRAN be
considered part of that unit? If it is not, then how do you ensure that
nothing happens between the instant VALIDATE completes and the instant
BEGIN TRAN is executed?

And I have a couple more questions, if you don't mind:

1. What you are describing is built on top of the standard locking
mechanism provided by the DBMS. If I understand correctly, it requires
the transactions to be coded to perform explicit checks before accessing
the data. At the outset, that sounds like something that can become very
complicated and error-prone. E.g., it seems that a transaction to insert
a new ReadingMeasure should check many things (HotelChain, CountryCode,
..., Application) before proceeding.

Or, wait, such a transaction would be coded in a "simple" way (insert
into Reading, insert into ReadingMeasure) and then two things may
happen: it is either executed before the batch transaction has "reached"
Reading, in which case it would succeed (and then it would be updated by
the batch transaction); or it is executed after the batch transaction
has updated Reading, hence inserting the new reading would raise
a foreign key violation (and it should be retried with the new key).

Or something else?

2. You are describing an OLTP context, but you have claimed several
times that the same database can serve both OLTP and OLAP workloads. Do
you code analytic queries along the same principles to avoid
locking too many records?

> The corollary to that Commandment is of course:
> __ Thous Shalt Not Begin Anything That Will Fail, That Thou Canst Not
> Complete.

That is worth the "optimistic" label!

> What you guys call “2 phase” is anti-ACID, it starts a transaction and
> then fiddles and farts while rambling, meaning time spans between
> requests. As per the wiki cesspool article, in the “SS2PL” that we
> have, there is only one “phase”.

Ok.

> But we don’t start that “phase” until the previous “phase” of VALIDATE
> completes successfully. And we have no time spans in-between
> operations.

See above. Not clear how you'd achieve that.

Nicola

Derek Ignatius Asirvadem

unread,
Jun 19, 2021, 3:20:51 AM6/19/21
to
Nicola

Please consider, in the truth of learning, what an OLTP context is, in an Ordinary Locking server (falsely known as “1PL”; “2PL”; “C2PL” ... “SS2PL”; etc). Not in the MVCC mindset, and definitely not what you think ACID or an ACID Transaction is. But what it really is, in the science, in the platforms since 1960, SQL Platforms since 1984. That excludes Stonebraker; Ingres; its zombie son PusGres; and Oracle.

Second, please consider the OLTP context, all ACID Transactions sans the Batch Transaction, only. After that is clear (no questions), then add the Batch Transaction (high-end OLTP only) to the consideration.

> On Saturday, 19 June 2021 at 00:22:26 UTC+10, Nicola wrote:
All code must be standard, eg. there may be a template. The Transaction stored proc (as distinct from the ACID Transaction in it, for which it is named) contains at least three code blocks, four in high-end OLTP; a minimal sketch of the template is given after the three blocks below. The EXECUTE Block is Atomic in the ACID sense. The stored proc is Atomic in the sense that it is a single code segment that the caller calls and is executed (partly or wholly).

(In the simple sense, for newbies, ACID Transaction = stored proc. But of course, *all* code is standardised, we have templates, the stored proc is not ONLY the ACID Transaction, it has to have a structure; a Form; error handling; etc.)

______________________________________________
0. THERE IS ONLY ONE VERSION OF ANY FACT

__ Schizophrenics are prevented from writing code of any kind.

__________________
1. VALIDATE Block
__ In CICS/TCP/COBOL terms, this is a PREPARE block.
You know the Five P’s, absolutely essential for succeeding at anything ? Eg. an Olympic swimmer; a deer hunter; a programmer of ACID Transactions.
__ Proper Preparation Prevents Poor Performance.

>>>>
We know we are locking rows, implicitly, by virtue of the SQL commands we issue, but it is the server that does the locking and the resolution.
• We know that Reads are locked for the duration of the Read operation, and Writes are locked until the COMMIT/ROLLBACK.
__• We can also request a Read to be locked until COMMIT/ROLLBACK, but that requires a Transaction to be opened, and a server that [is genuine OLTP and therefore] allows it.
<<<<

Code:
• Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• RETURN on any failure.
• Check NOT_EXISTS all rows that must not exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• RETURN on any failure.
• Transaction NOT yet started
• All locking is transient and very short duration
• TransactionLog is not touched
• Fall into ...

__________________
2. EXECUTE Block
* BEGIN TRAN
• TransactionLog activity commenced for this xact
• Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• ROLLBACK & RETURN on any failure.
__• the only relevant locking starts here
__• if blocked by another xact, you are blocked here (NOT after the first verb below)
__• the longest wait is for the longest xact in the system that touches the page

So here we code each SELECT as:
__ IF EXISTS (
____ SELECT ...
______ FROM Hotel
______ WHERE ...
______ { FOR UPDATE | HOLDLOCK } -- platform-specific syntax, “beyond” SQL
____ )

__• <-[2.1]

• Check NOT_EXISTS all rows that must not exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• ROLLBACK & RETURN on any failure.
• (INSERT/UPDATE/DELETE)
__• ROLLBACK & RETURN on any failure.

• COMMIT TRAN
• TransactionLog is released.

___________________
3. CLEAN-UP Block
• All error handling
• ROLLBACK TRAN (if structured, the only ROLLBACK)
• <-[3.1]
• RETURN (if structured, the only RETURN)
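
To make the template concrete, here is a minimal sketch of the three blocks as one Transaction sproc, in generic Transact-SQL. The proc name, parameters, columns and return codes are invented for illustration (this is not the SG template), and the HOLDLOCK placement is Sybase-style, ie. platform-specific:

__ CREATE PROC Hotel_Add_tr
____ @HotelChain CHAR(30),
____ @Hotel      CHAR(30)
__ AS
__ -- [1] VALIDATE block: no Transaction open; no locks held beyond each read
__ IF NOT EXISTS ( SELECT 1 FROM HotelChain WHERE HotelChain = @HotelChain )
____ RETURN 1    -- parent row must exist
__ IF EXISTS ( SELECT 1 FROM Hotel WHERE HotelChain = @HotelChain AND Hotel = @Hotel )
____ RETURN 2    -- target row must not already exist
__ -- [2] EXECUTE block: the ACID Transaction proper
__ BEGIN TRAN
__ IF NOT EXISTS ( SELECT 1 FROM HotelChain HOLDLOCK    -- Read lock held to COMMIT
______ WHERE HotelChain = @HotelChain )
____ GOTO CleanUp
__ IF EXISTS ( SELECT 1 FROM Hotel WHERE HotelChain = @HotelChain AND Hotel = @Hotel )
____ GOTO CleanUp
__ INSERT Hotel ( HotelChain, Hotel ) VALUES ( @HotelChain, @Hotel )
__ IF @@error != 0
____ GOTO CleanUp
__ COMMIT TRAN
__ RETURN 0
__ -- [3] CLEAN-UP block: all error handling; the only ROLLBACK; the only error RETURN
__ CleanUp:
__ ROLLBACK TRAN
__ RETURN 3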

______
Client
There is ordinary interplay between the Client and the Server. The client may now be a GUI or a webpage or middleware in an app server.

The VALIDATE block [1] is repeated in the client. This is not a duplicate, because it is complying with the Commandment: it would be stupid to attempt something that will fail. We do not need to engage server lock resources and the TransactionLog to find out that something will fail; we can check for it without engaging them. Besides, such an act gets the pages into the cache, if they are not already there, and then, when the EXECUTE block starts, the required pages are in the cache, warm and buttered, waiting to be served up.
_________________
The ACID Transaction is bounded by BEGIN TRAN...COMMIT/ROLLBACK TRAN. It doesn’t even start unless it can complete, which is validated before the start.

> how do you ensure that
> nothing happens between the instant VALIDATE completes and the instant
> BEGIN TRAN is executed?

That duration between instants is:
a. not relevant [assuming you understand the above], and
b. microseconds, because it is between two consecutive steps inside a contiguous stored proc, which is running as object + plan code (compiled and a full resource plan established), and
c. harmless because no resources are held.

__________________
> If an atomic processing unit is
> delimited by BEGIN TRAN...COMMIT,

The /processing unit/ wrt ACID Transaction which indeed must be Atomic, is bounded by BEGIN-COMMIT TRAN, in one contiguous code block.

The /processing unit/ that contains it is not Atomic in the ACID sense, but it is Atomic by design, a contiguous code block; and optimised (compiled + resource plan). Eg. I do not allow that to be non-contiguous. The two need not be the same /processing unit/.

This is also why the BEGIN TRAN must never be in the client.
__________________
Batch Transaction

Add:

2.1 Add the data-level lock
___ BEGIN TRAN
___ UPDATE HotelChainIsDeleted[ @HotelChain ] = 1
___ COMMIT TRAN

3.1 Remove the data-level lock (in our example this is not required, because it is the last row deleted).
___ BEGIN TRAN
___ UPDATE HotelChainIsDeleted[ @HotelChain ] = 0
___ COMMIT TRAN

You may consider the BEGIN-COMMIT TRAN superfluous. No, They are demanded by the standard. Further, they are anchors for code investigation, and on the server vis-a-vis monitoring metrics.
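
Rendered as plain SQL, and reading HotelChainIsDeleted[ @HotelChain ] as an IsDeleted flag on the HotelChain row (an assumption for illustration only), steps 2.1 and 3.1 might look like this:

__ -- 2.1  Set the data-level lock at the top of the hierarchy
__ BEGIN TRAN
__ UPDATE HotelChain
____ SET   IsDeleted = 1
____ WHERE HotelChain = @HotelChain
__ COMMIT TRAN

__ -- ... the Batch Transaction then works through the hierarchy, a limited number of rows at a time ...

__ -- 3.1  Remove the data-level lock (not required in this example, because that row is the last one deleted)
__ BEGIN TRAN
__ UPDATE HotelChain
____ SET   IsDeleted = 0
____ WHERE HotelChain = @HotelChain
__ COMMIT TRAN
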
_________________

> And I have a couple more questions, if you don't mind:

I would be pleased to answer.

> 1. What you are describing is built on top of the standard locking
> mechanism provided by the DBMS.

If you mean the normal ACID Transaction in the full OLTP Context, no. It is aware of, and implicitly uses the locking in the server (whatever that is).

If you mean the Batch Transaction, yes. I said so at the start. That is why I call it a data-level lock, it has to be done in data, we have no direct access to the locks, and we never do idiotic things such as:
__ LOCK TABLE

> If I understand correctly, it requires
> the transactions to be coded to perform explicit checks before accessing
> the data.

Yes.
All ACID Transactions.
It is the same “explicit” checks as the “explicit” operations it will attempt, no less, no more.
We have been doing that since 1960, on all OLTP systems.
Welcome to the industry in the real world, that you guys have never served.
Thank you for visiting.
Please come again.

> At the outset, that sounds like something that can become very
> complicated and error-prone.

How ? Why ?
Sure, we use templates, and the developer has to be educated in more than the schizophrenic MVCC mindset, but I can’t see how it is “complicated” (see detail above) or “error-prone”. In any case, the errors will be caught in development [DEV], long before TEST and UAT.

It is 2021, yes ? No one codes SQL any more. We have been using IDEs since the early 1990’s. Now there are free IDEs. Of course PowerBuilder is still the best and most integrated, and the only one that has DataWindows, which is why it is in a separate class of its own. There are many imbeciles who think and work like SQL is a “high-level language”. It isn’t. It is a low-level data sub-language, it is cumbersome (must needs be, unless you have single characters for your file and field names). You can’t go lower than SQL.

>>>>
Except in high-end servers, where you can write query plans or modify the QPs that the compiler produced. The need for this is essentially to work around problems caused by bad file design. Since I have only Relational tables, I have never had to use this facility, but being a performance expert, I have corrected at least 100.
<<<<

>>>>
Now for SG customers, there is much more. I give the OLTP Standard; a set of Templates; and a check list which must be filled by a peer, not the coder, for each xact_sp, before submission to the DBA (to place it in TEST Rdb). The DBA is the policeman, and has to check the last check box, a second layer of checking plus resolution of names; etc, before signing.
<<<<

> E.g., it seems that a transaction to insert
> a new ReadingMeasure should check many things (HotelChain, CountryCode,
> ..., Application) before proceeding.

Yes.
In order to comply with the Commandment.
Every single time.
BEFORE starting the xact.

Would you NOT want to check the depth of the pond before you dive into it ?
Would you NOT check if you should use the railway station or the airport before travelling to Timbuktu ?
Would you check that <PlaceName> exists before selling someone a ticket to <PlaceName> ?
What is the big deal ?

Welcome to the exciting REAL world of the industry, it has nothing in common with the “industry” as defined by theoreticians.

>>>>
Now for SG customers, there is more. We generate the Transaction sprocs from the SQL Catalogue tables ... with a number of enhancements that I can’t divulge. (Just so that you know such is possible ... it is the next step for IDEs.) So the developers only have to check-and-change, rather than write sprocs.
<<<<

> Or, wait, such a transaction would be coded in a "simple" way (insert
> into Reading, insert into ReadingMeasure)

(I assume that is bracketed by BEGIN/COMMIT TRAN. And that is not the Batch Transaction discussed.)

Yes, that is properly called a Naïve Transaction. Of course it does not meet basic OLTP requirements, and it does not observe ACID issues, so it is sub-standard. But as a naïve or simple transaction, that is fine.

The ACID/OLTP Standard (Not the SG OLTP Standard, which does more) for that Transaction, is given above.

> and then two things may
> happen: it is either executed before the batch transaction has "reached"
> Reading, in which case it would succeed (and then it would be updated by
> the batch transaction);

Correct.
That will happen for either the Naïve Transaction or the OLTP Standard-compliant Transaction.
The Naïve Transaction may additionally fail due to Reading[ PK ] NOT_EXISTing, which causes (a) locks to be held, and (b) an anchor on the TransactionLog, which is then released; all of which is easy to avoid.

> or it is executed after the batch transaction
> has updated Reading, hence inserting the new reading would raise
> a foreign key violation (and it should be retried with the new key).

Correct.
That will happen for either the Naïve Transaction or the OLTP Standard-compliant Transaction.

> Or something else?

No.

__________

> 2. You are describing an OLTP context, but you have claimed several
> times that the same database can serve both OLTP and OLAP workloads.

Yes, absolutely. Since 1993.

(There are some environments in which the customer *chooses* to host a replicate database, to offload reporters from the OLTP, which they do after I deliver the project. Note, this is for load-spread reasons, not for contention-reduction reasons, whereas for all other suppliers, it is for contention-reduction reasons. I still guarantee *uneventful* OLTP & OLAP on the same database.)

> Do
> you code analytic queries along the same principles to avoid
> locking too many records?

> code analytic queries

We don’t code analytic queries.

See if you can understand this (not being condescending, but we both know you are crossing a chasm).

__ FOPC[ Predicate ]-->RM[ Predicate ]-->SQL[ Predicate ]
____ Predicate{ Unary | Binary }
____ Unary = full existential reality
____ Binary = relations between Facts (not mathematical relations)
__ SQL[ Predicate ]--> ReportToolSimple
__ *NO* “Ontology”, because the reality of existence of every object is defined in the catalogue
____ sp_help <Object> reports it, a ReportTool picks it up
__ *NO* “diskwipshun logicks”, because the entire database is described in terms of Predicates
____ which are clearly visible in the Logic Map (IDEF1X Data Model).

>>>>
I have a simpler form of the Logic Map in IDEF1R, which users prefer over the IDEF1X/Attribute level. That form is the proper replacement for the self-crippling “ERD”, which is programmed into young minds at all “universities” as “relational”. Except for one professor at Udine, who has switched to IDEF1X.
<<<<

Other than a great DM in IDEF1X and a Data Dictionary (easy to produce from ERwin), I give them nothing. I help the actual users to choose a Report Tool that is appropriate. Any simple Report tool such as CrystalReports will suffice. I give them access via Excel/Numbers. BusinessObjects and such (six-figure, heavy-duty report tools that implement a “universe”, which is definitely needed for non-databases such as RFS) are totally unnecessary. They hate me, same as the academics, and for the same reasons: they provide a method of overcoming the filth of RFS, which I never need.

Sure, there are some clever things I do, such as eliminate the need for “pivot” tables, but I don’t provide code (I can!), I just show them how to produce a pivot report in SQL and CrystalReports.
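
(For illustration only, not claiming this is the method referred to above: one common way to express a pivot report in a single SELECT, borrowing the Hotel and NumStar names from the earlier example, is conditional aggregation.)

__ SELECT HotelChain,
________ SUM( CASE WHEN NumStar = 3 THEN 1 ELSE 0 END ) AS ThreeStar,
________ SUM( CASE WHEN NumStar = 4 THEN 1 ELSE 0 END ) AS FourStar,
________ SUM( CASE WHEN NumStar = 5 THEN 1 ELSE 0 END ) AS FiveStar
____ FROM  Hotel
____ GROUP BY HotelChain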

Likewise, RFS knuckle-draggers DO need to *code* queries (think: every link in a chain has to be navigated, and navigated correctly, due to NOT having RM/Access Path Independence), and fiddle around for a week or three to get it to work. In contrast, I guarantee that any report requirement can be fulfilled with a single SELECT command.

I don’t even use Temporary tables (except for reformatting imported files). Never have.

Likewise, I give all forms of hierarchies in the data (refer the Bill of Materials doc). Developers who follow the academics, or who think Celko is a “guru”, pour their hierarchies in concrete, such that any insert has to rebuild the entire hierarchy (refer Adjacency List or Nested Set). (I shoot such people.) So I have to teach them about what a hierarchy really is; how it is beautifully implemented in the /RM/, and in this Rdb. It needs recursion, so the orangutans that view the entire universe through the myopic OO lens; CRUD; poisissytence; etc, need education to introduce them to the Wonderful World of SQL Power. But not code.

> along the same principles to avoid
> locking too many records?

The whole point of the Batch Transaction is that it limits the number of ISOLATED rows, which means limited locks (as distinct from rows or records, because we lock pages not rows/records). The server has Row Level Locking, I have never had the need. I have converted (the typical Version 2 project) many RFS type systems, that had to have Row Level Locking to even run at all, to ordinary Relational and Page locking. Thus the locks are far fewer than the rows updated.

So no, the Batch Transaction does NOT hold many locks. And due to the tight structure, whatever locks that are held, for 100 max rows, are held for the shortest duration.

> OLAP Reports
Why do you worry about locks in OLAP reports ?
Who cares.
Read locks are extremely short, and transient (held for the duration of the read operation, not the Transaction). We can’t control Read locks, even implicitly
__ (sort-of exception ISOLATION LEVEL 0 “Dirty Reads”)
I certainly do not wish to take even a fragment of the Stonebraker Disneyesque mantra for slobbering idiots, but since you are familiar with it ...
__ Readers never block writers
__ Readers are blocked by uncommitted changes TO THE PAGE only
the longest duration of which is that of the longest OLTP ACID Transaction that touches that page, which is the shortest possible, by design.

> > The corollary to that Commandment is of course:
> > __ Thous Shalt Not Begin Anything That Will Fail, That Thou Canst Not
> > Complete.
>
> That is worth the "optimistic" label!

Certainly, if one is attached to Truth, one has Reason for Faith; Hope; and Charity. Genuine optimism (as distinct from the new age poppycock of “positive thinking”) is founded on that Hope.

But still, that is not Optimistic Locking. And in this post, there is nothing about Optimistic Locking.

> > But we don’t start that “phase” until the previous “phase” of VALIDATE
> > completes successfully. And we have no time spans in-between
> > operations.
>
> See above. Not clear how you'd achieve that.

Done.
________

If you are interested in how Ordinary Locking works, here is an overview of Locks and the Lock Manager in an OLTP Server. Again, please, before reading, erase all the false notions (“1PL”; “2PL”; “S2PL”; “C2PL”; “SS2PL”; and the like) from your mind. Notice very gravely that you guys have no idea about locking or a lock manager; you have only the problem-fix layer on top of the MVCC monstrosity (because some things in that fantasy can only be resolved with a Lock Manager).

This one is core-unchanged since 1984, enhancements added over time to cater for new data structures, only. Private papers and patents in the early days, but now we can give it in documents such as this to the SAP/Sybase community. Just don’t tell the PissGris droolers about it. Pages 1 & 2 only:

____ https://www.softwaregems.com.au/Documents/Article/Sybase%20Lock%20Manager/Sybase%20Lock%20Manager.pdf

__ To understand this properly, you need to understand ACID properly, in both the server context and the Transaction context. Developers who code for such servers have to know ACID, thus ACID is not explained.
____ Again, MVCC systems do not, cannot, provide ACID, and developers on such systems have no clue about ACID; how simple and powerful it is; etc. So look carefully for terms such as ISOLATION level, and genuine compliance to that.
__ All locks are in-memory
____ All pages that are touched are in-memory (the data caches)
__ Latches are fast locks, for short-duration operations (eg. not disk)
__ Spinlocks are for server-internal structures only, wherein the Engine “spins” while waiting for a fast-lock (the true database server does not yield the CPU)
__ Cursor locks (an additional type) are not shown.
__ DRL and DOL are newer [physical] table types that provide even faster file services (RecordID). Which again I have no use for, but I have decades of experience fixing. Comments about such can be ignored.
__ Don’t skip the link to the Server Monitoring Report, at least § Lock Manager.

Come to think of it: since you are concerned about Read locks and their effects, etc (as opposed to Write locks, which are the concern in the above discussion), this happens to be a perfect, real-world example, which can be used for further discussion, because it has Before::After stats and comparisons, at the metric level, after fixing a SAN configuration error which resulted in slower I/Os all around. I will let you examine the report and ask questions, eg. /How come the Read Locks dropped by 91% (or how come the slow I/O resulted in a 1,100% increase in Read Locks) ?/

I have described the columns in some post in this or the other thread.

Cheers
Derek

Derek Ignatius Asirvadem

unread,
Jun 19, 2021, 3:52:18 AM6/19/21
to
Nicola

> On Saturday, 19 June 2021 at 17:20:51 UTC+10, Derek Ignatius Asirvadem wrote:

> Read locks

The previous post is a bit long, and that item was addressed in a few places. In case it isn't clear:
__ Read locks outside a Transaction are transient, held for the duration of the read operation
__ Read locks inside a Transaction are held until COMMIT/ROLLBACK
____ that is the SQL requirement, to support ACID
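
A minimal illustration in generic Transact-SQL (HOLDLOCK shown Sybase-style; table, columns and variables assumed from the earlier example):

__ -- Outside a Transaction: the Read lock is held only for the duration of the Statement
__ SELECT NumStar
____ FROM  Hotel
____ WHERE HotelChain = @HotelChain AND Hotel = @Hotel

__ -- Inside a Transaction, with the Read lock explicitly requested to be held: released at COMMIT/ROLLBACK
__ BEGIN TRAN
__ SELECT NumStar
____ FROM  Hotel HOLDLOCK
____ WHERE HotelChain = @HotelChain AND Hotel = @Hotel
__ UPDATE Hotel
____ SET   NumStar = @NumStar
____ WHERE HotelChain = @HotelChain AND Hotel = @Hotel
__ COMMIT TRAN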

("MVCC" systems do none of that ... and try to desperately clean up a mountain of false versions of rows (not even pages!), that are distributed across the entire file space, when one person hits the [Save] button. And contrary to the cult hymn, it produces conditions that cannot be resolved, and resorts to "2PL". And yet still produces failures akin to deadlock. But the cult love their denial, and just sing the hymn louder.)

As stated in:
____ https://www.softwaregems.com.au/Documents/Article/Sybase%20Lock%20Manager/Sybase%20Lock%20Manager.pdf

Cheers
Derek

Nicola

unread,
Jun 22, 2021, 6:35:05 AM6/22/21
to
On 2021-06-19, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> __________________
> 1. VALIDATE Block
> Code:
> • Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
> __• RETURN on any failure.
> • Check NOT_EXISTS all rows that must not exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
> __• RETURN on any failure.
> • Transaction NOT yet started
> • All locking is transient and very short duration
> • TransactionLog is not touched
> • Fall into ...

Ok.

> __________________
> 2. EXECUTE Block
> * BEGIN TRAN
> • TransactionLog activity commenced for this xact
> • Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
> __• ROLLBACK & RETURN on any failure.
> __• the only relevant locking starts here
> __• if blocked by another xact, you are blocked here (NOT after the first verb below)
> __• the longest wait is for the longest xact in the system that touches the page
>
> So here we code each SELECT as:
> __ IF EXISTS (
> ____ SELECT ...
> ______ FROM Hotel
> ______ WHERE ...
> ______ { FOR UPDATE | HOLDLOCK } -- platform-specific syntax, “beyond” SQL
> ____ )
>
> __• <-[2.1]
>
> • Check NOT_EXISTS all rows that must not exist, in order for the
> intended rows (INSERT/UPDATE/DELETE) to succeed.
> __• ROLLBACK & RETURN on any failure.
> • (INSERT/UPDATE/DELETE)
> __• ROLLBACK & RETURN on any failure.
>
> • COMMIT TRAN
> • TransactionLog is released.

Ok.

> ___________________
> 3. CLEAN-UP Block
> • All error handling
> • ROLLBACK TRAN (if structured, the only ROLLBACK)
> • <-[3.1]
> • RETURN (if structured, the only RETURN)

Ok.

> ______
> Client
> There is ordinary interplay between the Client and the Server. The
> client may now be a GUI or a webpage or middleware in an app server.
>
> The VALIDATE block [1] is repeated in the client.

I assume that this means validating data without interacting with the
database, e.g., to check that the input data is within valid ranges,
etc.

>> how do you ensure that
>> nothing happens between the instant VALIDATE completes and the instant
>> BEGIN TRAN is executed?
>
> That duration between instants is:
> a. not relevant [assuming you understand the above], and
> b. microseconds, because it is between two consecutive steps inside
> a contiguous stored proc, which is running as object + plan code
> (compiled and a full resource plan established), and

That guarantees that interference is unlikely, but not impossible.
Otherwise, you would not need this:

> * BEGIN TRAN
> • TransactionLog activity commenced for this xact
> • Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
> __• ROLLBACK & RETURN on any failure.

> c. harmless because no resources are held.

That is an interesting approach. It is not obvious, IMO, that validating
the data the way you suggest is advantageous (I am not saying that it is
not, mind you!). Whether it is or not, it seems to depend on a number of
factors, such as number of concurrent transactions, duration of
transactions, number of touched resources, etc.

Validation introduces an overhead, but it prevents transactions from starting
and acquiring locks unless there is a very good chance of acquiring them
all, so overall it may be a win.

To summarize, you could implement transactions in at least three ways:

(1) just BEGIN TRAN; INSERT/UPDATE/DELETE; END TRAN:
(2) BEGIN TRAN; SELECT FOR UPDATE; INSERT/UPDATE/DELETE; END TRAN;
(3) VALIDATE, then (2).

And in your experience (3) is the best approach.

> Batch Transaction
>
> Add:
>
> 2.1 Add the data-level lock
> ___ BEGIN TRAN
> ___ UPDATE HotelChainIsDeleted[ @HotelChain ] = 1
> ___ COMMIT TRAN

Ok.

> 3.1 Remove the data-level lock (in our example this is not required, because it is the last row deleted).
> ___ BEGIN TRAN
> ___ UPDATE HotelChainIsDeleted[ @HotelChain ] = 0
> ___ COMMIT TRAN

Ok.

> You may consider the BEGIN-COMMIT TRAN superfluous. No, They are
> demanded by the standard. Further, they are anchors for code
> investigation, and on the server vis-a-vis monitoring metrics.

Ok.

>> 1. What you are describing is built on top of the standard locking
>> mechanism provided by the DBMS.
>
> If you mean the normal ACID Transaction in the full OLTP Context, no.
> It is aware of, and implicitly uses the locking in the server
> (whatever that is).
>
> If you mean the Batch Transaction, yes. I said so at the start. That
> is why I call it a data-level lock, it has to be done in data, we have
> no direct access to the locks, and we never do idiotic things such as:
> __ LOCK TABLE

Ok, that is all pretty clear now.

>> OLAP Reports
> Why do you worry about locks in OLAP reports ?
> Who cares.
> Read locks are extremely short, and transient (held for the duration
> of the read operation, not the Transaction). We can’t control Read
> locks, even implicitly

Are you saying that OLAP queries run in a lower isolation level (say,
READ COMMITTED)?

> But still, that is not Optimistic Locking. And in this post, there is
> nothing about Optimistic Locking.

Agreed.

Thanks,
Nicola

Derek Ignatius Asirvadem

unread,
Jun 23, 2021, 2:50:03 AM6/23/21
to
Nicola

> On Tuesday, 22 June 2021 at 20:35:05 UTC+10, Nicola wrote:
> > On 2021-06-19, Derek Ignatius Asirvadem wrote:

> > ______
> > Client
> > There is ordinary interplay between the Client and the Server. The
> > client may now be a GUI or a webpage or middleware in an app server.
> >
> > The VALIDATE block [1] is repeated in the client.
> I assume that this means validating data without interacting with the
> database, e.g., to check that the input data is within valid ranges,
> etc.

No.
Any GUI or input system has to do those checks on input fields, that is fundamental, not usually mentioned.

Let’s say there are 20 WIDGETS currently in the inventory:
__ why would you allow an user to purchase 6 WIJJETS [that do not exist] ???
__ Why would you allow an user to purchase 100 WIDGETS [that are not in stock] ???

What does normal GUI interaction mean to you ? Why does it exclude database interaction ? Why would you NOT restrict the user to choosing only those Parts that do exist in the database, and to a purchase quantity that is currently in stock ? Otherwise the GUI is insane, the user interaction is insane.

For decades, GUIs have had Drop-Down fields, which are obtained from the db upon launching the dialogue. And fields that are searched-and-filled as the user types a few characters, which are obtained from the db.

Why would you design a GUI that does NOT interact with the db in order to validate its intent ... before it executes its intent ???

Why would you let users enter any old nonsense all the way through the GUI; then at the end try it all against the database (the first failure would return, the second and subs would not even get checked); then find out that it failed; then try the whole rigmarole again ???

Why would you NOT check for validity at each and every field (as the user progresses through the fields) ??? Did you not genuflect to the Commandment ? Why is the Client (GUI or whatever) excluded from being part of the OLTP system, and therefore subject to OLTP Standards ?

Why would you allow an user to even dream about a Part that you do not have in the inventory [WIJJET] ?

By:
> > The VALIDATE block [1] is repeated in the client.
I mean /the VALIDATE block [1] is repeated in the client/. All of this (modified slightly for the GUI context):

>>
__________________
1. VALIDATE Block
__ Proper Preparation Prevents Poor Performance.

>>>>
We know we are locking rows, implicitly, by virtue of the SQL commands we issue, but it is the server that does the locking and the resolution.
• We know that Reads are locked for the duration of the Statement, Read Locks are transient.
• This is interactive, thus duration cannot be controlled, thus no locks held
<<<<

Code:
• Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• On any failure, error message; clear field; allow re-entry.
• Check NOT_EXISTS all rows that must not exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• On any failure, error message; clear field; allow re-entry.
• Transaction NOT yet started
• No locks held
• TransactionLog is not touched
<<

When complete (eg. hitting the <Save> button), execute the Transaction stored proc.

Therefore, the Xact call, when issued at the end of the GUI interaction, has been VALIDATED before the attempt. While the user was fiddling around (between the presentation of the GUI dialogue and the instant the <Save> button was pressed), no rows are locked; no contention is inflicted on other users due to invalid attempts or, worse, transactions partly executed before being rolled back.
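
As a sketch of that interplay (Part, QtyOnHand, Part_Purchase_tr and the variables are invented names for the WIDGET illustration; the Transaction sproc performs its own VALIDATE and EXECUTE blocks regardless):

__ -- On launching the dialogue: populate the drop-down from the database
__ SELECT Part, QtyOnHand
____ FROM  Part
____ WHERE QtyOnHand > 0

__ -- As each field is filled: check it against the database (no Transaction, no locks held)
__ IF NOT EXISTS ( SELECT 1 FROM Part WHERE Part = @Part AND QtyOnHand >= @Qty )
____ PRINT 'Part does not exist, or insufficient stock'    -- error message; clear field; allow re-entry

__ -- Only on <Save>: call the Transaction sproc, which VALIDATEs again, then EXECUTEs
__ EXEC @Status = Part_Purchase_tr @Part, @Qty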

If that is NOT done in the Client, the user is encumbered with entering any old nonsense; attempting it against the database (via the Transaction only); finding out that something or other failed (one at a time); and retrying; retrying; retrying. The notion is ridiculous.

> >> how do you ensure that
> >> nothing happens between the instant VALIDATE completes and the instant
> >> BEGIN TRAN is executed?
> >
> > That duration between instants is:
> > a. not relevant [assuming you understand the above], and
> > b. microseconds, because it is between two consecutive steps inside
> > a contiguous stored proc, which is running as object + plan code
> > (compiled and a full resource plan established), and
>
> That guarantees that interference is unlikely, but not impossible.

(Well, if it “guarantees”, then interference is impossible; if interference is not impossible, then it is not a guarantee.)

So what. The code is interrupted at that point. It is a normal possibility, catered for: if the code segment is interrupted at that or any other point, the Transaction is rolled back and an error status is returned to the caller. At that particular point, no rows locked; Transaction not started. If at a later point, Transaction started and partially complete, then rolled back. So what. Ordinary fare. Why is that point of possible failure more important than any other point ? Why do we need to “ensure nothing happens between end-VALIDATE and BEGIN TRAN” ?

> Otherwise, you would not need this:
> > * BEGIN TRAN
> > • TransactionLog activity commenced for this xact
> > • Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
> > __• ROLLBACK & RETURN on any failure.

Huh ?
I need that because I love God and His Commandments; because I do not attempt actions that will fail; I am sociable and do not inflict resource-locking on other users unless I really have to. I need to separate the VALIDATE & EXECUTE blocks, because the VALIDATE block prepares but does not change anything, and the EXECUTE block performs the changes in a structured Transaction, only if the VALIDATE block succeeds.

OTOH, if you are concerned about data in the database changing between the point in the VALIDATE block that checks a particular row, and the point in the EXECUTE block that updates the same particular row, then yes, that is not catered for in the above description. However, that [and one other major problem, which can be eliminated] is covered by Optimistic Locking.

> > c. harmless because no resources are held.
> That is an interesting approach. It is not obvious, IMO, that validating
> the data the way you suggest is advantageous (I am not saying that it is
> not, mind you!).

And there is a VALIDATION being done in the Client, the duration of which cannot be controlled due to user interaction.

It is not merely advantageous (which of course it is): it is demanded by the OLTP Standard.

> Whether it is or not, it seems to depend on a number of
> factors, such as number of concurrent transactions, duration of
> transactions, number of touched resources, etc.

Is not every request to the server “dependent” on those same factors ??? Do you have a method of calling the server such that those loads do NOT affect the Transaction ??? Maybe the MVCC fantasy of a private version of the entire database. No, this is real life.

The duration of transactions is minimised, to the absolute minimum, by the OLTP Standard. In case it is not clear, unlike the MVCC asylum:
a. we do not hold locks outside a Transaction, and
b. we do not BEGIN TRAN except to execute a Transaction, and
c. we do not allow user interaction after a BEGIN TRAN.

It is advantageous in all circumstances. The more active and contentious the system is, the more advantageous the OLTP Standard is.

> Validation introduces an overhead, but it prevents transactions to start
> and acquire locks unless there is a very good chance to acquire them
> all, so overall it may be a win.

Yes.
I would not call it an “overhead”.
Checking the depth of the pond before diving in, or checking that a Part exists before ordering it, is not “overhead”, it is a demanded part of the task.

> To summarize, you could implement transactions in at least three ways:
>
> (1) just BEGIN TRAN; INSERT/UPDATE/DELETE; END TRAN:
> (2) BEGIN TRAN; SELECT FOR UPDATE; INSERT/UPDATE/DELETE; END TRAN;
> (3) VALIDATE, then (2).
>
> And in your experience (3) is the best approach.

Can you suggest any other method, that locks the smallest number of resources, AND locks said resources for the shortest duration, AND only when those resources are sure to be touched ?

No, not merely my experience.
a. the way it was done, and the way it is still done, on “legacy” mainframe systems.
b. the way it is done on genuine OLTP systems (otherwise they have no right to the /OLTP/ label). These are well-known and published methods. Even for the totally ignorant, upon using a genuine OLTP server, they would read it in the manuals, the same way that the insanity of MVCC is discussed in the PusGres manuals, but with honesty not fantasy. Ie. specific methods for reducing contention within the particular product are given.

As stated severally, about important matters (eg. God; truth) or about science (real science, not the Date; Darwen; Fagin; etc filth that passes for Modern “science”), which is anchored in objective truth, I don’t have opinions, I can only provide education.

[1][2] would be a very stupid app, one that allows the user to keep banging away at things that do not exist.

Yes, it is my experience since 1976. By the Grace Of God, working for system hosting companies; then the Canadian Trotting Registry; then Cincom, the fifth largest DBMS provider at the time. Meaning that I learned what NOT to do, and what TO DO CORRECTLY. Truth is permanent, it cannot change ... if it changes, it is not truth. The OLTP Standard has not changed since the 1960’s. I have just rendered it in an SQL context.

>>>>
Btw, neither ACID Transaction nor the Transaction definition was invented by Jim Gray. No thanks, we had all that long before he was heard of. What he did do was standardise the Transaction Benchmark process, via the Transaction Processing Performance Council, such that the benchmarks performed by the vendors on their different SQL Platforms had the same code structure, and could be compared by customers and prospects. Obviously Oracle cheated like a bachelor party at a brothel, and had to be reined in. In so doing, he coined the term ACID, for the Transactions that we did have. What he did do was good enough; he does not need to Big Note himself further (common problem with academics).

Eg. Because he was standardising from the user perspective, for millions of TPPC transactions per second, and not getting into the problems of Transaction Quality or Transaction Complexity, re the issues raised in this thread, he remained as clueless as the academics of today. If that were not the case, he would have published the OLTP Standard as his own.
<<<<

[3] is not best. And it certainly fails the OLTP requirement, which is a system-wide Standard. You need:
4. VALIDATE all intended changes -->IN THE CLIENT<-- during user interaction
__ upon <Save> execute the Transaction sp
____ which performs [3]
______ which performs [2]

+----------+

> >> OLAP Reports
> > Why do you worry about locks in OLAP reports ?
> > Who cares.
> > Read locks are extremely short, and transient (held for the duration
> > of the read operation [Statement], not the Transaction). We can’t control Read
> > locks, even implicitly
>
> Are you saying that OLAP queries run in a lower isolation level (say,
> READ COMMITTED)?

Of course. That is what that ISOLATION LEVEL[ READ COMMITTED ] is there for.

Not just OLAP queries, but all SELECTs outside a Transaction. That [ISOLATION LEVEL] is an ANSI SQL requirement, within the ACID requirement. We exclude [wait for completion of] uncommitted changes, but we do not need SERIALISED, in fact we want massively parallel, and we want the whole result set to be integral within itself (all internal dependencies intact and true).
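
As a sketch, in generic Transact-SQL (the SET statement spelling is platform-specific; the table is the earlier Hotel example):

__ -- A report: a single SELECT outside any Transaction, at the READ COMMITTED default;
__ -- Read locks are held per Statement only
__ SELECT HotelChain, COUNT(*)
____ FROM  Hotel
____ GROUP BY HotelChain

__ -- A Transaction that must re-read the same rows: SERIALISABLE applies (per the text above)
__ SET TRANSACTION ISOLATION LEVEL SERIALIZABLE    -- platform-specific spelling
__ BEGIN TRAN
__ SELECT COUNT(*) FROM Hotel WHERE HotelChain = @HotelChain
__ -- ... other work ...
__ SELECT COUNT(*) FROM Hotel WHERE HotelChain = @HotelChain    -- same result; locks held to COMMIT
__ COMMIT TRAN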

Did you not read the link I gave for the Sybase Lock Manager ? Page 2, bottom left, /Resolution/ table. I do not refer you to some doc unless it is directly related to the item at issue in the post.

In the Sybase world, for decades, we do not bother with REPEATABLE READ, it defaults to SERIALISABLE, which although it is a higher level of ISOLATION, works perfectly. In 2000, with the introduction of DRL/DPL data structures to provide improved speed for RFS (SAP is RFS), yes, they implemented REPEATABLE READ.

For understanding. Many systems in the MS world do their OLAP queries with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the result set has no integrity, and (b) contains uncommitted changes which may disappear.

For the MVCC world, they have no concept of READ COMMITTED/UNCOMMITTED, because they have only the false privatised version of the entire database, firmly secured between their big toes, which has only nominally “committed” changes, that has no relation to the version of other users. That nominally “committed” data blows up at COMMIT TRAN, just check the PooGres manuals re all the problems that are not supposed to happen in the Stonebraker fantasy, that happen, even to people who dutifully sing the mantra.

> > But still, that is not Optimistic Locking. And in this post, there is
> > nothing about Optimistic Locking.
>
> Agreed.

Add Optimistic Locking, and the methods given in this thread would be complete.

Cheers
Derek

Nicola

unread,
Jun 23, 2021, 5:59:50 PM6/23/21
to
On 2021-06-22, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> OTOH, if you are concerned about data in the database changing between
> the point in the VALIDATE block that checks a particular row, and the
> point the EXECUTE block that updates the relevant row, then yes, that
> is not catered for in the above description. However, that [and one
> other condition) is covered by Optimistic Locking.

Yes, that is what I meant.

>> >> OLAP Reports
>> > Why do you worry about locks in OLAP reports ?
>> > Who cares.
>> > Read locks are extremely short, and transient (held for the duration
>> > of the read operation [Statement], not the Transaction). We can’t control Read
>> > locks, even implicitly
>>
>> Are you saying that OLAP queries run in a lower isolation level (say,
>> READ COMMITTED)?
>
> Of course. That is what that ISOLATION LEVEL[ READ COMMITTED ] is there for.
>
> Not just OLAP queries, but all SELECTs outside a Transaction. That
> [ISOLATION LEVEL] is an ANSI SQL requirement, within the ACID
> requirement. We exclude [wait for completion of] uncommitted changes,
> but we do not need SERIALISED, in fact we want massively parallel, and
> we want the whole result set to be integral within itself (all
> internal dependencies intact and true).

What does "integral within itself" mean? Transactions at READ COMMITTED,
even if they are read-only, suffer from some anomalies.

> For understanding. Many systems in the MS world do their OLAP queries
> with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
> result set has no integrity, and (b) contains uncommitted changes
> which may disappear.

Correct. But queries at READ COMMITTED may view (committed) changes that
were not there when the transaction started. Do you accept that query
results may be off sometimes, or do you have some recipe to make them
avoid inconsistent reads and phantoms?

> For the MVCC world, they have no concept of READ
> COMMITTED/UNCOMMITTED, because they have only the false privatised
> version of the entire database, firmly secured between their big toes,
> which has only nominally “committed” changes, that has no relation to
> the version of other users. That nominally “committed” data blows up
> at COMMIT TRAN, just check the PooGres manuals re all the problems
> that are not supposed to happen in the Stonebraker fantasy, that
> happen, even to people who dutifully sing the mantra.

According to the manual
(https://www.postgresql.org/docs/current/transaction-iso.html), in
PostgreSQL each isolation level correctly prevents the anomalies that
are not allowed by the standard to occur at that level, and in some
cases it prevents more (e.g., READ UNCOMMITTED is like READ COMMITTED).
What are you referring to when you mention "all the problems that are
not supposed to happen... that happen"?

>> > But still, that is not Optimistic Locking. And in this post, there is
>> > nothing about Optimistic Locking.
>>
>> Agreed.
>
> Add Optimistic Locking, and the methods given in this thread would be complete.

Ok.

Nicola

Derek Ignatius Asirvadem

unread,
Jun 24, 2021, 3:38:28 AM6/24/21
to
Nicola

> On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
> > On 2021-06-22, Derek Ignatius Asirvadem wrote:

Sorry. Before I respond to the content, I need one clarification.

> > For understanding. Many systems in the MS world do their OLAP queries
> > with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
> > result set has no integrity, and (b) contains uncommitted changes
> > which may disappear.
>
> Correct. But queries at READ COMMITTED may view (committed) changes that
> were not there when the transaction started.

What “transaction started” ??? No transaction is involved.

The report connection would normally do SELECT [at] READ_COMMITTED which holds ReadLocks for the duration of the Statement, but in this abnormal case does SELECT [at] READ_UNCOMMITTED, holding no locks at all, and reading whatever is there on the disk, including uncommitted changes.

You might be stuck in the mindset that in order to read at { READ_UNCOMMITTED | READ_COMMITTED } you must open a Transaction. No, that is the MV-non-CC insanity, the Stonebraker fantasy. No, that is not ACID. No, we do not do that.

As per ACID, the real ACID, not the perverted form that MV-non-CC pretends, we only need to open a Transaction for { REPEATABLE_READ | SERIALISABLE }.

(I have to stretch, and be very careful, when I am writing these things ... from my everyday position of never having those problems; that insanity; that mindset. The insanity is foreign to me, but I am aware of it. Otherwise I would make serious errors in understanding, and communication would be difficult (correcting simple errors). Please understand, you have to do the converse: from your everyday MV-non-CC insanity and fantasy, you need to think carefully about the Real ACID; Real Ordinary Locking; Real Transactions. Not the Straw Men, which are burning slowly.)

----------

Separately. What “rows that were not there [when the transaction started]” ??? How can a transaction change a row that is not there ??? Or even have knowledge about a row that is not there ???

Cheers
Derek

Nicola

unread,
Jun 24, 2021, 5:33:00 AM6/24/21
to
On 2021-06-24, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
> Nicola
>
>> On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
>> > On 2021-06-22, Derek Ignatius Asirvadem wrote:
>
> Sorry. Before I respond to the content, I need one clarification.
>
>> > For understanding. Many systems in the MS world do their OLAP queries
>> > with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
>> > result set has no integrity, and (b) contains uncommitted changes
>> > which may disappear.
>>
>> Correct. But queries at READ COMMITTED may view (committed) changes that
>> were not there when the transaction started.
>
> What “transaction started” ??? No transaction is involved.

Ok, now that is clear.

> The report connection would normally do SELECT [at] READ_COMMITTED
> which holds ReadLocks for the duration of the Statement,

Ok. So, I am back to a question I have already posed to you:

- OLAP queries run at READ COMMITTED;
- Each OLAP query is a single SELECT statement;
- Each SELECT holds read locks for the duration of the statement.

But the duration of the statement may be significant (minutes, hours,
...) Don't such queries create contention (other transactions cannot
update the data read by the OLAP query until the query is over)?

> Separately. What “rows that were not there [when the transaction
> started]” ??? How can a transaction change a row that is not there
> ???

Of course it can't.

> Or even have knowledge about a row that is not there ???

Of course it can't. But it can see something that was not there before,
and it appears at a later time. I explain what I mean with an example:

Time  Transaction  Operation
0     T1,T2        start transaction;
1     T1           select sum(X) from R;
2     T2           insert into R(X) values (100);
3     T2           commit;
4     T1           select sum(X) from R;

At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
time 2, even if the record did not exist when T1 began. The SELECT at
time 1 and time 4 would then return different results.

Of course, this is not an issue if the only queries you run at READ
COMMITTED are single statements—provided that the system guarantees
statement-level consistency (it seems that some don't:
https://stackoverflow.com/questions/4772475/read-committed-isolation-level-in-sql-server-for-a-single-statement).

Nicola

Daniel Loth

unread,
Jun 24, 2021, 7:10:22 AM6/24/21
to
Hi Nicola,

If you're running a query in the READ COMMITTED isolation level then a shared lock is acquired for the duration of the read operation on whatever resource the lock protects. That resource may be a row or a page.
If you have acquired a row-level lock then the lock will be released once reading the row has finished.
If you have acquired a page lock then the lock will be released once reading the page has finished.

A query running for hours under READ COMMITTED would be locking and unlocking rows or pages as it encounters and processes them.

As for your second part concerning the sequence of events described, you are quite right. This is precisely the phenomenon that the REPEATABLE READ isolation protects against.
When you run a query with REPEATABLE READ isolation, shared locks are acquired and held for rows / pages until the transaction is committed or rolled back.
So if you then tried to modify data in those rows or pages, which involves acquiring an exclusive lock for those rows or pages, it would block - the exclusive lock being incompatible with the shared lock - thus allowing you to repeat the REPEATABLE READ and get the same result.

However, it's important to note that REPEATABLE READ isolation level does not acquire range locks (or predicate locks). This means that you can conceivably insert a new record that satisfies your REPEATABLE READ query's WHERE clause predicate.
For example, you could insert a new record that satisfies the REPEATABLE READ query predicate but ends up stored on a page that is not subject to a shared lock (the page might not have existed when you first ran the REPEATABLE READ query).
This is a phantom record, which SERIALIZABLE guards against using the aforementioned range (predicate) locks.
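
A sketch of that phantom scenario (table, columns and values invented for illustration; generic T-SQL):

__ -- Session 1, at REPEATABLE READ
__ SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
__ BEGIN TRAN
__ SELECT COUNT(*) FROM Hotel WHERE NumStar = 5    -- say it returns 3; the rows/pages read stay share-locked

__ -- Session 2, concurrently: the new row lands on a page holding no shared lock, so the insert succeeds
__ INSERT Hotel ( HotelChain, Hotel, NumStar ) VALUES ( 'ABC', 'NewPlace', 5 )

__ -- Session 1, same transaction
__ SELECT COUNT(*) FROM Hotel WHERE NumStar = 5    -- now returns 4: a phantom
__ COMMIT TRAN
__ -- Under SERIALIZABLE, a range (predicate) lock would block Session 2's INSERT instead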

It's important to recognise that your REPEATABLE READ query - or any query for that matter - is executing in the context of a broader, shared, system. Seeing the outcome of newly committed transactions makes sense if you've chosen READ COMMITTED and understand the concurrency phenomenon that it protects against.

Importantly, and often overlooked as far as MVCC disadvantages go, the data you read in an hours-long query running under MVCC is arbitrarily old data. If your query takes 2 hours to complete then the very last row processed by your query is potentially 2 hours out of date.

Daniel

Derek Ignatius Asirvadem

unread,
Jun 24, 2021, 7:56:23 AM6/24/21
to
Nicola

> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
> >
> > The report connection would normally do SELECT [at] READ_COMMITTED
> > which holds ReadLocks for the duration of the Statement,
>
> Ok. So, I am back to a question I have already posed to you:
>
> - OLAP queries run at READ COMMITTED;
> - Each OLAP query is a single SELECT statement;

1. Well, I guarantee that in my databases, because it is 100% Predicates.
Even high-end massive report tools do that, noting the intermediate “universe” that maps an RFS into something readable. And then it executes a single SELECT.
But boffins writing SQL on a non-SQL suite of programs do not have a hope in hell. So I can see that what is vanilla to us is news to you.

2. Not only OLAP, any SELECT runs at READ COMMITTED.
OLAP queries do not *have* to be a single SELECT statement.

> - Each SELECT holds read locks for the duration of the statement.

Yes.
SQL/ACID requirement, that MV-non-CC suites cannot support.

> But the duration of the statement may be significant

Why ???

I don’t have a single report in my entire SQL life that exceeds 12 seconds, and some of them are “complex” (20+ tables; subqueries; derived tables) and massive (tables have billions of rows). I kinda pride myself on sub-second response times, even for reports.

In the course of my operations, sure, I have replaced reports that run in 30 mins with a Rdb+single-SELECT that runs in under 10 seconds. Many, many times.

>>>>
On one occasion I got thrown out of the building by the big boss because his report that ran in 15 mins on Friday, after I re-built the DataStructures on the server, ran in under 1 second on Monday. He was sure that my project failed, that the report was wrong. I begged him to compare the report content, but he would not. I had to buy his secretary a box of Lindt chocolate, get her to print out both Friday and Monday reports, and get her to show them to him personally. The ungrateful sob did not even apologise, he just had security re-instate my security pass.
<<<<

I accept that in the MS world, where they are used to longer times, sure, their reports would go to 90 mins. I have replaced 2 of those and brought it down to under 10 secs.

> (minutes, hours, ...)

Not in the commercial SQL world.

Sure, in idiotic suites of programs, written by 10,000 undergrads spread across the planet, all singing the Stonebraker mantra to maintain their collective fantasy, hours and days are “normal”. That is the price of NOT having an architecture, of writing filth and pretending that it is a server. 1,000 or 2,000 TIMES slower than a commercial SQL Platform is “normal”.

So no, the issue simply does not apply to us. But I accept that is “normal” for the academics and freeware users. Just wait until you get to the benchmark that you were planning.

Take a look at this Benchmark I did against Oracle (MV-non-CC). Where Sybase returned in 2.16 secs, Oracle had to abandon the benchmark after 120 MINUTES.
__ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf

> Don't such queries create contention (other transactions cannot
> update the data read by the OLAP query until the query is over)?

Correct.
So what (see above).
No one cares about waiting a few seconds.

1. Did you not read the /Sybase Lock Manager/ link, page 2 table at bottom left ?

2. I have stated that if you want to get into this, there is a Server Monitoring report that just happens to expose this problem, and the considerations, and with more granularity than “contention” ... but you have not taken it up.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf

The Read Locks are precisely this: SELECT queries (not OLAP, but that makes no difference) contending with OLTP transactions. Do you not care that, after fixing a SAN config problem, the Read Locks dropped by 91%, or what that means at the coal face ? Or, stated in converse, why Read Locks were 1,100% higher when the SAN was mis-configured ?

In any case, take an extremely quick look at the report, in [§ Selection] at the top, just three lines:
__ Write Locks = up 10%
__ Read Locks = down 91%
__ Commits = no change (confirming the load is the same)

At minimum, I am confirming (a) that OLAP & SELECT queries *DO* hold Read Locks for the duration of the Statement, and (b) that that is so fleeting as to be irrelevant in a normal commercial SQL server (Server Architecture).

> > Separately. What “rows that were not there [when the transaction
> > started]” ??? How can a transaction change a row that is not there
> > ???
>
> Of course it can't.
>
> > Or even have knowledge about a row that is not there ???
>
> Of course it can't. But it can see something that was not there before,
> and it appears at a later time. I explain what I mean with an example:
>
> Time  Transaction  Operation
> 0     T1,T2        start transaction;
> 1     T1           select sum(X) from R;
> 2     T2           insert into R(X) values (100);
> 3     T2           commit;
> 4     T1           select sum(X) from R;
>
> At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
> time 2, even if the record did not exist when T1 began. The SELECT at
> time 1 and time 4 would then return different results.

Sure. But that is the classic demonstration for requiring ISOLATION LEVEL REPEATABLE_READ, which requires a Transaction to be open.

1.
It is still a matter of understanding on your part.
(The PusGres link you gave is pure mind-numbing filth. You are still thinking about a SELECT that holds “transactional” locks. It does not (except in that oozing filth). )
Change all occs of [T1] to [S1], signifying that it is a SELECT, not a Transaction.
Remove T1 at Time[0] because we do not start a Transaction for a SELECT
If READ_COMMITTED is used (default on DB2; Sybase; MS), a SELECT [no Transaction]:
__ at Time[1] does not obtain the inserted row
__ at Time[4] obtains the inserted row

A SELECT does not have to be repeated, period, full stop, end of story.

2.
But if a SELECT does get repeated, then the context must be some Logical Unit of Work; a Transaction.
Since it is now a Transaction, ISOLATION LEVEL SERIALIZABLE applies (you do not get to choose).
(Using your original example, not the changes ala [1])
Add Time[5] T1 commit
The result is that T2's INSERT at Time[2] blocks, and does not complete until Time[5].
The inserted row does not get inserted until the T1 Transaction completes.

It would be an exceedingly stupid thing to do, repeating a SELECT in a Transaction, instead of saving the value at the first SELECT. But academically, as a theoretical possibility, sure. And SQL/ACID handles it, no problem at all.
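For concreteness, a minimal sketch of case [2] (generic SQL; the exact transaction and isolation syntax varies by platform, and the Time[] markers refer to the example above):

-- Repeated SELECT => Logical Unit of Work => SERIALIZABLE applies
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    SELECT SUM(X) FROM R;   -- Time[1]: Read Locks held to end of Transaction
    -- ... other work ...
    SELECT SUM(X) FROM R;   -- Time[4]: same result; T2's INSERT is blocked
COMMIT TRANSACTION;         -- Time[5]: T2's INSERT may now proceed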

> Of course, this is not an issue if the only queries you run at READ
> COMMITTED are single statements—provided that the system guarantees
> statement-level consistency

Yes.
(Although the example does not suffice, I do know what you mean.)
Come on.

1. Can you not tell the difference between a platform that provides Statement-level consistency, and has done so for three decades (at the time of the question), and a bug that causes it to fail in some cases ? A bug does not prove that Statement-level consistency is not provided by the product; it proves only the existence of a bug. You are grasping at straws.

2. Both the seeker and the two responders are idiots. They are genuflecting to the (then) new MV (definitely no CC in that, but the pre-existing CC in the pre-existing “1PL”), the awe of the mystical and magical. With no understanding whatsoever.

3. The answer is actually quite wrong for Sybase & DB2, and MS/SQL (I seriously doubt that MS/SQL has changed for the answer to be correct).
> At read committed level shared locks are released as soon as the data is read (not at the end of the transaction or even the end of the statement)
No, at read committed level shared locks are released at the end of the statement.
The reference to “end of transaction” is stupid, good for confusing the confused further.

Cheers
Derek

Nicola

Jun 25, 2021, 4:17:37 PM
On 2021-06-24, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
>> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
>> >
>> > The report connection would normally do SELECT [at] READ_COMMITTED
>> > which holds ReadLocks for the duration of the Statement,
>>
>> Ok. So, I am back to a question I have already posed to you:
>>
>> - OLAP queries run at READ COMMITTED;
>> - Each OLAP query is a single SELECT statement;
>
> 1. Well, I guarantee that in my databases, because it is 100% Predicates.
> Even high-end massive report tools do that, noting the intermediate
> “universe” that maps an RFS into something readable. And then it
> executes a single SELECT.

I did not mean to imply otherwise. I am perfectly aware of how
expressive SQL can be on well-designed databases.

> 2. Not only OLAP, any SELECT runs at READ COMMITTED.

Yes. I focused on OLAP to emphasize that I have "complex" queries in
mind.

>> - Each SELECT holds read locks for the duration of the statement.
>
> Yes.
> SQL/ACID requirement,

Fine.

>that MV-non-CC suites cannot support.

Trivially, as MVCC does not use locks.

>> But the duration of the statement may be significant
>
> Why ???
>
> I don’t have a single report in my entire SQL life that exceeds 12
> seconds, and some of them are “complex” (20+ tables; subqueries;
> derived tables) and massive (tables have billions of rows). I kinda
> pride myself on sub-second response times, even for reports.

Fair (and funny anecdote you have there).

>> (minutes, hours, ...)
>
> Not in the commercial SQL world.
>
> Sure, in idiotic suites of programs, written by 10,000 undergrads
> spread across the planet, all singing the Stonebraker mantra to
> maintain their collective fantasy, hours and days are “normal”. That
> is the price of NOT having an architecture, of writing filth and
> pretending that it is a server. 1,000 or 2,000 TIMES slower than
> a commercial SQL Platform is “normal”.
>
> So no, the issue simply does not apply to us. But I accept that is
> “normal” for the academics and freeware users. Just wait until you
> get to the benchmark that you were planning.
>
> Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
> Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
> after 120 MINUTES.
> __ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf

I remember finding that benchmark some time ago and trying to
reproduce it in PostgreSQL, based on the DDL I found here:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/DDL%20101201.txt

and the query found here:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt

My only noteworthy remarks at the time were:

1. As bad as Oracle might be, it can't be so slow. It sounds like the
benchmark was performed by Sybase people and/or no effort was made to
optimize on the Oracle side.

Anyway, I don't care about Oracle.

2. 488MB for the 2M records of CustomerTransaction seems quite a lot.
I've got ~147MB, including indexes (well, one index), in PostgreSQL.

I have just repeated my experiment. The above query takes ~2.4s with
cold data and ~800ms with hot data on my laptop. Ok, that's flash
storage and a laptop one decade newer than the one in your document. But
it's stock PostgreSQL with default parameters (which are extremely
conservative), with no physical tuning applied. So, all in all it's not
so bad for the price.

For the sake of completeness, and possibly for comparison and further
discussion, at the end of this message I report my benchmark.

>> Don't such queries create contention (other transactions cannot
>> update the data read by the OLAP query until the query is over)?
>
> Correct.
> So what (see above).
> No one cares about waiting a few seconds.

There are many applications where waiting more than a few hundreds
milliseconds might be problematic. For example, waiting ten seconds
makes a web app unacceptably slow.

> 1. Did you not read the /Sybase Lock Manager/ link, page 2 table at bottom left ?

Yes.

> 2. I have stated that if you want to get into this, there is a Server
> Monitoring report that just happens to expose this problem, and the
> considerations, and with more granularity than “contention” ... but
> you have not taken it up.
> __ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
>
> The Read Locks are precisely this, SELECT queries (not OLAP, but that
> makes no difference) contending with OLTP transactions. Do you not
> care, that after fixing a SAN Config problem, the Read Locks dropped
> by 91%, or what that means at the coal face. Or stated in converse,
> why Read Locks were 1,100% higher when the SAN was mis-configured.

I have no doubt that, given your experience, you are able to achieve
great improvements on every system you touch. My questions are for
better understanding on my part, not to tell you that you should run
things differently.

Nicola

--------------------------------------------------------------------------------
--
-- Benchmark for Tony Andrews: DDL for Tables Used
-- PerformanceDBA 28 Nov 2010
-- Adapted by NV for PostgreSQL on 2021 Jun 25
--
-- Original code from:
--
-- https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/

-- NOTE: data types are inferred (not given in the original code).
create domain CustomerId as integer;
create domain _DescriptionSmall as text;
create domain _Description as text;
create domain "_Date" as date;
create domain _UserStamp as timestamp;
create domain _TimeStampSmall as timestamp;
create domain _IntSmall as smallint;
create domain _MoneySmall as numeric(7,2);
create domain TransactionCode as char
constraint valid_transaction_code
check (value in ('C','D'));

-- Population: 1000 Random CustomerIds
create table REF_Customer (
CustomerId CustomerId not null , -- SMALLINT
ShortName _DescriptionSmall not null ,
FullName _Description not null ,
CreatedDate "_Date" not null ,
UserStamp _UserStamp not null ,
TimeStamp _TimeStampSmall not null
);

alter table REF_Customer
add constraint U_CustomerId
primary key (CustomerId);

alter table REF_Customer
add constraint UC_ShortName
unique (ShortName);

-- Population: 2M rows
-- distributed evenly across the 1000 CustomerIds
-- 2000 per CustomerId: 1000 Credits; 1000 Debits
create table CustomerTransaction (
CustomerId CustomerId not null , -- SMALLINT
SequenceNo _IntSmall not null ,
Date "_Date" not null ,
TransactionCode TransactionCode not null , -- CHAR(1)
Amount _MoneySmall not null
);

alter table CustomerTransaction
add constraint UC_PK
primary key (CustomerId, SequenceNo);

-- Insert some randomly generated data, distributed as specified:
with words(word) as (
select unnest(string_to_array(pg_read_file('/usr/share/dict/words')::text,E'\n'))
)
insert into REF_Customer(CustomerId, ShortName, FullName, CreatedDate, UserStamp, TimeStamp)
select row_number() over (),
word as ShortName,
word || word || word || word as FullName,
(now() - '1 day'::interval * round(random() * 3650))::date as CreateDate,
now() - '1 day'::interval * round(random() * 3650) as UserStamp,
now() - '1 day'::interval * round(random() * 3650) as TimeStamp
from words
order by random()
limit 1000;

-- Insert 2M rows, 2000 per customer, uniformly distributed over {C,D}
insert into CustomerTransaction(CustomerId, SequenceNo, Date, TransactionCode, Amount)
select CustomerId,
generate_series(1,2000) as SequenceNo,
(now() - '1 day'::interval * round(random() * 3650))::date as Date,
case when random() < 0.5 then 'C' else 'D' end as TransactionCode,
random() * 10000 as Amount
from REF_Customer;

analyze; -- Update the stats

--------------------------------------------------------------------------------

-- Main PostgreSQL settings:
-- shared_buffers = 128MB
-- maintenance_work_mem = 64MB
-- temp_buffers = 8MB
-- work_mem = 4MB

select * from ref_customer limit 3;
┌────────────┬───────────┬──────────────────────────────────────┬─────────────┬────────────────────────────┬────────────────────────────┐
│ customerid │ shortname │ fullname │ createddate │ userstamp │ timestamp │
├────────────┼───────────┼──────────────────────────────────────┼─────────────┼────────────────────────────┼────────────────────────────┤
│ 182577 │ skippet │ skippetskippetskippetskippet │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
│ 57408 │ downness │ downnessdownnessdownnessdownness │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
│ 132294 │ orocratic │ orocraticorocraticorocraticorocratic │ 2021-06-25 │ 2021-06-25 21:00:59.401317 │ 2021-06-25 21:00:59.401317 │
└────────────┴───────────┴──────────────────────────────────────┴─────────────┴────────────────────────────┴────────────────────────────┘

select * from customertransaction limit 10;
┌────────────┬────────────┬────────────┬─────────────────┬─────────┐
│ customerid │ sequenceno │ date │ transactioncode │ amount │
├────────────┼────────────┼────────────┼─────────────────┼─────────┤
│ 182577 │ 1 │ 2017-07-08 │ C │ 71.73 │
│ 182577 │ 2 │ 2012-10-02 │ C │ 9443.97 │
│ 182577 │ 3 │ 2011-10-31 │ C │ 5950.48 │
│ 182577 │ 4 │ 2013-04-12 │ C │ 6720.09 │
│ 182577 │ 5 │ 2016-04-16 │ C │ 3002.87 │
│ 182577 │ 6 │ 2013-01-14 │ D │ 175.58 │
│ 182577 │ 7 │ 2015-03-30 │ D │ 567.78 │
│ 182577 │ 8 │ 2015-11-06 │ D │ 9538.73 │
│ 182577 │ 9 │ 2012-01-16 │ D │ 5603.86 │
│ 182577 │ 10 │ 2013-04-30 │ C │ 7631.24 │
└────────────┴────────────┴────────────┴─────────────────┴─────────┘

-- Adapted from https://wiki.postgresql.org/wiki/Disk_Usage
select table_name, pg_size_pretty(total_bytes) as total
, pg_size_pretty(index_bytes) as index
, pg_size_pretty(table_bytes) as table
from (
select *, total_bytes-index_bytes-coalesce(toast_bytes,0) as table_bytes from (
select c.oid,nspname as table_schema, relname as table_name
, c.reltuples as row_estimate
, pg_total_relation_size(c.oid) as total_bytes
, pg_indexes_size(c.oid) as index_bytes
, pg_total_relation_size(reltoastrelid) as toast_bytes
from pg_class c
left join pg_namespace n on n.oid = c.relnamespace
where relkind = 'r'
and relname in ('ref_customer', 'customertransaction')
) a
) a;
┌─────────────────────┬────────┬────────┬────────┐
│ table_name │ total │ index │ table │
├─────────────────────┼────────┼────────┼────────┤
│ ref_customer │ 248 kB │ 112 kB │ 128 kB │
│ customertransaction │ 147 MB │ 47 MB │ 100 MB │
└─────────────────────┴────────┴────────┴────────┘

-- Disable parallel query execution
set max_parallel_workers_per_gather = 0;

-- Query adapted from https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt
explain analyze
SELECT ShortName,
(
SELECT SUM (Amount)
FROM CustomerTransaction
WHERE CustomerId = c.CustomerId
AND TransactionCode = 'C'
) as NumCredit,
(
SELECT SUM (Amount)
FROM CustomerTransaction
WHERE CustomerId = c.CustomerId
AND TransactionCode = 'D'
) as NumDebit
FROM REF_Customer c;

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on ref_customer c (cost=0.00..10745234.88 rows=1000 width=74) (actual time=1.873..817.176 rows=1000 loops=1) │
│ SubPlan 1 │
│ -> Aggregate (cost=5372.61..5372.62 rows=1 width=32) (actual time=0.437..0.437 rows=1 loops=1000) │
│ -> Bitmap Heap Scan on customertransaction (cost=43.68..5370.09 rows=1006 width=6) (actual time=0.079..0.333 rows=1000 loops=1000) │
│ Recheck Cond: ((customerid)::integer = (c.customerid)::integer) │
│ Filter: ((transactioncode)::bpchar = 'C'::bpchar) │
│ Rows Removed by Filter: 1000 │
│ Heap Blocks: exact=13732 │
│ -> Bitmap Index Scan on uc_pk (cost=0.00..43.43 rows=2000 width=0) (actual time=0.075..0.075 rows=2000 loops=1000) │
│ Index Cond: ((customerid)::integer = (c.customerid)::integer) │
│ SubPlan 2 │
│ -> Aggregate (cost=5372.58..5372.59 rows=1 width=32) (actual time=0.379..0.379 rows=1 loops=1000) │
│ -> Bitmap Heap Scan on customertransaction customertransaction_1 (cost=43.68..5370.09 rows=994 width=6) (actual time=0.061..0.277 rows=1000 loops=1000) │
│ Recheck Cond: ((customerid)::integer = (c.customerid)::integer) │
│ Filter: ((transactioncode)::bpchar = 'D'::bpchar) │
│ Rows Removed by Filter: 1000 │
│ Heap Blocks: exact=13732 │
│ -> Bitmap Index Scan on uc_pk (cost=0.00..43.43 rows=2000 width=0) (actual time=0.057..0.057 rows=2000 loops=1000) │
│ Index Cond: ((customerid)::integer = (c.customerid)::integer) │
│ Planning Time: 0.209 ms │
│ Execution Time: 817.397 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Nicola

Jun 25, 2021, 4:25:28 PM
On 2021-06-24, Daniel Loth <daniel...@gmail.com> wrote:
> Hi Nicola,

Hi Daniel,
great to see someone else diving in! I hope you will last :)

Thanks for your comments. In particular:

> Importantly, and often overlooked as far as MVCC disadvantages go, the
> data you read in an hours-long query running under MVCC is arbitrarily
> old data. If your query takes 2 hours to complete then the very last
> row processed by your query is potentially 2 hours out of date.

Sure, that's a fair criticism of MVCC.

But if Derek can keep all his queries under 12 seconds with an old (as
in "mature") 2PC system, don't you think that a modern MVCC system can
achieve the same or better performance under the same load?

Nicola

Nicola

Jun 25, 2021, 5:41:20 PM
On 2021-06-25, Nicola <nic...@nohost.org> wrote:
> I have just repeated my experiment. The above query takes ~2.4s with
> cold data and ~800ms with hot data on my laptop.

There's a bias in my test, because, the way it is generated, the
transaction data is physically sorted by CustomerID. After correcting
the bias, I'm still around 3s anyway for "Subquery SUM", with no
physical optimization. This query performs better:

https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Inline%20SUM%20ShowPlan.txt

taking ~1.2s.
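For reference, an inline-view formulation along those lines (a sketch only; the exact query in the linked ShowPlan file may differ):

SELECT c.ShortName,
       ct.NumCredit,
       ct.NumDebit
  FROM REF_Customer c
  JOIN (
        SELECT CustomerId,
               SUM(CASE WHEN TransactionCode = 'C' THEN Amount END) AS NumCredit,
               SUM(CASE WHEN TransactionCode = 'D' THEN Amount END) AS NumDebit
          FROM CustomerTransaction
         GROUP BY CustomerId
       ) ct
    ON ct.CustomerId = c.CustomerId;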

To insert data in random order:

insert into CustomerTransaction(CustomerId, SequenceNo, Date, TransactionCode, Amount)
with T as (
select CustomerId,
generate_series(1,2000) as SequenceNo,
(now() - '1 day'::interval * round(random() * 3650))::date as Date,
case when random() < 0.5 then 'C' else 'D' end as TransactionCode,
random() * 10000 as Amount
from REF_Customer
)
select * from T order by random();

Nicola

Derek Ignatius Asirvadem

Jun 25, 2021, 11:56:28 PM
Nicola

Great post.

> On Saturday, 26 June 2021 at 06:17:37 UTC+10, Nicola wrote:
> > On 2021-06-24, Derek Ignatius Asirvadem wrote:
> >> On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
> >> > On 2021-06-24, Derek Ignatius Asirvadem wrote:

> > 2. Not only OLAP, any SELECT runs at READ COMMITTED.
>
> Yes. I focused on OLAP to emphasize that I have "complex" queries in
> mind.
>
> >> - Each SELECT holds read locks for the duration of the statement.
> >
> > Yes.
> > SQL/ACID requirement,
>
> Fine.
>
> >that MV-non-CC suites cannot support.
>
> Trivially, as MVCC does not use locks.

Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!

Or are you just stating that because it does not use locks, it holds no locks for the duration of the statement ? Pffft.

> > Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
> > Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
> > after 120 MINUTES.
> > __ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
> I remember finding that benchmark some time ago

Nice to know.

> and trying to
> reproduce it in PostgreSQL, based on the DDL I found here:
>
> https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/DDL%20101201.txt
>
> and the query found here:
>
> https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt
>
> My only noteworthy remarks at the time were:
>
> 1. As bad as Oracle might be, it can't be so slow. It sounds like the
> benchmark was performed by Sybase people and/or no effort was made to
> optimize on the Oracle side.

No, no. Of course it was me on the Sybase side. On the Oracle side it was the famous Tony Andrews himself. The Oracle world is huge compared to the Sybase world. He is a hugely famous Oracle expert. He engaged me on StackOverflow about time series data that needed a subquery, on which I said Oracle cacks itself, which led to the benchmark.
__ https://stackoverflow.com/questions/4083464/design-database-relating-to-time-attribute
__ https://stackoverflow.com/questions/4375192/performance-of-different-approaches-to-time-based-data

On the contrary, the Oracle side was heavily optimised, both logical and physical. As noted, TA even duplicated the CustomerTransaction table, to elevate the speed. And I had to downgrade the config on the Sybase side. There were many of those; only the huge one is mentioned in the doc.

Once he lost out on the Subquery SUM() = ABANDONED, he did not finish the remaining tests (we had agreed to 5 tests per benchmark), hence the empty Inline View SUM() slots on the spreadsheet.

Separate to the ABANDONED after 120 MINUTES, the other tests were

1 Subquery COUNT ___________ 3.0 x slower than Sybase
3 InlineView COUNT __________ 3.6 x slower than Sybase
3 InlineView COUNT Improved 4.8 x slower than Sybase
1 Subquery SUM _____________ Abandoned after 120 mins (Sybase executed in 2.16 secs)
3 InlineView SUM ____________ 26.4 x slower than Sybase
3 InlineView SUM Improved __ 36.4 x slower than Sybase

Not being honest, he tried posting falsities here, which I unfortunately had to destroy:
__ https://groups.google.com/g/comp.databases.theory/c/IkJyS0MTzUs

> Anyway, I don't care about Oracle.

No problem. I referred to it because it is the oldest and most mature MV-non-CC, and it too, does not comply with SQL or ACID.

> 2. 488MB for the 2M records of CustomerTransaction seems quite a lot.
> I've got ~147MB, including indexes (well, one index), in PostgreSQL.

Yes, that figure is odd. Even with the duplicated table. I can’t check it now as that disk image is archived, and my current laptop is two generations newer.

> I have just repeated my experiment. The above query takes ~2.4s with
> cold data and ~800ms with hot data on my laptop. Ok, that's flash
> storage and a laptop one decade newer than the one in your document. But
> it's stock PostgreSQL with default parameters (which are extremely
> conservative), with no physical tuning applied.

Excellent. The ‘hot’ figure is the one we want (the test measures logical, not physical, I/O).

And considering the next post, re Inline SUM() at 1.2 secs.

Excellent, it is approaching Sybase performance, noting the fact that this is SSD, and the benchmark 11 years ago was on HDD.

> So, all in all it's not
> so bad for the price.

All in all ? Nah. It is hopeless for the price. No one takes responsibility. Literally the definition of insanity (the premise of the Mental Health Act is that the person is not responsible for their actions).

Put another way, with technical precision, you pay zero for non-SQL and non-ACID. And the thing that you paid nothing for has great performance in single user mode.

Now if we bring it back to the context of locking vs MV-non-CC, run:
__ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
__ 5 of those SELECTs
__ concurrently.
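A minimal sketch of one such OLTP Transaction (single-row UPDATE by full key; the key values would be chosen at random by the test driver):

BEGIN;
UPDATE CustomerTransaction
   SET Amount = Amount + 1.00
 WHERE CustomerId = 42          -- random CustomerId
   AND SequenceNo = 137;        -- random SequenceNo
COMMIT;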

> For the sake of completeness, and possibly for comparison and further
> discussion, at the end of this message I report my benchmark.
> >> Don't such queries create contention (other transactions cannot
> >> update the data read by the OLAP query until the query is over)?
> >
> > Correct.
> > So what (see above).
> > No one cares about waiting a few seconds.
>
> There are many applications where waiting more than a few hundreds
> milliseconds might be problematic.

The context of my statement there is the OLAP and SELECTs. Outside that context, yes, we want millisec response. Since 2007, microsec resolution and metrics.

> For example, waiting ten seconds
> makes a web app unacceptably slow.

The metric at the bank is 2 secs.
The SG metric is 1 sec.

> > 2. I have stated that if you want to get into this, there is a Server
> > Monitoring report that just happens to expose this problem, and the
> > considerations, and with more granularity than “contention” ... but
> > you have not taken it up.
> > __ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
> >
> > The Read Locks are precisely this, SELECT queries (not OLAP, but that
> > makes no difference) contending with OLTP transactions. Do you not
> > care, that after fixing a SAN Config problem, the Read Locks dropped
> > by 91%, or what that means at the coal face. Or stated in converse,
> > why Read Locks were 1,100% higher when the SAN was mis-configured.
>
> I have no doubt that, given your experience, you are able to achieve
> great improvements on every system you touch.

Thanks.

> My questions are for
> better understanding on my part, not to tell you that you should run
> things differently.

But that is not what I meant, what >>I<< did is not relevant. Given your [then] concern re Read Locks, and such being held for some duration, and such contending with OLTP Transactions, you may be interested in what that actually looks like, in terms of metrics in a production server.

Admittedly that is a top-down view. Daniel has given a bottom-up view, and a further level of detail. I would not go there (further levels): not only are there many levels, with differences due to each implementation flavour, but it is bottom-up, which is not good for understanding. I am trying to get you to think top-down, for understanding.

>> Integrity within a ResultSet
Meaning:
1. (yes) no phantoms; no anomalies
Where that would normally be obtained via REPEATABLE_READ, which demands a Transaction and implies several SELECTs, I provide it in a single SELECT at READ_COMMITTED, no Transaction.

2. I provide Predicate level Integrity in the ResultSet, somewhat beyond [1]. Yes, the methods exploit platform-specific features, but it can be readily translated to any other Commercial RDBMS. This has more to do with performance issues, less to do with SQL.

==========

To proceed with this thread then:
__ generally an understanding of OLTP, that is the LARGE SHARED DATA BANK context, rather than mere speed in a single-user context,
__ hopefully resolve my declaration that PissGriss does not support ACID
__ (ie. any MV-non-CC system cannot support ACID)
____ and therefore cannot provide OLTP
__ the two items that remain from my side:
___1 there are two major occurrences in OLTP that have not been mentioned yet, that must be covered before we can consider this subject complete (MV-non-CC types are totally ignorant about this)
___2 Optimistic Locking. We are dancing around it, without confirming its absolute need in OLTP.

Although you have no further questions re the Batch Transaction, that understanding is not complete, it can be completed only when the above outstanding elements are completed.

Cheers
Derek

Derek Ignatius Asirvadem

Jun 26, 2021, 12:10:02 AM
> On Saturday, 26 June 2021 at 06:25:28 UTC+10, Nicola wrote:
> > On 2021-06-24, Daniel Loth wrote:
>
> Hi Daniel,
> great to see someone else diving in! I hope you will last :)

Yes !

> But if Derek can keep all his queries under 12 seconds with an old (as
> in "mature") 2PC system

Sorry if I wasn't clear. Almost all my SELECTs, complex or not, execute sub-second. Almost all even at customer sites. The problem is, on their production system, even though I am hired to reconfigure it, I rarely get the power to change everything that requires correction; there is always some area (eg. SAN config) or some config parm (eg. auditing set up) that I cannot change. Thus I have a few SELECTs on customer systems that take more than 1 sec; I remember them well, the worst one being 12 secs.

> 2PC
No.
Not even "1PL"; "2PL"; "C2PL"; "S2PL"; "SS2PL", which are Straw Man understandings of Ordinary Locking.
I am trying to give you the real deal on Ordinary Locking, to remove you from the filth that the MV-non-CC mob declare it to be.

Cheers
Derek

Daniel Loth

Jun 26, 2021, 4:12:41 AM
Hi Derek, Nicola,

> But if Derek can keep all his queries under 12 seconds with an old (as
> in "mature") 2PC system

So having gone through much of the content that Derek has proffered over time (much of it on StackOverflow), I believe this is mostly owing to the meticulous normalisation that his databases implement.

Your ordinary database out in industry is often very sloppy.
For example, one of our databases at work has at least one table with at least 70 columns, and those columns are sparsely populated (that is, they are nullable and quite often null).
Suppose each of those 70 columns is, on average, 8 bytes. Each row is therefore 560 bytes wide.

If we keep it simple and assume that the entire 8KiB of a data page is available for row data then we can fit 14 rows of 560 bytes on each data page. So that's a low data density to begin with.

Next, consider how many ways in which you might wish to query those 70 columns. There are many permutations, and supporting even a modest number of those permutations would involve a menagerie of indexes.

On each insert, all of these indexes must be updated.

On each update, all indexes that contain the modified column values must be updated.
If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).

In the specific case of Postgres, I understand they had a write amplification issue a number of years ago. This meant that all updates involved updating all indexes, even where those indexes didn't strictly require an update.
Uber famously blogged about it as their reason for abandoning Postgres. You can read more about it here: https://eng.uber.com/postgres-to-mysql-migration/

For what it's worth, I accept that this is an implementation detail of Postgres and not something inherent in MVCC. Had they designed their MVCC in another way, they might have avoided the problem. I couldn't say with certainty though just how much work they'd have to do to fix it. Who knows, maybe they have fixed it - Uber's blog post was published in 2016.

Finally, on each delete, all indexes must once again be updated.

Now let's consider what that index count might look like. In practice, it's not that unusual to see an un-normalised SQL Server database table with 10 or so indexes.
If you defer to consultants such as Brent Ozar (of 'Microsoft Certified Master of SQL Server' fame), they'll tell you that you should aim for 5 indexes.

Of course, in our table with 70 columns, it's extraordinarily unlikely that we'll be able to satisfy all of the desirable data access paths with indexes. So a trade-off is made.
We can do one of two things:

1. We can employ more indexes. Each index has its associated cost when performing an insert / update / delete operation, as described above.

2. We can employ fewer indexes. We face lower cost when inserting / updating / deleting data. However it's more likely that a user will run a query with a predicate not easily satisfied by one of the existing indexes, and the DBMS will be forced to resort to a table scan (in physical terms, a clustered index scan).

---

So given the above, I'll contrast it with Derek's typical implementation and the characteristics as I see them.

Firstly, Derek's implementations are highly normalised. The tables in his models often comprise a primary key (natural key) and several attributes. Tables with fewer columns require far fewer indexes to satisfy the queries that you would perform on them.

In fact, he often mentions that his databases contain no null values. This is due to him modelling optional attributes in their own tables. That is, a table into which he migrates the key of the parent table, plus that one additional optional attribute. This table, comprising a key and one attribute, is in 6NF.

Suppose you have a 'Person' table with a nullable 'MiddleName' column. And suppose in practice that this table contains a million rows, but only 10% of Person records contain a non-null value for MiddleName.

When modelled as a single table with nullable column, there are 900,000 records with a null value.

Derek would likely model the relationship like this:
Person -> { PersonKey, Attribute1, ..., AttributeN }
PersonMiddleName -> { PersonKey, MiddleName }

Person would contain 1 million rows that do not comprise a 30 to 50 character MiddleName column. In SQL Server, each nullable column has 2 bytes of bookkeeping information when the column is null. These must be processed for every select query executed on the table.

PersonMiddleName would contain 100,000 records. You'd only ever query these records when you specifically require a Person's MiddleName, so they don't impose a cost when querying the Person table.
The PersonMiddleName table would also only require a single index to support those queries, that index being the clustered index.

If we scale this idea up to hundreds or even thousands of nullable columns across a database then they impose a significant cost. In practice, a million records is not a lot. I used 'million' to avoid any ambiguity with 'billion' (which depends on the locale that a discussion participant resides in).
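As a DDL sketch of that split (names and types illustrative only, not taken from any real model):

CREATE TABLE Person (
    PersonKey   integer     NOT NULL PRIMARY KEY,
    FirstName   varchar(50) NOT NULL,
    FamilyName  varchar(50) NOT NULL
    -- ...other mandatory attributes only; no nullable columns
);

CREATE TABLE PersonMiddleName (
    PersonKey   integer     NOT NULL PRIMARY KEY
                REFERENCES Person (PersonKey),
    MiddleName  varchar(50) NOT NULL
);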

---

Secondly, and perhaps most importantly, MVCC produces a great deal of litter. Old versions of a row are retained in pages. In my example above there are 14 rows per page. This is already very low density because each row requires so much space.

However those 14 rows might additionally include former versions of a logically equivalent row (that is, former versions of some row for a given key).

So if there are 10 versions of a particular row on that data page, and that data page can hold 14 rows, then the page actually contains at most 5 distinct rows (1 row versioned 10 times, and at most 4 other distinct rows).

This means that each page of data processed might provide only minimal utility for that processing cost.

So an MVCC-based system is very likely making less progress with each page that it reads from disk.

An MVCC-based system is also very likely deriving less utility per GiB of RAM.

---

Thirdly, there is the meticulous transaction control. All transactions live within the database, implemented as procedures.

I can think of at least two benefits:

1. A database administrator has the complete picture on-hand. They can readily optimise the database as required because they have a definitive view of all queries and can determine the indexes required to best satisfy them.

2. Misbehaving client code can't unwittingly start a transaction, acquire a number of exclusive locks, and then retain those locks for a prolonged period of time (thus blocking other users of the system).
And well-behaved client code written with the best intentions can't diligently start a transaction, acquire a number of exclusive locks, and then unwittingly fall prey to CPU scheduler preemption for an arbitrarily long period of time.

On this matter, I posted a new discussion last night (my time) titled 'Stored procedure structure in RDBMS using Lock Manager for transaction isolation'.
I view these discussions through Google Groups. If you do too then the discussion can be viewed here: https://groups.google.com/g/comp.databases.theory/c/BNL-TwgMfPY
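As a minimal illustration of the idea (T-SQL-style syntax; procedure and table names are hypothetical, carried over from the Person example above):

CREATE PROCEDURE PersonMiddleName_Upd
    @PersonKey  integer,
    @MiddleName varchar(50)
AS
BEGIN
    BEGIN TRANSACTION;
    UPDATE PersonMiddleName
       SET MiddleName = @MiddleName
     WHERE PersonKey  = @PersonKey;
    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION;
        RETURN 1;               -- caller is told the Transaction failed
    END;
    COMMIT TRANSACTION;
    RETURN 0;
END;

The application calls the procedure; it never issues BEGIN TRANSACTION itself, so it cannot hold locks open across client-side delays.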

---

Fourthly, the databases are modelled and their performance measured with a whole-of-system mindset.

Many software developers jump immediately to 'de-normalisation' (I've used the phrase 'un-normalised' above because most of them don't actually start from a normalised model) because on a micro-level basis they can write a query that is demonstrably faster for one query.

Of course, that's just one query. What developers often overlook is the hundreds or thousands of other queries that must co-exist with this one query.

In my view, this is why MVCC is seen to be working reasonably well by many in the industry.

Many workplaces performing software development activities lack the rigour that is required to implement a database that would be even modestly consistent with Derek's ideal.

In fact, Derek's ideal necessarily involves a degree of Change Control (think old-fashioned Change Advisory Boards), and this aspect alone is anathema in modern software development workplaces (which often strive to apply agile methodologies and mindsets, for better or worse).

In this context, which you might ascribe the phrase 'lowest common denominator', MVCC works pretty well. And is seen to work pretty well.

---

Anyway, these things taken together might go some way towards explaining the performance discrepancies mentioned in this discussion and elsewhere.

Cheers,
Daniel

Daniel Loth

Jun 26, 2021, 4:16:31 AM
Sorry, just a follow-up concerning this remark:

> If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).

In fact, this is likely not true even in products using MVCC such as Postgres.

Cheers,
Daniel

Nicola

Jun 26, 2021, 6:56:40 AM
On 2021-06-26, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
>> Trivially, as MVCC does not use locks.
>
> Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!

I am not sure that by ACID compliance we mean the same thing. You seem
to imply more than what the acronym stands for (or I am taking a narrow
interpretation of it). But I am not excluding that you may be right.
Hopefully, we will be able to clarify this.

>> I have just repeated my experiment. The above query takes ~2.4s with
>> cold data and ~800ms with hot data on my laptop. Ok, that's flash
>> storage and a laptop one decade newer than the one in your document. But
>> it's stock PostgreSQL with default parameters (which are extremely
>> conservative), with no physical tuning applied.
>
> Excellent. The ‘hot’ figure is the one we want (test is logical not physical I/O).

Ok.

> Now if we bring it back to the context of locking vs MV-non-CC, run:
> __ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
> __ 5 of those SELECTs
> __ concurrently.

Time permitting, I'll try.

>> My questions are for
>> better understanding on my part, not to tell you that you should run
>> things differently.
>
> But that is not what I meant, what >>I<< did is not relevant. Given
> your [then] concern re Read Locks, and such being held for some
> duration, and such contending with OLTP Transactions, you may be
> interested in what that actually looks like, in terms of metrics in
> a production server.

With the additional context you have provided, I'm fine.

> To proceed with this thread then:
> __ generally an understanding of OLTP, that is the LARGE SHARED DATA
> BANK context, rather than mere speed in a single-user context,
> __ hopefully resolve my declaration that PissGriss does not support ACID
> __ (ie. any MV-non-CC system cannot support ACID)
> ____ and therefore cannot provide OLTP
> __ the two items that remain from my side:
> ___1 there are two major occurrences in OLTP that have not been
> mentioned yet, that must be covered before we can consider this
> subject complete (MV-non-CC types are totally ignorant about this)
> ___2 Optimistic Locking. We are dancing around it, without confirming
> its absolute need in OLTP.
>
> Although you have no further questions re the Batch Transaction, that
> understanding is not complete, it can be completed only when the above
> outstanding elements are completed.

Good. Hopefully, we'll get to paint the full picture.

Nicola

Nicola

Jun 26, 2021, 8:34:46 AM
On 2021-06-26, Daniel Loth <daniel...@gmail.com> wrote:
> Hi Derek, Nicola,
>
>> But if Derek can keep all his queries under 12 seconds with an old (as
>> in "mature") 2PC system
>
> So having gone through much of the content that Derek has proffered
> over time (much of it on StackOverflow), I believe this is mostly
> owing to the meticulous normalisation that his databases implement.

Sure, the whole discussion we are having assumes a proper database
design.

> Your ordinary database out in industry is often very sloppy.
> For example, one of our databases at work has at least one table with
> at least 70 columns, and those columns are sparsely populated (that
> is, they are nullable and quite often null).

That must be fun to work with!

> Suppose each of those 70 columns are, on average, 8 bytes each. Each
> row is therefore 560 bytes wide.
>
> If we keep it simple and assume that the entire 8KiB of a data page is
> available for row data then we can fit 14 rows of 560 bytes on each
> data page. So that's a low data density to begin with.
>
> Next, consider how many ways in which you might wish to query those 70
> columns. There are many permutations, and supporting even a modest
> number of those permutations would involve a menagerie of indexes.
>
> On each insert, all of these indexes must be updated.

A nightmare. But that cannot be saved by any carefully crafted
transactions.

> In the specific case of Postgres, I understand they had a write
> amplification issue a number of years ago. This meant that all updates
> involved updating all indexes, even where those indexes didn't
> strictly require an update.
> Uber famously blogged about it as their reason for abandoning
> Postgres. You can read more about it here:
> https://eng.uber.com/postgres-to-mysql-migration/
>
> For what it's worth, I accept that this is an implementation detail of
> Postgres and not something inherent in MVCC. Had they designed their
> MVCC in another way, they might have avoided the problem. I couldn't
> say with certainty though just how much work they'd have to do to fix
> it. Who knows, maybe they have fixed it - Uber's blog post was
> published in 2016.

I doubt that things have changed. But, as you say, that is not the only
possible implementation of MVCC (and likely not the best).

> Now let's consider what that index count might look like. In practice,
> it's not that unusual to see an un-normalised SQL Server database
> table with 10 or so indexes.
> If you defer to consultants such as Brent Ozar (of 'Microsoft
> Certified Master of SQL Server' fame), they'll tell you that you
> should aim for 5 indexes.
>
> Of course, in our table with 70 columns, it's extraordinarily unlikely
> that we'll be able to satisfy all of the desirable data access paths
> with indexes. So a trade-off is made.
> We can do one of two things:

Or, rather than patching the unpatchable, redesign the database.

> So given the above, I'll contrast it with Derek's typical
> implementation and the characteristics as I see them.
>
> Firstly, Derek's implementations are highly normalised. The tables in
> his models often comprise a primary key (natural key) and several
> attributes. Tables with fewer columns require far fewer indexes to
> satisfy the queries that you would perform on them.

Correct.

> In fact, he often mentions that his databases contain no null values.

I agree with his choice. There are lots of reasons to avoid NULLs, from
the most theoretical to the most practical.

> This is due to him modelling optional attributes in their own tables.
> That is, a table that he migrates the key of the parent table into
> plus that one additional optional attribute. This table, comprising
> a key and one attribute, is in 6NF.

> Suppose you have a 'Person' table with a nullable 'MiddleName' column.
> And suppose in practice that this table contains a million rows, but
> only 10% of Person records contain a non-null value for MiddleName.
>
> When modelled as a single table with nullable column, there are
> 900,000 records with a null value.
>
> Derek would likely model the relationship like this:
> Person -> { PersonKey, Attribute1, ..., AttributeN }
> PersonMiddleName -> { PersonKey, MiddleName }

I can't speak for Derek, but in this specific example I'd leave
MiddleName as an attribute of Person, just making it not NULLable
(actually, part of a key). When a person has no middle name, you may
record an empty string. One compelling reason to keep MiddleName with
Person is that it is used to identify a person.

As a general approach, however, what you say is correct: optional
attributes (and multi-valued attributes, such as the phone numbers of
a person) are separate, dependent, facts of Person.

> Person would contain 1 million rows that do not comprise a 30 to 50
> character MiddleName column. In SQL Server, each nullable column has
> 2 bytes of bookkeeping information when the column is null. These must
> be processed for every select query executed on the table.
>
> PersonMiddleName would contain 100,000 records. You'd only ever query
> these records when you specifically require a Person's MiddleName, so
> they don't impose a cost when querying the Person table.
> The PersonMiddleName table would also only require a single index to
> support those queries, that index being the clustered index.
>
> If we scale this idea up to hundreds or even thousands of nullable
> columns across a database then they impose a significant cost.

I am not sure how you would incur such costs, unless you need to
produce tables with thousands of columns.

> Secondly, and perhaps most importantly, MVCC produces a great deal of
> litter. Old versions of a row are retained in pages. In my example
> above there are 14 rows per page. This is already very low density
> because each row requires so much space.
>
> However those 14 rows might additionally include former versions of
> a logically equivalent row (that is, former versions of some row for
> a given key).
>
> So if there are 10 versions of a particular row on that data page, and
> that data page can hold 14 rows, then the page actually contains at
> most 5 distinct rows (1 row versioned 10 times, and at most 4 other
> distinct rows).
>
> This means that each page of data processed might provide only minimal
> utility for that processing cost.
>
> So an MVCC-based system is very likely making less progress with each
> page that it reads from disk.

That's sensible.

> An MVCC-based system is also very likely deriving less utility per GiB
> of RAM.

In the sense that buffers can potentially be filled with old records?
Yes, that's reasonable.

> Thirdly is the meticulous transaction control. All transactions live
> within the database, implemented as procedures.

Again, I agree with Derek on this. Of course, we are talking about
*database* transactions, not *business* transactions (e.g., buying an
item on an online shop is one business transaction, but it is carried
out through several database transactions).

> I can think of at least two benefits:
>
> 1. A database administrator has the complete picture on-hand. They can
> readily optimise the database as required because they have
> a definitive view of all queries and can determine the indexes
> required to best satisfy them.
>
> 2. Misbehaving client code can't unwittingly start a transaction,
> acquire a number of exclusive locks, and then retain those locks for
> a prolonged period of time (thus blocking other users of the system).
> And well-behaved client code written with the best intentions can't
> diligently start a transaction, acquire a number of exclusive locks,
> and then unwittingly fall prey to CPU scheduler preemption for an
> arbitrarily long period of time.

There are also security benefits: better access control, application of
the least privilege principle, etc.

I say to my students: look at the installation instructions of any
open-source web app. For sure they'll tell you that you should
absolutely give tight permissions to the web directory, that it should
not be owned by root, etc. Then, they go on by making you create
a database and a database user (or, in the best cases, two) to own it.
Then, the web app reads and writes across the database as that user. That
is the moral equivalent of chmod 777 on the file system.
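A minimal sketch of the alternative (PostgreSQL-style syntax; the role name is hypothetical): the application login gets no direct table access and may only execute the stored routines that implement the transactions.

CREATE ROLE webapp LOGIN PASSWORD 'change-me';
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM webapp;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO webapp;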

Then, there's the matter of correctness: only the operations on the
database that preserve consistency should be allowed.

> On this matter, I posted a new discussion last night (my time) titled
> 'Stored procedure structure in RDBMS using Lock Manager for
> transaction isolation'.

That's a good summary of our previous discussion. Regarding your
request, the expert is Derek!

> Fourthly, the databases are modelled and their performance measured
> with a whole-of-system mindset.
>
> Many software developers jump immediately to 'de-normalisation' (I've
> used the phrase 'un-normalised' above because most of them don't
> actually start from a normalised model) because on a micro-level basis
> they can write a query that is demonstrably faster for one query.
>
> Of course, that's just one query. What developers often overlook is
> the hundreds or thousands of other queries that must co-exist with
> this one query.
>
> In my view, this is why MVCC is seen to be working reasonably well by
> many in the industry.

Can you elaborate on that? Do you mean that MVCC works well only when
your goal is to micro-optimize, but it is not suitable to obtain
system-wide good performance?

Nicola

Daniel Loth

Jun 26, 2021, 10:50:10 AM
Hi Nicola,

> Can you elaborate on that? Do you mean that MVCC works well only when
> your goal is to micro-optimize, but it is not suitable to obtain
> system-wide good performance?

Yes, happy to elaborate by way of anecdote.

Joe Developer discovers that he can examine a query plan in a tool like SQL Server Management Studio. He compares two queries:

select FirstName, MiddleName, FamilyName
from Person
where PersonId = 1;

and

select p.FirstName, pmn.MiddleName, p.FamilyName
from Person p
left join PersonMiddleName pmn
on p.PersonId = pmn.PersonId
where p.PersonId = 1;

And rightfully concludes that the former is faster. The former query produces a plan comprising a single clustered index seek. The latter comprises two clustered index seeks, so twice the cost.
Though note that twice the cost, in this example, really means 1 logical read for the former and 2 logical reads for the latter. So twice as expensive, but really that extra expense is marginal.

What this developer doesn't consider is that we might only actually need to select the middle name as part of two or three queries, and that these queries that do require the middle name actually comprise a very small proportion of the workload that the DBMS must handle.
The developer, an individual working on a task, is focused on their own work and very easily takes a myopic view with little thought for next month, let alone next year or the next five years.

This behaviour is what leads to the aforementioned 70 column table, and the perception that de-normalization leads to higher performance. One person does it, someone else follows their lead, and then it becomes the norm.

Developers make other poor choices too. All of the time, really. Often because it is expedient to do so under perceived or real pressure to deliver.

For example, they might write code that updates a substantial number of records within a single statement (which has been touched on in recent discussions here).
They test it locally and it's fine, because they're working with a small set of records locally.
But in production environments with a substantial volume of data this creates an untenable situation in the context of a database that uses a Lock Manager, because the writers-block-readers situation arises.

A series of often short-term expedient decisions are taken and death spiralling ensues. Developers inevitably reach a point where they struggle with slow transactions, excessive blocking, deadlocks.
MVCC is seen as a get-out-of-jail-free card when improper transaction management leads to excessive blocking in a database with a Lock Manager.

This creates a perverse situation where MVCC is seen to be better because it avoids the untenable: A database that might effectively be single-user-only until the aforementioned writers-block-readers situation subsides and normal service resumes.
You could say that MVCC provides a safety net. A poorly behaved query won't necessarily bring the system to its knees. At least not for the same reason as a system relying on a Lock Manager, anyway.

In the Microsoft world, the recommendation nowadays from a number of highly regarded thought-leaders (such as Brent Ozar) is to enable 'snapshot isolation' and 'read committed snapshot isolation' by default (i.e., SQL Server's MVCC).
Pragmatic DBAs accept that developers are going to make a mess of it, and they recognise that in many companies developers are going to be making database design decisions because those companies reject ideas such as Change Advisory Boards having to approve things. The idea being that CAB is a handbrake on development velocity. And I recognise and accept that a CAB does slow things down in practice (though for the purpose of due diligence).
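For reference, those are the two database-level settings in SQL Server (database name hypothetical):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;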

In this context I would agree that MVCC can work very well in spite of some of the issues previously discussed. But not because of technological superiority when compared with a RDBMS using a Lock Manager.

But to circle back to my remarks about utility, I don't think MVCC can physically outperform a system that uses a Lock Manager.

Reduced utility from resources due to more data pages being required to store the same content (i.e., remnant versions of rows take space).
This means more pages to cache in RAM and thus more contention for this finite RAM.
In turn, more cache churn. Page Life Expectancy is lower, so it's more likely that a page won't be cached when it needs to be accessed. So then there's additional IO pressure.

With those physical realities in mind, I'd expect a database using locking to perform better when implemented properly.

However, I imagine that if you have a database hosted on a system and that database fits entirely in RAM (and has plenty of headroom) then perhaps they perform comparably.

Cheers,
Daniel

Derek Ignatius Asirvadem

Jun 27, 2021, 9:26:56 PM
Nicola

> On Saturday, 26 June 2021 at 20:56:40 UTC+10, Nicola wrote:
> > On 2021-06-26, Derek Ignatius Asirvadem wrote:
> >
> >> Trivially, as MVCC does not use locks.
> >
> > Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!
>
> I am not sure that by ACID compliance we mean the same thing. You seem
> to imply more than what the acronym stands for (or I am taking a narrow
> interpretation of it). But I am not excluding that you may be right.
> Hopefully, we will be able to clarify this.

Being attached to objective truth, let’s start with definitions. Obviously I do not have private definitions. Since IBM invented it, and set the standard for Transaction Processing, we use theirs. But before I go to yet another item, that will be viewed as a fragment, in isolation from the reality of its existence, let’s name that problem.

For understanding, for the full context, think about the academics,
(a) who remain clueless about ACID and Ordinary Locking,
(b) who erect Straw Men for “ACID”; “2PL” and its variants, in order to maintain [a],
(c) who are addicted to the hysterical fiction of MV-non-CC,
(d) who seek to impose that insanity (count ‘em, three levels) on the world,
(e) who sing the Stonebraker mantra while getting 10,000 developers across the planet to write /one/ codeline, and
(f) who maintain denial of the mountain of evidence that there are problems consequent to the insanity. (Just reading that makes me ROTFLMAO.)

Now those freaks write hundreds of papers to maintain their fantasy world, otherwise they would have nothing to read. So to them, yes, in their fantasy world, and their Straw Man concept of the real world, there are all sorts of “problems” in Ordinary Locking, and ACID is reduced to the meaningless filth that they promote as “ACID”. Even the notion of Transaction is fragmented, very shallow. And of course, the overarching problem is that there is no integrity in the system, because they are dealing with fragments, not Atoms.

In my response to Daniel, I stated:
> On Sunday, 27 June 2021 at 19:11:07 UTC+10, Derek Ignatius Asirvadem wrote:
>
> The thing is this. And both Nicola and you have this problem. Not because you are stupid, you are not, but due to the programming that passes for “education” that has been imposed on you. You have been trained to (a) think in terms of fragments, not integration, not atoms. And that means (b) bottom-up, instead of top-down. So you have noticed and copied things correctly, but the overall need, and the integration within the Atom, is not complete.

Now you have left that, at least tentatively, and you are crossing the chasm between academia and reality in this science. Excellent. But the consistent problem, the obstacle, is that you are viewing the real world (here Transactions and ACID), from your academic perspective, as fragments, and bottom-up, which is dead wrong, as detailed above.

In science, we simply cannot have different definitions. If we did, we would be in two tiled cells, across the corridor from each other, screaming our “truth” to the other. No, science is objective. We have the same objective definitions, or you are not a scientist.

We cannot deal with fragments, while denying the atoms that the fragments are located in.

So you need to drop the academic, now-known-to-be-hysterical notions of Transactions; of ACID, and pick up the unchanged scientific ones.

Even Transactions, and ACID (which has to do with Transactions), are fragments, out of context. So first let’s establish that outer context, the entirety of which the Stonebraker cult deny ... and then they try to resurrect fragments thereof at COMMIT time.

--------------------------------------------
-- Transaction Processing --
--------------------------------------------

You are not in Disneyland, where you can forget about reality, sing a hymn to Stonebraker, and erect a system that magically produces “transactions” that “have ACID properties” that “never conflict”. You are not in an asylum, where you can deny the reality of a single version of the truth in a shared database, and maintain your private version that you do God-knows-what with.

No, this is the real world, conflicts are normal, and we have ways to resolve them efficiently. The shared database is an ONLINE AND EVER-CHANGING single version of the truth. The moment you take your snapshot or “version”, it is, by definition, out-of-date; stale; obsolete. Stick that in your photo album as a keepsake from your trip to Disneyland, grab a coffee, and get ready to deal with the real world.

__ https://www.ibm.com/docs/en/cics-ts/5.4?topic=overview-transaction-processing

Now in that context, we have:

-----------------------------------------------------
-- Transaction • ACID Properties --
-----------------------------------------------------

__ https://www.ibm.com/docs/en/cics-ts/5.4?topic=processing-acid-properties-transactions

Since the academics have perverted the meanings, in order to pass off tiny fragments of it as “ACID”, we need to take that down, to maintain the original and full meaning:

----------------------------------------------------------
-- Transaction • ACID • Consistency --
----------------------------------------------------------

//Data is in a consistent state when a transaction starts and when it ends.//
What is described above is obviously Transaction Consistency. Separately, the database has Database Consistency, which is the sum total of all constraint declarations. These cannot be violated, meaning that:
__ Data is in a consistent state when a transaction starts; at each step; and thus when it ends.
Eg. Customer_Has_[1-to-n]Addresses is not enforceable at the database level, it is enforced at the Transaction level.
Eg. That the Addresses belong to a valid Customer, is enforced at the database level.

The pig sucker brigade (Date; Darwen; Fagin; Stonebraker; Ellison; all their followers; etc) do not have a database, they have only pre-Relational Record Filing Systems, the database-level constraints are not understood, and what constraints are implemented in their RFS is meagre. They pervert that definition, and demand that database constraints can be violated during the transaction, and enforced only when the transaction ends. Eg. The /Relational Model/ prohibits circular references, but they need them in their RFS, thus they need the “deferred non-constraint checking” that is foreign to Relational databases.

In a Relational database, data Consistency is maintained throughout: when a Transaction begins; during its execution; and when it ends.
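
To make the two levels concrete, here is a minimal sketch in generic T-SQL-style syntax; the table, column and proc names are hypothetical, for illustration only. The rule that an Address must belong to a valid Customer is declared once in the database and enforced on every statement; the rule Customer_Has_[1-to-n]Addresses cannot be declared in the schema, so the Transaction (stored proc) that creates a Customer inserts the Customer and its first Address as one Atomic unit, and Consistency holds at every COMMIT.

-- Database-level Consistency: declared once, enforced by the server on every statement
-- (hypothetical tables, for illustration only).
CREATE TABLE Customer (
    CustomerCode  CHAR(8)      NOT NULL,
    Name          VARCHAR(60)  NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY ( CustomerCode )
    )

CREATE TABLE Address (
    CustomerCode  CHAR(8)      NOT NULL,
    AddressNo     SMALLINT     NOT NULL,
    StreetLine    VARCHAR(60)  NOT NULL,
    CONSTRAINT PK_Address PRIMARY KEY ( CustomerCode, AddressNo ),
    CONSTRAINT Customer_Issues_Address
        FOREIGN KEY ( CustomerCode ) REFERENCES Customer ( CustomerCode )
    )
GO

-- Transaction-level Consistency: Customer_Has_[1-to-n]Addresses.
-- The Customer and its first Address are inserted as one Atomic unit.
CREATE PROC Customer_Add_tr
    @CustomerCode  CHAR(8),
    @Name          VARCHAR(60),
    @StreetLine    VARCHAR(60)
AS
    BEGIN TRAN
        INSERT Customer ( CustomerCode, Name )
            VALUES ( @CustomerCode, @Name )
        IF @@ERROR != 0 BEGIN
            ROLLBACK TRAN
            RETURN 1
            END
        INSERT Address ( CustomerCode, AddressNo, StreetLine )
            VALUES ( @CustomerCode, 1, @StreetLine )
        IF @@ERROR != 0 BEGIN
            ROLLBACK TRAN
            RETURN 1
            END
    COMMIT TRAN
    RETURN 0
GO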

-----------------------------------------------------
-- Transaction • ACID • Isolation --
-----------------------------------------------------

//The intermediate state of a transaction is invisible to [or isolated from] other transactions.
As a result, transactions that run concurrently appear to be serialized.//

Stonebraker and his slaves pervert that at two levels.
1. Having denied:
__ a. that the database is a shared resource, with other concurrent users, and
__ b. that the database maintains a single version of any truth (Fact),
__ c. that can be updated by anyone,
it takes Isolation as the entirety of its existential reality, isolating the user in a tiled cell, where the madding crowd of reality does not intrude.

But note, this fosters a single-user, single-thread, mindset. A pathological anti-social mindset in a TP environment.

2. Of course the bubble of denial of reality is punctured when the thus-isolated user has to COMMIT his changes, to the [oh no, it exists] shared database, that is [oh no, they exist] shared by concurrent users. So it hysterically mixes up the cause and the effect, and tries to obtain the effect without the cause:

__2.1 It takes [ Transaction/ACID/Isolation ] and perverts it into [ TransactionIsolation ].
Instead of understanding that THE UNCOMMITTED CHANGES INSIDE a Transaction are isolated from other users, they define a new notion: TransactionIsolation, as a thing, and they give the guy in the tiled cell who is very very attached to his stale “version” of the database, even more power to reinforce his insanity, there you now have TransactionIsolation, you can hold it and suck it for as long as you like, just like the “versions”. Just do not mention the resolution problem (to be clear even that is not Concurrency Control) at COMMIT time.

Yet more erection of fantasy, of users pretending to be isolated from each other, totally anti-social, while operating on a shared database, which has social requirements. Reinforcement of the schizophrenic position, at war with reality. This entire page defines the perversion, it makes the fantasy “real”.

No, that is taking a fragment of meaning, about one component of ACID, and creating a new fantasy of TI, inside the old fantasy of MV, for the explicit purpose of maintaining the fantasy of MV.

No, in the real world, we have no fantasies, we are attached to the shared database and the social behaviour requirements that go with such, we have [ Transaction/ACID/Isolation ], therefore we do not need the hysterical notion of [ TransactionIsolation ], please erase it from your mind.

__2.2 It takes [ appear to be serialized ] and even [ SERIALIZABLE ], and perverts it into [ SERIALIZED ]. Hell, it is the actor in a costume who merely appears as the character, it is not the costume that makes the character real; but to them, the costume is the reality. You can’t make this stuff up, you have to get a PhD in Computer Science at Warwick University or Berkeley, to be able to come up with such filth. And then all the academics genuflect. And then 10,000 young minds that have been perverted run along and write code. Ever-changing code. To make fantasies appear real.

They take it as concrete.

So in that fantasy within a fantasy [ TransactionIsolation ], which contradicts both the real world and the definition, they now have a concrete railroad, a forced single-thread mentality, a forced [ SERIALIZED ], which of course is right at home with the single-user mentality of the MyPrivateVersion, sitting next to MyLittlePony with the rainbow mane and everything. Hint: they need it because it is the only way that such idiots can resolve the millions of versions, spread across the entire set of data storage (Daniel has detailed that a little), at COMMIT time.

They can only understand the schizophrenic single-user context. There are multiple users ? Oh, ok, give them each a single-user private version. Need ACID ? Oh, ok, give them a fragment, [your Transaction is Isolated].

They cannot understand the real world, the multi-user single-version context. Every shared resource has a lock (hint: toilets in a restaurant), the purpose of which is to (a) preserve the integrity of the shared resource, and (b) to provide shared access to it, the social requirements. Too dangerous. Too hard. Let’s go to DisneyWorld and create a fantasy about what we can do.

This idiotic concretised single-user railroad, of course, creates a new set of problems that Ordinary Locking systems do not have. Eg. in the PusGres case below, a problem that even other MV-non-CC systems do not have.
__ https://www.postgresql.org/docs/current/transaction-iso.html
//serialization anomaly
The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.//
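
For concreteness, here is what the quoted “serialization anomaly” looks like in the textbook write-skew case. This is a hedged sketch only; the Doctor table, its columns, and the business rule (at least one doctor must remain on call) are hypothetical, and the trailing comments state the generally documented behaviour of each approach, not a benchmark result.

-- Two concurrent transactions, interleaved in time (hypothetical table).
-- T1:
BEGIN TRAN
SELECT COUNT(*) FROM Doctor WHERE OnCall = 1    -- T1 sees 2, decides it is safe to go off call
-- T2:
BEGIN TRAN
SELECT COUNT(*) FROM Doctor WHERE OnCall = 1    -- T2 sees 2 in its own version, decides likewise
-- T1:
UPDATE Doctor SET OnCall = 0 WHERE DoctorName = 'Anna'
COMMIT TRAN
-- T2:
UPDATE Doctor SET OnCall = 0 WHERE DoctorName = 'Bob'
COMMIT TRAN

-- Snapshot versioning (eg. PostgreSQL REPEATABLE READ): both COMMITs succeed and
--   no doctor is on call, a result impossible under any serial order of T1 and T2.
-- PostgreSQL SERIALIZABLE: the anomaly is detected and one transaction is aborted,
--   to be retried by the application.
-- Ordinary Locking at SERIALIZABLE: the SELECTs hold read locks to COMMIT, so the
--   conflicting UPDATE blocks (or a deadlock is detected and one side is rolled
--   back); the anomalous result can never be committed.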

Oh great, they not only have fantasies, they have permutations of fantasies, much like an orgy at a rave party. And now, they are going to impose this “problem” on the rest of the planet.

Back in the real world, we do not deny reality. It is multi-user, single version, shared. Transactions can be interspersed, but they APPEAR TO BE SERIALIZED. That is, we allow the shared components to be shared, even within a Transaction, and when something is changed, which makes it exclusive, we do not allow other users to “see” it until we have COMMITTED.

We do not have the “problem” of “serialization anomaly” because we do not have the two levels of fantasy that are required to host the “problem”. And we cannot be asked to answer for it, because we do not SERIALIZE, we only APPEAR TO BE SERIALIZED, because we code for a multi-user OLTP context.
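
A hedged sketch of that last point, in generic T-SQL-style syntax with a hypothetical Account table: the uncommitted change is exclusive, everything else remains shared, and the result is interleaving that APPEARS serialized without any transaction being physically serialized.

-- T1:
BEGIN TRAN
UPDATE Account SET Balance = Balance - 100 WHERE AccountNo = 42
-- The changed row is now held under an exclusive lock until COMMIT or ROLLBACK.
-- Every other row in Account remains fully shared.

-- T2 (concurrent):
SELECT Balance FROM Account WHERE AccountNo = 42
-- Blocks at READ COMMITTED until T1 releases the lock; T2 never sees T1's
-- intermediate (uncommitted) state.
SELECT Balance FROM Account WHERE AccountNo = 99
-- Not blocked: a different, unchanged, shared row.

-- T1:
COMMIT TRAN
-- T2's blocked SELECT now returns the committed Balance. The two transactions
-- were interleaved, but the outcome is as if they had been serialized.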

In sum.
1. This post is separate to, or further to, my posts that point out that MV-non-CC will never work (as evidenced) because it breaks the first principle of a shared database.
2. and separate to the posts that point out that the entire notion of MV-non-CC is schizophrenic (denial of reality).
3. I don’t know if I have succeeded, but in this post, I have tried to expose the morass of false definitions that academia has established, in order to foster their insanity, and insane fantasy systems, that implement any form of MV-non-CC. Changing existing definitions is pure fraud.

The demand for you is: first, accept the original, un-perverted definitions; second, divest yourself of the fantasy notions, the Straw Man perceptions of reality; third, learn what the OLTP context really is; and fourth, then evaluate the methods we use. Doing that fourth step without the first three will result in (a) you never learning about the science, about reality, and (b) our discourse never ending, with nothing resolved.

> On Saturday, 26 June 2021 at 22:34:46 UTC+10, Nicola wrote:
> > On 2021-06-26, Daniel Loth wrote:
> >
> > Thirdly is the meticulous transaction control. All transactions live
> > within the database, implemented as procedures.
>
> Again, I agree with Derek on this. Of course, we are talking about
> *database* transactions, not *business* transactions (e.g., buying an
> item on an online shop is one business transaction, but it is carried
> out through several database transactions).

God help me.

See, there, you have broken one of the fundamental principles of OLTP/Transaction/ACID and you don’t even know that you have done so.

Let’s take the notion of “business rules”. I think you understand that academics, in their ignorance and stupidity (due to eating pig poop, and maintaining a contrived notion of the “industry”), think that database rules and constraints are quite different to business rules. Fagin has papers on this stuff. DKNF is “defined”, even in “mathematical” terms, and declared to be impossible. I think, in the last ten years, you have realised that that is false, that the reality is, guys like me implement genuine Relational Databases ala Codd, and more precisely the fullness of the Codd definitions. You have seen that we implement ALL business rules as constraints, there is no distinction between “database rules” and “business rules”.

Good. Now try this. Likewise, there is no difference between “database transactions” and “business transactions”.

Question. If the “database transaction” is an OLTP/Transaction/ACID, which is Atomic, how in God’s name can it be a part of a “business transaction” that is atomic ??? Hopefully you have not denied that Atomic means indivisible. If the “business transaction” is an OLTP/Transaction/ACID, which is Atomic, how in God’s name can it contain many “database transactions” as components, that are themselves atomic ??? The Atomicity in one destroys the Atomicity in the other.

That is the kind of insanity that academics have, due to being programmed by the MV-non-CC mindset; the Stonebraker mantra; the hair-splitting pharisaic “arguments” of the Date; Darwen; Fagin gulag. Due to the hundreds of academic papers that promote all that filth.

No. Within the Four Laws of Thought, which is Logic; Sanity, which is the foundation of Science, we do not have such insane contradictions. Reality has no contradictions.

No. We have only [ OLTP/Transaction/ACID ] Transactions. There is one and only one type. No other type is defined, and if you dare to define another type [in the established style of your dishonest and schizophrenic colleagues], I will tell you now, before you do so, it is rejected. We have lived without it for FORTY YEARS. We simply do not have the problem: we do not split the Atom, and thus we do not have fragments [sub-atomic particles] that then need to be resolved. So, using the original definitions at the top of this post, the fullness and spirit thereof (not “narrow”), and not extended in any way:

Eg. buying an item on an online shop is one [user/business/database] transaction.
Eg. buying another item on an online shop is one [user/business/database] transaction.
Eg. paying for the items purchased in an online shop is one [user/business/database] transaction.
Eg. returning an item purchased in an online shop is one [user/business/database] transaction.
Eg. refunding a returned item purchased in an online shop is one [user/business/database] transaction.

And further, if the “business transaction” is not an OLTP/Transaction/ACID Transaction in a stored proc in the database, it is not a “database transaction”. Which it had better be, in order to be a Transaction at all.
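
As an illustration of the single Transaction type, here is a hedged sketch of the first Eg above, “buying an item on an online shop”, as one stored proc, in generic T-SQL-style syntax; the proc, tables, columns and return codes are hypothetical. The whole business action is one Atomic unit: the stock decrement and the order line both commit, or neither does. Paying, returning and refunding are each a separate proc of the same shape; there is no outer “business transaction” wrapping them.

-- Hypothetical sketch: one [user/business/database] Transaction, in one stored proc.
CREATE PROC OrderItem_Add_tr
    @CustomerCode  CHAR(8),
    @ItemCode      CHAR(12),
    @Quantity      INT
AS
    -- Validate before opening the Transaction: reject impossible requests early.
    IF @Quantity < 1
        RETURN 1

    BEGIN TRAN
        -- Decrement stock only if enough is on hand.
        UPDATE Item
            SET QuantityOnHand = QuantityOnHand - @Quantity
            WHERE ItemCode = @ItemCode
              AND QuantityOnHand >= @Quantity
        IF @@ROWCOUNT != 1 BEGIN
            ROLLBACK TRAN
            RETURN 2    -- no such item, or insufficient stock
            END

        INSERT OrderItem ( CustomerCode, ItemCode, Quantity, OrderedAt )
            VALUES ( @CustomerCode, @ItemCode, @Quantity, GETDATE() )
        IF @@ERROR != 0 BEGIN
            ROLLBACK TRAN
            RETURN 3
            END
    COMMIT TRAN
    RETURN 0
GO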

----------

If you understand this, you will understand that there is only one level of depth in Transactions, either one Transaction or all Transactions in a system. The notion of “nested transactions” is hysterically absurd.

Cheers
Derek

Derek Ignatius Asirvadem
Aug 21, 2021, 11:58:39 PM
to Nicola

In the /Stored Proc for OLTP Transactions/ thread, the major issues we dealt with were (a) the exposure and use of false and redefined terms by academics, and (b) the assertion of standard terms, noting the chronology. It occurs to me this thread suffers the same, to an extent I did not appreciate at the time. I suggest you read this thread again, keeping in mind that when I use standard terms, I mean the standard meanings (not any academic re-definitions).

Cheers
Derek