Transactionality - jBASE performance and file resizes on large transactions

Pawel (privately)

Jan 5, 2009, 9:09:59 AM

to jBASE

Hi guys,

Does anyone know in detail how jBASE transactionality works? I can
imagine some algorithms, and I seem to remember Jim saying that jEDI
creates a buffer of changes in memory.

My questions are as follows:
a) How is it that jBASE is transactional? Assume multiple
writes are required to commit a transaction. Assume that the filesystem is not
transactional, or that WRITEs are performed on directories (not J4 files).

b) Does file sizing influence transaction performance, i.e. WRITE
/ DELETE operations? I think it does not, much (my understanding: the file
sizing problem occurs when the transaction is committed and records are flushed
to disk, but does not exist as long as the transaction is open).
Imagine that some brilliant guy created a single transaction for
300000-900000 (900k) changes. It lasted almost 7h, and I wonder how
jBASE behaves.

c) Does the number of changes inside a transaction influence READ / WRITE /
DELETE time? (I think a bottleneck could be found here.) jBASE needs to scan
the changes buffer somehow, right? Is that an efficient mechanism?

d) What can you do if you have no influence over how the code is written
and how transactions are managed?

I know that the answer to the first 2 questions is "check it yourself"... :) I
will eventually try to check it myself and share the results.

Kind regards
Pawel

Jim Idle

Jan 8, 2009, 12:26:55 PM

to jB...@googlegroups.com

Pawel (privately) wrote:
> Hi guys,
>
> Does anyone know in detail how jBASE transactionality works? I can
> imagine some algorithms, and I seem to remember Jim saying that jEDI
> creates a buffer of changes in memory.
>
> My questions are as follows:
> a) How is it that jBASE is transactional? Assume multiple
> writes are required to commit a transaction. Assume that the filesystem is not
> transactional, or that WRITEs are performed on directories (not J4 files).
>

If a program requests a transaction start, and the file in
question is flagged as taking part in transactions, then the writes to
that file are cached in memory. Subsequent writes to the same item
just replace the one in the cache. They stay in memory until the
transaction is either aborted (they are discarded) or committed (they are
all written to the database as fast as possible). This gives
read-uncommitted isolation for the process that owns the writes (the
process that wrote a record can read it back and will see its
changes), and read-committed isolation for the user population (other
processes reading a record that was written as part of the transaction
will not see any updates until they are committed to the database). As
with all databases, there will be some period of time when the
transaction is not yet complete but some of its records have already been
committed to the database. If your database uses read-uncommitted
isolation (for instance DB2 does, Caché does), then that can be a long time.
Hence all applications should use locking, even in transactions: the
locks taken within a transaction are only released once all the data is
on disk and the transaction is deemed to be sound.
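The caching behaviour described above can be modelled with a small toy class. This is a minimal sketch, assuming a single owning process and a Python dict (itself a hash table) standing in for the file on disk; `TransactionalFile`, `trans_start`, `trans_commit` and the other names are hypothetical and are not jEDI or jBASE BASIC APIs, and locking and crash recovery are not modelled. It shows three points from the post: repeated writes to the same item replace the cached entry, the owning process reads its own uncommitted changes, and other processes see only committed data.

```python
from typing import Dict, Optional

_DELETED = object()  # marks a record deleted inside the open transaction


class TransactionalFile:
    """Toy model (hypothetical) of one file flagged as taking part in transactions."""

    def __init__(self) -> None:
        self.committed: Dict[str, str] = {}               # stands in for the file on disk
        self.pending: Optional[Dict[str, object]] = None  # None: no open transaction

    def trans_start(self) -> None:
        self.pending = {}

    def write(self, key: str, record: str) -> None:
        if self.pending is None:
            self.committed[key] = record   # no transaction: write straight through
        else:
            self.pending[key] = record     # replaces any earlier cached write

    def delete(self, key: str) -> None:
        if self.pending is None:
            self.committed.pop(key, None)
        else:
            self.pending[key] = _DELETED

    def read_own(self, key: str) -> Optional[str]:
        """Owning process: sees its own uncommitted changes first."""
        if self.pending is not None and key in self.pending:
            value = self.pending[key]
            return None if value is _DELETED else str(value)
        return self.committed.get(key)

    def read_other(self, key: str) -> Optional[str]:
        """Any other process: sees committed data only."""
        return self.committed.get(key)

    def trans_commit(self) -> None:
        for key, value in (self.pending or {}).items():  # flush every cached change
            if value is _DELETED:
                self.committed.pop(key, None)
            else:
                self.committed[key] = str(value)
        self.pending = None

    def trans_abort(self) -> None:
        self.pending = None                # cached changes are simply discarded


if __name__ == "__main__":
    f = TransactionalFile()
    f.write("CUST1", "old")
    f.trans_start()
    f.write("CUST1", "new")
    print(f.read_own("CUST1"), f.read_other("CUST1"))  # new old
    f.trans_commit()
    print(f.read_other("CUST1"))                       # new
```

Because the cache is keyed by item id, writing the same item many times inside one transaction keeps only one cached entry, which is consistent with the remark that repeated writes to the same items make transactions faster rather than slower.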

> b) Does file sizing influence transaction performance, i.e. WRITE
> / DELETE operations?

No. The size of the file on disk has nothing to do with this. In fact,
if multiple writes are made to the same items in files, then the
transactions will be faster (however, this means your application is
weak, as does re-reading something it has already read or updated).

> I think it does not, much (my understanding: the file
> sizing problem occurs when the transaction is committed and records are flushed
> to disk, but does not exist as long as the transaction is open).
> Imagine that some brilliant guy

I think you meant 'brilliant' ;-)

> created a single transaction for
> 300000-900000 (900k) changes. It lasted almost 7h, and I wonder how
> jBASE behaves.
>

Basically, whoever did that has no understanding of transactions
whatsoever and should be sent back to school. This usually happens with
batch jobs where someone puts TRANS-START at the beginning of the
program and runs the whole thing. The time is taken because jBASE must
trace through hundreds of thousands of in-memory records, and this is
exacerbated because the updates will be in all sorts of files and all
over the disk. Provided that you have enough memory, though, it will work.

> c) Does the number of changes inside a transaction influence READ / WRITE /
> DELETE time? (I think a bottleneck could be found here.) jBASE needs to scan
> the changes buffer somehow, right? Is that an efficient mechanism?
>

Internally it uses a hashed structure. However, this structure is not
optimized for hundreds of thousands of writes/deletes because, of course,
you are not supposed to create transactions like that. Suppose you need
to abort the transaction? You will have spent hours on the batch, one
thing goes wrong, and you abort 700,000 updates. Basically it is just
plain stupid and you need to fix the program. If it is not your
program, then go back to your supplier and tell 'em I sent ya.
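To put numbers on the abort argument, here is a toy simulation of a batch that fails on its very last update. The function `run_batch` is hypothetical and uses a plain counter to stand in for the transaction cache; it says nothing about jBASE's actual performance and only counts what an abort throws away when the whole batch is one transaction versus when each unit of work gets its own transaction.

```python
from typing import Dict


def run_batch(total: int, per_transaction: int, fail_at: int) -> Dict[str, int]:
    """Apply `total` updates, committing after every `per_transaction` updates,
    and abort when the update at index `fail_at` fails (-1 means no failure).
    Returns how many updates were committed and how many the abort discarded.
    """
    committed = 0
    pending = 0                          # updates held in the transaction cache
    for i in range(total):
        if i == fail_at:
            # Abort: everything still in the cache is discarded.
            return {"committed": committed, "discarded": pending}
        pending += 1
        if pending == per_transaction:   # end of a unit of work: commit it
            committed += pending
            pending = 0
    committed += pending                 # commit whatever is left at the end
    return {"committed": committed, "discarded": 0}


if __name__ == "__main__":
    for size in (700_000, 100, 1):
        print(size, run_batch(700_000, size, fail_at=699_999))
```

With 700,000 updates and a failure on the last one, the single big transaction discards 699,999 cached updates, while one transaction per update discards none. Smaller transactions are only safe when each one is a complete logical unit of work; splitting at arbitrary counts would leave the files partially updated, which is why an all-or-nothing batch needs a different approach.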

> d) What can you do if you have no influence over how the code is written
> and how transactions are managed?
>

You really should have your supplier fix the issue. It could be, though,
that the entire file structure isn't right for transactions. It could
also be that what they are trying to achieve is an all-or-nothing
approach on a batch job, and this isn't the way to do that. What you
should do in this case (assuming an off-line batch job) is make a copy
of all the files involved, use jchmod to turn off transactions for
those files, then run the batch job. The updates and deletes will no
longer be held in a transaction, and the job will run as if there were no
transaction boundaries. If the batch completes, turn transactions back on
(if they are used in normal operation). If it fails, abandon the partially
updated files, copy in your originals, fix whatever the issue is, and re-run the batch.
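The copy/run/restore procedure can be sketched as a small wrapper. This is a minimal sketch, assuming the job is off-line (nothing else has the files open) and that each file is either a directory file or a single OS file such as a hashed file; `run_offline_batch`, the `batch` callable and the `CUSTOMER` path are hypothetical. The jchmod steps are left as comments because the exact options depend on your jBASE release.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def _copy(src: Path, dst: Path) -> None:
    """Copy a directory file (whole tree) or a single-file hashed file."""
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


def _remove(path: Path) -> None:
    if path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()


def run_offline_batch(files: List[Path], batch: Callable[[], None],
                      backup_root: Path) -> bool:
    """Back up `files`, run `batch`, and restore the backups if it fails."""
    backup_root.mkdir(parents=True, exist_ok=True)
    backups: Dict[Path, Path] = {}
    for i, f in enumerate(files):
        dest = backup_root / f"{i}_{f.name}"  # index prefix avoids name clashes
        _copy(f, dest)                        # 1. copy every file involved
        backups[f] = dest
    # 2. Turn transactions off for `files` with jchmod (options not shown).
    try:
        batch()                               # 3. run the batch job
    except Exception:
        for f, dest in backups.items():       # 4a. failed: abandon the partly
            _remove(f)                        #     updated files and copy the
            _copy(dest, f)                    #     originals back in
        return False
    # 4b. Succeeded: turn transactions back on with jchmod if normally used.
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "CUSTOMER"         # hypothetical directory file
        data.mkdir()
        (data / "1").write_text("original")

        def failing_batch() -> None:
            (data / "1").write_text("half-done")
            raise RuntimeError("batch failed")

        ok = run_offline_batch([data], failing_batch, Path(tmp) / "backup")
        print(ok, (data / "1").read_text())   # False original
```

Recovery here is just "delete and copy back", which is the simplicity the post argues for: there is no partial rollback logic to get wrong.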

In fact, in my new file system (to be released one day), there are
specific mechanisms to work entirely in memory for batch jobs, then
flush the updates at the end. This is aimed at large installations like
T24, where batch runs are important and almost always complete. In the
event of a system crash and so on, an installation does not mind
entering its recovery procedure. In this case that is just copying in
the originals and restarting, which is very simple and therefore less
error-prone. As batch jobs run this way will possibly be 100s (yes,
hundreds) of times faster, you can afford lots of failures.

In the end though, you need the application supplier to fix their stupid
code and design the application properly.

Jim