
Can Interbase handle 300.000 *big* records?


Call Center, Jun 8, 2000
What's your opinion about this:

Select Count(*) From BIG_TABLE

This basic query takes 5:40 (!!!) to complete (yes, nearly 6 minutes!)
via WISQL when there are about 300,000 records in the table BIG_TABLE.

How can this be?

Admittedly, it *is* a very large table: it consists of about a
hundred columns (all VARCHAR(100)), each of which can contain up to 100
characters, so a single row can consume quite a lot of space.
This seems to be the cause of the behaviour.

If I try the "Select Count(*)" on *another* table (with some 300,000
records, too) I get the correct result after 4 seconds. That table is
much smaller because it contains merely 5 integer fields.

Ok, seems this is the problem, am I right?

But what can I do? According to the InterBase specs, InterBase should
be able to handle such amounts of data. Should I split the table in
two with a 1:1 relation? Setting the page_size to 4096 did not solve the
problem at all...


In my opinion, a DBMS should be able to handle such huge tables (is it
really that huge for InterBase?) in various ways: I'm thinking of a
slightly intelligent algorithm that uses, for instance, the primary key
to get the Count(*). The index is noticeably smaller than the data itself,
and it would take considerably less time to count the records this way.

Next thing which is puzzling me maybe even more:

Select Max(ID) From BIG_TABLE

...takes over 3 minutes when ID is the primary key!!! What the ...? Is it
naive to think that all Interbase has to do is to find the highest value
in an *index*??? What's the problem? Why does it take so long?

Any comments?
Alex

Craig Stuntz, Jun 8, 2000

Call Center wrote:
>
> Any comments?

Run this in IB_WISQL (free from http://www.ibobjects.com) and look at
the query optimization PLAN. Post both PLANs here if you still need
help.
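For readers without an InterBase install handy, the same exercise can be sketched in Python with SQLite, whose EXPLAIN QUERY PLAN plays roughly the role of InterBase's PLAN output (this is an illustrative stand-in, not InterBase; the table and data are invented):

```python
import sqlite3

# Illustrative sketch only: SQLite's EXPLAIN QUERY PLAN stands in for
# InterBase's PLAN output; big_table here is a tiny invented stand-in.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO big_table VALUES (?, ?)",
                [(i, "x" * 100) for i in range(1000)])

# Ask the optimizer how it intends to run the query before running it.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM big_table").fetchall()
print(plan)  # each row describes one step, e.g. a table scan or index use
```

Posting this kind of plan output, as Craig suggests, is usually the fastest route to a diagnosis.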

-Craig

--
Craig Stuntz Vertex Systems Corporation
Senior Developer http://www.vertexsoftware.com

Call Center, Jun 8, 2000
Everything gets stranger:

Select Max(NOT_INDEXED_FIELD) From BIG_TABLE

takes 90 seconds, which is exactly the same amount of time as the

Select Max(ID) From BIG_TABLE

statement which searches the maximum of the primary key!!

So searching for the maximum of an indexed field (even the primary key)
and of a non-indexed field takes the same time? Er... I don't see...

Here is the query plan for the "Max(ID)"-statement:
PLAN (BIG_TABLE NATURAL)

Does this tell you anything? At first glance I thought it did *not*
use the primary key as an index, but executing the same query on a
different InterBase database gives me PLAN (OTHER_TABLE NATURAL) as well,
with much better performance: 6 seconds on 450,000 records, and - big
surprise - 6 seconds too on the following query:

Select Max(NOT_INDEXED_FIELD) From OTHER_TABLE

6 seconds on a non-indexed field at 450,000 records! Wow!!

So why why why is my BIG_TABLE-database that slow?

Alex

Todder, Jun 8, 2000

Call Center wrote:

If you add a WHERE clause to your select, it changes the plan and uses the
index. For example:

Select Max(ID) From BIG_TABLE Where ID < 9999999999999

If ID is the primary key, the query will use the index and your query will be
much faster.

Todd


Craig Stuntz, Jun 8, 2000

Call Center wrote:
>
> Everything gets stranger:
>
> Select Max(NOT_INDEXED_FIELD) From BIG_TABLE
>
> yields a performance of 90 seconds, that is exactly the same timespan as
> the
>
> Select Max(ID) From BIG_TABLE
>
> statement which searches the maximum of the primary key!!

Not surprising, considering what is below.



> So searching the maximum of an indexed (even primary key) and not indexed
> field takes the same time? Er... I don't see...

Yup. You need a DESCENDING index to select the MAX(). A primary key
is ASCENDING, and will work for MIN(). See what happens when you try
SELECT MIN(ID) FROM BIG_TABLE.

(Why? Well, MAX() works by performing a sort and getting the first
record. If it used an ASCENDING index, it would need to get the last
record, which is the same thing as going through the entire table
anyway, only with the addition of the sort.)



> Here is the query plan for the "Max(ID)"-statement:
> PLAN (BIG_TABLE NATURAL)
>
> Does this tell you anything? At the first glance I thought he did *not*
> use the primary key as an index but executing the same query

Your first impression is correct. PLAN(BIG_TABLE NATURAL) is a fancy
way of saying record-by-record scan.

> 6 Seconds on a not indexed field at 450.000 records! Wow!!
>
> So why why why is my BIG_TABLE-database that slow?

Here's my guess as to what is happening:

None of your queries are using indices, because you probably don't have
one defined for the table. That's the first problem, and it's probably
enough to get you where you're going. But the rest of the issue is
worth discussing, as it hits on some very fundamental facts about
InterBase.

InterBase's multi-generational architecture works like a version
control system. This is a huge advantage of IB's architecture, for
reasons which are off the subject of this post.

During a transaction -- and all queries run in transactions -- you are
essentially working on your own copy of the data. When the transaction
is COMMITted, your changes are reconciled with the "real" data. The
first time each record is read, it's stored away for later use within
the transaction. In the case of BIG_TABLE, that requires a lot of
memory, so InterBase caches the data on disk. This is, of course, much
slower than caching the data in RAM, but either you don't have enough
RAM installed to hold this much data or IB is not configured to use it.

It's worth noting that if your long string fields were Blobs instead of
VARCHARs, this would not happen. With a Blob, only the pointer to the
file is read in on a record scan, not the data itself, which is only
retrieved when the client asks for it.

HTH,

Call Center, Jun 9, 2000
Thank you for your answer; your explanation would shed some light on the
seemingly weird behaviour, and I understand the versioning mechanism. But
supposing you are right, some questions really strike me.

> Here's my guess as to what is happening:
> None of your queries are using indices,
> because you probably don't have
> one defined for the table.

Zero points ;-)
There is a unique index on the field ID in BIG_TABLE (BTW, it's the only
index defined for this table).

Now here for my burning questions:

1. The versioning mechanism: even when I launch a "Select"-query which
does not change any data, InterBase generates a unique version of the
whole table? I understand the mechanism when it comes to DML statements,
but with selects?

2. To find the maximum of an _indexed_field_, InterBase looks through the
_table_? AFAIK (thinking of the times when I developed dBase apps),
indexes are stored separately and are some kind of pointers to the actual
records. Is InterBase different? In my opinion, to search for the maximum
within an index, InterBase doesn't have to store _any_ data, just make a
snapshot (versioning) of the index and rummage through it.

3. Furthermore, I don't see the ASC/DESC problem exactly, but maybe that's
because I don't know very much about the internals of InterBase:
when using an ASC index on the field "ID", InterBase is not able to jump
to the last value in the "index table" to find the maximum there? It
has to go to the first element and browse from there?
Sorry - I can't believe this, because it would also mean (please
correct me if I'm wrong) that if you wanted to retrieve the record
with the ID 555777, InterBase would jump to ID 1 and browse ascendingly
until it found ID 555777. No - I dare say the main point of indexes
is their suitability for binary search, so InterBase just _has_ to be
aware of the index's root and its end, right?

4.


> (Why? Well, MAX() works by performing a sort
> and getting the first record. If it used an
> ASCENDING index, it would need to get the last
> record, which is the same thing as going through
> the entire table anyway, only with the addition
> of the sort.)

Pardon? MAX sorts my IDs when there is already a perfectly sorted
version of them (the index!)? Really? Why?

I know I am being quite a nuisance, but this is really a serious problem
for me and I definitely want to know where exactly the problem is. I'm not
content with assumptions. But I do appreciate your help.

Regards,
Alex

Nick Ryan, Jun 9, 2000
> 4.
> > (Why? Well, MAX() works by performing a sort
> > and getting the first record. If it used an
> > ASCENDING index, it would need to get the last
> > record, which is the same thing as going through
> > the entire table anyway, only with the addition
> > of the sort.)
>
> Pardon? MAX sorts my ID's when there is already a perfectly sorted
> version of it (index!)? Really? Why?
>
> I know, I am quite inconvenient, but this is really a serious problem for
> me and I want to definitely know where exactly the problem is. I'm not
> content with assumptions. But I do appreciate your help.

Just a quick idea - create another index on the same field as the primary
key and sort it descending. Your updates will take a little longer but your
select max() will be quicker. In case you don't quite understand why this
is the case, Interbase doesn't store the record count or any other
"calculated" field in the database. In a multi-user database, storing these
values is not a good idea, as they can get out of sync very quickly with
the reality of the database. And because Interbase only stores the starting
point of an index, not the ending point, it has to navigate the index to
get to the ending value (there are various optimizations that make this
process fairly negligible for record selections, but they don't apply in
the same way when getting the last index record).
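Nick's suggestion is easy to try. In InterBase the equivalent would be a CREATE DESCENDING INDEX statement; the sketch below uses Python with SQLite (where a DESC column in the index definition is the closest analogue) purely to illustrate the idea, and the index name is invented:

```python
import sqlite3

# Illustrative sketch of Nick's suggestion on SQLite, not InterBase.
# The index name big_table_id_desc is invented for this example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY)")
con.execute("CREATE INDEX big_table_id_desc ON big_table (id DESC)")
con.executemany("INSERT INTO big_table VALUES (?)",
                [(i,) for i in (3, 1, 2)])

# With a descending index available, MAX(id) can be answered from the
# first index entry instead of a full scan.
assert con.execute("SELECT MAX(id) FROM big_table").fetchone()[0] == 3
```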

Hope this helps a little.

Nick

Call Center, Jun 9, 2000
Hm... what do you think of this:

If I were InterBase and you told me to "Select Max(ID) From
BIG_TABLE", what would I do?
"Pfuh... this guy sucks! What does he want from me right now? Why can't
he just leave me alone? Nah, let's see: he wants the highest value of the
field ID. Ok... JESUS CHRIST! The table is really FAT! That's gonna take
me hours! Wait a minute... ah, yes, there's a unique index on the field
ID. Ok, I have an idea: I don't browse through the table, instead I just
browse through the index! Yep! That's it! Gee, I'm so clever! Just in
case something happens in between, I make a copy of this index
table and browse through the snapshot. Let's have a look at it: yes,
there are about 300,000 records, and because I just look at the index, I
only have to store 300,000 integers and search for the maximum. I ignore
the table completely."

[3 seconds later]

"Voila! Here it is: the highest value is 340,534! Now return it to the
user..."


Is it too naive to think a DBMS can handle this "Select" like this? If
not, then why not?

Alex

Call Center, Jun 9, 2000
I would generally agree with you. That would certainly help, I think.

But to me it seems a little absurd to have to define both an ASC _and_ a
DESC index on one field.

Furthermore, I'd like to say that my problem is definitely _not_ finding
the Max(ID); my problem is actually working with this BIG_TABLE, joining
it to other tables and so on. And when I first waited hours for one
two-table-join Update to complete, I asked myself: what the ... is going
on here? Why does it take so long?

So I came up with the purest "performance meters" I could think of:
Select Count(*) From BIG_TABLE
and
Select Max(ID) From BIG_TABLE

The idea was that if I could understand just why performance is so poor
with _these_ two selects, I would have a hint about what to do to make my
"real world selects" work at considerable speed.

So far I don't see exactly what I can do to make my selects run faster
(maybe fumbling around with PLAN query options would be an idea).

Alex

David Becker, Jun 9, 2000
I have found fumbling around with query plans to be *very* beneficial, and
have often achieved performance gains of 300%, oftentimes much more.

Unfortunately, PLANs are poorly documented, and even when you do understand
them, there are some instances where IB just refuses to accept them, telling
you that a particular index can't be used. And, lest anyone think I was
making a mistake in my PLAN, I've managed to juggle the query around such
that IB generates the *exact* same plan I was trying to use. Even then, if
I literally CUT AND PASTE the plan that IB itself generated, it still won't
accept it from me (of course, there's no point then, but it SHOULD accept
it).

I do agree with you that the MAX() handling seems absurd, and a PLAN
won't help you there, though (as indicated before) a descending index
would... Rather than focusing on these arbitrary examples, I think you
would be better off examining the PLAN for the actual query.

--
David Becker
Programmer Analyst
Lease Marketing, Ltd.


"Call Center" <alexande...@clientel.at> wrote in message
news:MPG.13ab404e1...@newsgroups.borland.com...

Craig Stuntz, Jun 9, 2000

DISCLAIMER: We're getting to the limits of what I know about IB
internals here.
That said, if you add a DESCENDING index to your table and it works for
MAX, I think it pretty much backs up what I'm about to write...

Call Center wrote:
>
> > Here's my guess as to what is happening:
> > None of your queries are using indices,
> > because you probably don't have
> > one defined for the table.
>
> Zero points ;-)
> There is a unique index on the field ID in BIG_TABLE (BTW, it's the only
> index defined for this table).

You need a DESCENDING index. An ASCENDING index (like a primary key)
will not help you.

Perhaps I was not clear enough in my (quoted) reply. MAX needs a
DESCENDING index. Using an ASCENDING index to find the MAX is worse
than no index at all.

I will try to explain this more clearly.

Here is one way the database might calculate the MAX record when there
is no index:

1. Read a record. Store the value.
2. Read the next record.
3. Compare the value with the stored value. Store this new value if
it's bigger.
4. Go back to step 2 and repeat until the last record in the selection
is read.

(I think it may do a sort instead, since this may turn out to be
quicker, but you'll have to ask Jim Starkey if you want to know for
sure.)

Here is how the database calculates the MAX record when there is a
DESCENDING index available.

1. Go to the first entry in the index.
2. Return this value.

To use an ASCENDING index to find the MAX, here is what would have to
happen:

1. Go to the first entry in the index.
2. Go to the next entry.
3. Repeat step 2 until the end of the index.
4. Look up the value of this record and return it.
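The three strategies above can be sketched in Python (an illustration of the described behaviour, not InterBase internals; an "index" here is just a sorted list of keys):

```python
# Sketch of the three MAX() strategies described above (illustrative only).

def max_by_scan(records):
    """No index: compare every record, keeping the largest seen so far."""
    best = None
    for value in records:
        if best is None or value > best:
            best = value
    return best

def max_by_desc_index(desc_index):
    """DESCENDING index: the answer is the very first entry."""
    return desc_index[0]

def max_by_asc_index(asc_index):
    """ASCENDING index traversed like a linked list: step to the end."""
    entry = None
    for entry in asc_index:   # one step per entry, as described above
        pass
    return entry

records = [7, 3, 9, 1]
assert max_by_scan(records) == 9
assert max_by_desc_index(sorted(records, reverse=True)) == 9
assert max_by_asc_index(sorted(records)) == 9
```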

To the best of my understanding, traversing an index is a lot like
traversing a linked list. (But see my reply to your other question on
this subject below.) You can't just go to the middle, or to the end of
the index. You have to step through it one entry at a time. There is
slightly more work involved in stepping through every entry in an index
than there is in stepping through every record in a table.

Try creating a DESCENDING index on your table and I bet your results
will be much, much faster for MAX.

> 1. The versioning mechanism: even when I launch a "Select"-query which
> does not change any data, InterBase generates a unique version of the
> whole table? I understand the mechanism when it comes to DML statements,
> but with selects?

Sort of. It generates a unique version of each record read, not
necessarily the whole table. You may have no intention of modifying
data, but IB doesn't know this. Further, IB supports SNAPSHOT
transactions, where you don't want to see changes COMMITted by other
users.

> 2. To find the maximum of an _indexed_field_, InterBase looks through the
> _table_?

Only if necessary, as above. IB will (usually) use the quickest way of
finding the record you ask for. To see an example of this in action,
try the following:

SELECT DISTINCT ID FROM BIG_TABLE

Note that the IDs returned are ordered, even though you did not specify
ORDER BY ID. This is because it is faster to use the ascending index to
find the DISTINCT IDs than to scan the table.

> AFAIK (thinking of the times when I developed dBase apps)
> indexes are stored seperately and are some kind of pointers to the actual
> records. Is InterBase different?

There are a couple more layers of indirection, but in principle it's
similar. The only literal pointer to data that IB has is RDB$DB_KEY,
which is a pointer to a given transaction's copy of the data and is only
valid during the transaction. It's lightning fast, though (faster than
using the primary key, and faster than an index on most other databases)
which makes it quite useful for stored procedures. Indices are
themselves versioned as well.

> In my opinion, to search the maximum
> within an index, Interbase doesn't have to store _any_ data, just make a
> snapshot (versioning) of the index and rummage this.

Not to search, but you might (and most people usually do) want to
access the data later on in your transaction. Generally speaking, it's
faster to pull things once. Your situation may be an exception.



> Sorry - I can't believe this, because this would also mean (please
> correct me when I'm wrong) that if you would want to retrieve the record
> with the ID 555777, InterBase jumps to ID 1 and browses ascendingly until
> it finds the ID 555777. No - I dare say that the main thing about indexes
> is their qualification for binary search, so Interbase just _has_ to be
> aware of the index's root and its end, right?

We're getting outside of my knowledge of IB internals here, but I don't
believe this is the case. What you describe may seem crazy, but I think
that's exactly what happens and it's much faster, in the general case,
than scanning the entire table for 555777, which may be the last record.

However IB actually handles this internally, I don't think it's
significantly different from other DBs on the market.



> I know, I am quite inconvenient, but this is really a serious problem for
> me and I want to definitely know where exactly the problem is. I'm not
> content with assumptions. But I do appreciate your help.

No inconvenience. This stuff is interesting to me and I appreciate the
opportunity to look further into it.

Philip Brown, Jun 10, 2000
> To the best of my understanding, traversing an index is a lot like
> traversing a linked list. (But see my reply to your other question on
> this subject below.) You can't just go to the middle, or to the end of
> the index. You have to step through it one entry at a time. There is
> slightly more work involved in stepping through every entry in an index
> than there is in stepping through every record in a table.

Virtually all databases use some variant of B-trees for index storage,
as they minimise the number of disk accesses needed to find any given
key. Imagine a B-tree (variant) as being like a binary tree with more
than 2 branches against each node (2-8 is usual in textbook examples;
with 5 branches per node a range of 2^32 values is reachable by
navigating about 14 nodes, and real database pages branch far more
widely still).

You are right, index *traversal* can be a lot like traversing a linked
list, inasmuch as you step from one to the next.

However, index *seeking* ABSOLUTELY CAN go to the middle or the end
without going through all intermediate entries. It's like a binary
search with more than 2 branches; average number of nodes to search
to find a value is typically much less than 1/2 log2(n).
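Philip's point about seeking is easy to demonstrate: a sorted structure supports jumping to any key in logarithmically many probes. A Python sketch (illustrative only; the key values are invented):

```python
import bisect
import math

# Seeking in a sorted index: bisect jumps to the key's position with
# ~log2(n) probes; a B-tree with wider fan-out needs even fewer visits.
keys = list(range(450_000))            # stand-in for an index on 450,000 IDs
pos = bisect.bisect_left(keys, 340_534)
assert keys[pos] == 340_534            # found without walking entry by entry

# Upper bound on probes for a binary search over 450,000 keys:
assert math.ceil(math.log2(len(keys))) == 19
```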

> > Sorry - I can't believe this, because this would also mean (please
> > correct me when I'm wrong) that if you would want to retrieve the record
> > with the ID 555777, InterBase jumps to ID 1 and browses ascendingly
> > until it finds the ID 555777. No - I dare say that the main thing about
> > indexes is their qualification for binary search, so Interbase just
> > _has_ to be aware of the index's root and its end, right?
>
> We're getting outside of my knowledge of IB internals here, but I don't
> believe this is the case. What you describe may seem crazy, but I think
> that's exactly what happens and it's much faster, in the general case,
> than scanning the entire table for 555777, which may be the last record.

Well, it might be the case for Interbase, but I'm with the other guy and
think that it's pretty crazy. Searching a list sequentially is NOT much
faster for the average (general) case in any situation. Average-case
performance is going to be n/2, which is pretty rubbish, especially
when you consider that SELECT MAX() is a very common query (and involves
a search of n records in the worst case).

> However IB actually handles this internally, I don't think it's
> significantly different from other DBs on the market.

If it acts as you describe (and the behaviour appears to indicate that it
does), then it is entirely different from almost every other DB on the
market. I'd go as far as saying that it is a pretty rubbish implementation
and smacks of having a general-case query execution unit that only has a
few tricks up its sleeve. After all, every single-table SQL query can be
resolved by doing a full table scan - the measure of a good database is
how well it uses indexes to minimise disk reads. If what you say is true
then this raises reservations (for me, anyway) about the range of features
in the Interbase toolbox.

> No inconvenience. This stuff is interesting to me and I appreciate the
> opportunity to look further into it.

I'll follow this with interest.

> -Craig
>
> --
> Craig Stuntz Vertex Systems Corporation
> Senior Developer http://www.vertexsoftware.com

Philip Brown
Informatica Systems Ltd

Craig Stuntz, Jun 12, 2000

Philip Brown wrote:
>
> Virtually all databases use some variant of B-Trees for index storage,
> as they minimise the number of disk accesses needed to find any given
> key. Imagine a B-Tree (variant) as being like a binary tree with more
> than 2 branches against each node (2-8 is usual; 5 branches per node
> is usually enough to guarantee access to a range of 2^32 values by
> navigating just 6 nodes).
>
> You are right, index *traversal* can be a lot like traversing a linked
> list, inasmuch as you step from one to the next.

Glad to have some expertise on indices here. I have some questions for
you...


>
> However, index *seeking* ABSOLUTELY CAN go to the middle or the end
> without going through all intermediate entries. It's like a binary
> search with more than 2 branches; average number of nodes to search
> to find a value is typically much less than 1/2 log2(n).

What's the implementation difference between an ascending and
descending index, then? Is it only WRT traversals?



> > However IB actually handles this internally, I don't think it's
> > significantly different from other DBs on the market.
>

> If it acts as you describe (and the behaviour appears to indicate that it
> does), then it is entirely different from almost every other DB on the
> market.

Again, please keep in mind that I know IB best from a user's point of
view -- I haven't seen the code! What I know is this: If I specify an
ascending index explicitly as the optimization plan instead of letting
the optimizer pick one, my plan will be rejected. If I don't specify
one and let the optimizer pick, it will choose a descending index. If
none is available, it uses the "natural" order.

Craig Stuntz, Jun 12, 2000

Call Center wrote:
>
> Please compare with this
>
> Select Min(NOT_INDEXED_FIELD) From BIG_TABLE
>
> 1:00 (!)
>
> So my suggested strategy would be to avoid indexes because then the
> queries run faster? Pfuh... this is driving me nuts!

Can you post the PLANs for these queries and the field definitions?
I'm surprised if the index makes a difference on a SELECT query which
doesn't use it.

Robert Kindred, Jun 12, 2000
It seems to me that if I were a database (though I'm not), I would replace
(optimize) the MAX() function with the largest possible value of the
column. I would then search for this value; on an indexed column, I would
get full B-tree search speed. Then, when I failed to find it, the last
record I looked at would be the answer.

The only thing necessary is to define a largest-possible and
smallest-possible (for the MIN() ) value for each type of field the database
can hold.
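Robert's idea can be sketched as follows (illustrative Python, not a description of any real engine; the sentinel value and function name are assumptions for the example):

```python
import bisect

# Sketch of Robert's sentinel idea: seek the largest possible value for
# the column; the search lands just past the real maximum, so the entry
# before the landing point is the answer.
SENTINEL = 2 ** 31 - 1          # e.g. the largest 32-bit signed integer

def max_via_sentinel_seek(sorted_keys):
    pos = bisect.bisect_right(sorted_keys, SENTINEL)  # B-tree-style seek
    return sorted_keys[pos - 1] if pos else None

assert max_via_sentinel_seek([1, 5, 340_534]) == 340_534
assert max_via_sentinel_seek([]) is None
```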

my .02,

Robert Kindred

Bill Todd (TeamB) wrote in message <8hukak$qs...@bornews.borland.com>...
>You _do_ want to avoid indices in this case because it takes longer to
>visit every entry in an index than it does to scan the table. As you have
>found, the best performance in this case is to count a field with no index.
>
>--
>Bill Todd (TeamB)
>(Questions received via email cannot be answered.)

Craig Stuntz, Jun 12, 2000

Philip Brown wrote:
>
> It may well be that IB doesn't actually use B-Tree variants to
> store indices (which itself is unusual as I think they've been proved
> to be an optimal general-case solution in terms of disk reads per
> average record access),

Well, yes, that's the rub. Whatever is going on seems to give
performance more or less equivalent to, say, SQL Server on an average
brew of SELECTs, UPDATEs, and the like. So I presumed, perhaps
incorrectly, that IB used fairly standard mechanisms.

> but this may have something to do with
> the multi-generational nature of IB.

Perhaps. Indices are versioned, at least to a certain degree. I don't
know the technical details on this.

One scenario not covered yet is that this is just a bug in the
optimizer, and that ascending indices would otherwise work. Seems
unlikely, but it's possible.

> The multi-generational abilities probably make the engine overall
> simpler internally in terms of maintaining isolation between
> uncommitted queries, but it may (does?) have some impact on the
> index handling and optimisations available.

The biggest limitation I know of is that there is no "single user
mode."

Craig Stuntz, Jun 13, 2000

Dan Palley wrote:
>
> As for a single-user mode, depending on what you need to do, there are a few
> ways to work around the limitation:

Right, but the issue I was referring to is this: Most of the "speed
tests" performed on various databases are in single user mode.
Completely unrealistic for any real-world work, but that's a benchmark
for you. InterBase, of course, would come out looking badly compared to
a competing database in single user mode.

Benny Schaich, Jun 14, 2000 (to Call Center)
Hi there,

it seems nobody answered some of your "burning questions", so I'll jump in
here:

Call Center wrote:

> 1. The versioning mechanism: even when I launch a "Select"-query which
> does not change any data, InterBase generates a unique version of the
> whole table? I understand the mechanism when it comes to DML statements,
> but with selects?

Let me tell you, if you learned the versioning mechanism on any other DB,
it is not what you may expect. First: IB does not physically copy anything
for this versioning unless something actually gets changed.

In that case the *delta* of the data gets copied to the same data page, and
the record is marked with the transaction ID of the changing transaction.
This way every subsequent transaction knows whether the current record
version is valid for it, or whether it has to go back to the older version
(which is quite fast, because the delta is usually on the same page,
already in memory). So you should not worry about the uniqueness of your
query; it has no influence on performance.

> 2. To find the maximum of an _indexed_field_, InterBase looks through the
> _table_? AFAIK (thinking of the times when I developed dBase apps)
> indexes are stored seperately and are some kind of pointers to the actual
> records. Is InterBase different? In my opinion, to search the maximum
> within an index, Interbase doesn't have to store _any_ data, just make a
> snapshot (versioning) of the index and rummage this.

Not all indexes are equal. An index usually means some sort of tree, and a
tree only works for the purpose it was sorted for; it is not like a list.
In a list sorted from high to low you find the max and min values at the
beginning and at the end. Not so in an index.

> 3. Furthermore, I don't see the problem with ASC/DESC exactly, but that's
> maybe because I know not very much about the very interna of InterBase:
> When using a ASC index on the field "ID", InterBase is not able to jump
> to the last value in the "index table" to just find the maximum there? It
> just has to go to the first element and browses this?

> Sorry - I can't believe this, because this would also mean (please
> correct me when I'm wrong) that if you would want to retrieve the record
> with the ID 555777, InterBase jumps to ID 1 and browses ascendingly until
> it finds the ID 555777. No - I dare say that the main thing about indexes
> is their qualification for binary search, so Interbase just _has_ to be
> aware of the index's root and its end, right?

Because it's a tree, there is only a root, no "end" whatsoever (and this is
the same in dBASE, too). If you imagine a binary tree (which is just a
model, as the trees used in DBs are more complex but work the same way),
you have one node and two leaves. Each level down the tree, the number of
nodes and leaves doubles. That means after stepping down 32 levels you have
searched through over four billion records (in fact 4,294,967,296 elements,
which is 2^32).
Since every jump from level to level is comparable to a computer's seek to
a known location on disk, you get an idea of the power of this technique
compared to running through four billion records in a list.

Therefore, to find your ID 555777 it takes a maximum of 32 steps in our
example, depending on when this ID went into the database rather than on
its position in the ordering.
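Benny's arithmetic is easy to verify (illustrative Python; the virtual "index" below is just a sorted range of all 32-bit IDs):

```python
import bisect
import math

# 32 halvings cover just over four billion elements:
assert 2 ** 32 == 4_294_967_296
assert math.log2(2 ** 32) == 32

# Seeking ID 555777 in a sorted "index" of all 32-bit IDs jumps straight
# to it in ~32 probes; it never walks upward from ID 1.
keys = range(2 ** 32)
assert bisect.bisect_left(keys, 555_777) == 555_777
```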

Regards,
Benny


Ping Kam, Jun 14, 2000
I did some Interbase development some time ago, but not any more, so my
memory is out of date. IIRC, Interbase indexes use a balanced tree, but
there is the potential for a tree to get out of balance. To fix this, you
can back up and restore your database, which will rebuild all the indexes.

Another thing you can do is optimize your database. Although Interbase
is not very tunable, there is still some tuning you can do. I believe
the manual has suggestions on how to tune your database. Check the
statistics from the Server Manager. Cache and page sizes may cause too
much disk I/O and become the performance problem.

HTH,

Ping


Call Center <alexande...@clientel.at> wrote in message

news:MPG.13accda66...@newsgroups.borland.com...
> Just to fuel your engines a little bit:
>
> Bill Todd (TeamB) wrote
> > How long does it take if you do a
> > SELECT COUNT(FirstFieldOfPrimaryKey) FROM BIG_TABLE?
>
> 1:45 minutes at 300.000 records with
> Select Count(ID) From BIG_TABLE
>
> where the primary key consists only of the field ID.


>
> Please compare with this
>
> Select Min(NOT_INDEXED_FIELD) From BIG_TABLE
>
> 1:00 (!)
>
> So my suggested strategy would be to avoid indexes because then the
> queries run faster? Pfuh... this is driving me nuts!
>

> Ok, I was a bit sarcastic: the second, better result has its roots
> probably in caching, but it is fair to assume that the two times are
> equal - certainly no one could claim that the index-version was faster!
>
> Alex

AlisdairM

unread,
Jun 15, 2000, 3:00:00 AM6/15/00
to
Ping Kam wrote:

> I had done some Interbase development some time ago but not any more so my
> memory is out of date. IIRC, Interbase index is using Balance Tree. But
> there is potential that a tree gets out of balance. To fix this problem,
> you will have to backup and restore your database, which will rebuild all
> indexes.

If I recall correctly, B-trees don't like being handed sorted data, as the
tree simply degenerates into a less efficient list. Everything always goes
down the extreme branch. I guess this is something to think about when
importing data!
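A quick sketch of that degeneration (plain Python with a toy non-rebalancing tree; real B-tree implementations rebalance on insert, but an unbalanced index behaves much like this):

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Naive binary search tree insert: no rebalancing at all.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def depth(root):
    if root is None:
        return 0
    return 1 + max(depth(root.left), depth(root.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

# Sorted input: every key goes down the extreme branch,
# so the "tree" is really a 500-element list.
sorted_depth = depth(build(range(500)))

# Shuffled input: the tree stays far shallower than 500.
random.seed(42)
keys = list(range(500))
random.shuffle(keys)
shuffled_depth = depth(build(keys))

print(sorted_depth, shuffled_depth)
```

With sorted input the depth equals the key count; with shuffled input it stays a small multiple of log2(n).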

I don't know whether that is the problem in this case, though. I would
certainly like to hear that this fixes it rather than there being a more
serious problem at the root here, but I shall worry until we hear back :¬ )

AlisdairM

Call Center

unread,
Jun 15, 2000, 3:00:00 AM6/15/00
to
Ok, folks. Please have a look at the following problem with BIG_TABLE:

I need to run an update command against my big database (I guess you all
know it meanwhile ;-) ).

---------------------------
Update BIG_TABLE Set FIELD =
(Select FIELD From TABLE2 Where BIG_TABLE.ID = TABLE2.BIG_TABLE_ID);
---------------------------

That's the command. As you can see, I have prepared a table (TABLE2) that
holds all values for the field FIELD in BIG_TABLE. To get these values
into BIG_TABLE, I use the above statement.

I was really, really kind to InterBase, so I provided a unique index on
the joining field, TABLE2.BIG_TABLE_ID (which holds the value of the
corresponding primary key field ID of BIG_TABLE):

---------------------------
Create Unique Index TABLE2_IX On TABLE2(BIG_TABLE_ID);
---------------------------

And now the horror:

For 2000 records in BIG_TABLE (i.e. a smaller test version of BIG_TABLE),
the update took 15 minutes! That's absolutely unacceptable. If I run this on
BIG_TABLE - and I _have_ to run it on it - I'll grow old and grey
before I see any result.

When I look at the query plan, it looks like this:

PLAN(TABLE2 NATURAL)
PLAN(BIG_TABLE NATURAL)

Shock!

What about the index on TABLE2.BIG_TABLE_ID?
Even worse: when I use the PLAN option like this:

Update BIG_TABLE Set FIELD =
(Select FIELD From TABLE2 Where BIG_TABLE.ID = TABLE2.BIG_TABLE_ID
PLAN (TABLE2 INDEX(TABLE2_IX)));

I get the result: "index table2_ix cannot be used in the specified plan"

*swallowing_hard*


So my questions to all the people out there:

- How can I perform this query in a reasonable time? (the most important
question for me!)

- Why can't InterBase use the index, even when I explicitly tell it to
do so (using the PLAN option)?

- Have I done something wrong?

BTW, on SQL Server I ran a lot of queries like the one described here, and
they all performed at lightning speed. What is making the difference?
Does the query optimizer lack some intelligence?

Thank you for all answers
Alex


Craig Stuntz

unread,
Jun 15, 2000, 3:00:00 AM6/15/00
to

Call Center wrote:
>
> - How can I perform this query in a reasonable time? (the most important
> question for me!)

Not quite sure why the index isn't being used (I'll think about it and see
if I can come up with anything), but in the meantime, to answer your most
important question: the following stored procedure will probably run faster
than a subquery anyway:

{This is untested, so you'll have to fix my (probably numerous) syntax
errors... :) }

CREATE PROCEDURE UPDATE_BIG_TABLE
AS
DECLARE VARIABLE ID INTEGER;
DECLARE VARIABLE KEY CHAR(8);
DECLARE VARIABLE FIELD_VALUE VARCHAR(100); /*or whatever*/
BEGIN
FOR SELECT
BT.RDB$DB_KEY, BT.ID
FROM
BIG_TABLE BT
INTO :KEY, :ID
DO BEGIN
SELECT
FIELD
FROM
TABLE2
WHERE
BIG_TABLE_ID = :ID
INTO :FIELD_VALUE

UPDATE
BIG_TABLE BT
SET BT.FIELD = :FIELD_VALUE
WHERE
BT.RDB$DB_KEY = :KEY;
END
END

See how this works for you. Oh, and regarding your BIG_TABLE_ID index,
have you tried making it a foreign key instead of a regular index?

HTH,

Call Center

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to
Thank you for your answer, Craig.

I'll try your stored procedure (after debugging... hee hee) if nothing
else works. I already wanted to write such a procedure for myself.

> See how this works for you. Oh, and regarding your BIG_TABLE_ID index,
> have you tried making it a foreign key instead of a regular index?

No, it's not a foreign key. I'm afraid it's not possible to define it as a
foreign key (although I suppose it could help). The point is that there
are 6 BIG_TABLEs (BIG_TABLE1 ... BIG_TABLE6) which contain 6 subsets of
one "VERY_BIG" (so to speak) pool of data.

So the field TABLE2.BIG_TABLE_ID could reference either a record in
BIG_TABLE1 or (if it is not in BIG_TABLE1) in BIG_TABLE2 and so on.

Alas, defining a foreign key constraint therefore fails. It is a
database design I am not happy with, but that's how it is.

Alex

Call Center

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to
Your stored proc worked perfectly at fantastic speed!
Regarding the syntax errors: you just forgot one semicolon, the rest was
ok. Nevertheless, I had to modify it a little bit, so this is the working
version:

CREATE PROCEDURE UPDATE_BIG_TABLE
AS
DECLARE VARIABLE ID INTEGER;

DECLARE VARIABLE FIELD_VALUE INTEGER;
BEGIN
FOR SELECT
BT.ID
FROM
BIG_TABLE BT
INTO :ID

DO BEGIN
SELECT
FIELD
FROM
TABLE2
WHERE
BIG_TABLE_ID = :ID
INTO :FIELD_VALUE;

UPDATE
BIG_TABLE BT
SET BT.FIELD = :FIELD_VALUE
WHERE

BT.ID = :ID;
END
END

And - voila! It worked and took a big load off my mind.

But the problem remains: I just don't want to solve everything with
stored procedures. It would be really interesting to know why InterBase
doesn't use the (so neatly) provided index.

To me it seems that InterBase _could_ work quite fast, but the query
optimizer doesn't do too good a job. Disagreement?

Alex

Wayne Niddery (TeamB)

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to
Call Center wrote in message ...

>---------------------------
>Update BIG_TABLE Set FIELD =
>(Select FIELD From TABLE2 Where BIG_TABLE.ID = TABLE2.BIG_TABLE_ID);
>
>---------------------------
>Create Unique Index TABLE2_IX On TABLE2(BIG_TABLE_ID);
>---------------------------
>
>PLAN(TABLE2 NATURAL)
>PLAN(BIG_TABLE NATURAL)

Is there an index on BIG_TABLE.ID (and is it active)? Going natural on
TABLE2 is actually quite normal, but it certainly should be able to
use an index on BIG_TABLE.ID if one exists.

An alternative stored proc to Craig's that should be much faster again
is to loop through TABLE2 joined to BIG_TABLE:

for
  select bt.RDB$DB_KEY, t2.FIELD from TABLE2 t2
  join BIG_TABLE bt on bt.ID = t2.BIG_TABLE_ID
  into :key, :field
do begin
  update BIG_TABLE set FIELD = :field
  where RDB$DB_KEY = :key;
end

--
Wayne Niddery - WinWright Consulting
RADBooks - http://members.home.net/wniddery/
You have a Right to Free Speech, but not the right to make me listen,
nor to use my property as a soapbox.

Call Center

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to
Yes, there is an index on the field BIG_TABLE.ID and it is active.

As far as the solutions with the stored procedures are concerned, they
work very well, but I need a SQL solution...

Alex

Craig Stuntz

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to

Call Center wrote:
>
> Your stored proc worked perfectly at fantastic speed!

Great! Note that this procedure will, in all likelihood, be faster
than your original query even if it *were* able to use the index. Why?
RDB$DB_KEY is an absolute record address, so no index traversal or
consideration of record versioning is necessary.
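A loose analogy in Python (hypothetical data, not InterBase internals): fetching by a direct address is a single step, while an index lookup must descend a search structure:

```python
from bisect import bisect_left

# Hypothetical records keyed by a direct "address" (standing in
# for RDB$DB_KEY).
records_by_address = {addr: f"row {addr}" for addr in range(100_000)}

# An index lookup, by contrast, searches an ordered structure.
sorted_keys = sorted(records_by_address)

def fetch_direct(addr):
    # One step: no traversal, no comparisons against other keys.
    return records_by_address[addr]

def fetch_via_index(key):
    # Binary search stands in for a B-tree descent here.
    i = bisect_left(sorted_keys, key)
    return records_by_address[sorted_keys[i]]

assert fetch_direct(555) == fetch_via_index(555)
```

Both return the same row, but the direct fetch skips the descent entirely, which is why the RDB$DB_KEY update loop runs so fast.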

Wayne's suggestion looks like a good improvement on my procedure.



> To me it seems that InterBase _could_ work quite fast, but the query
> optimizer doesn't do too good a job. Disagreement?

No disagreement; I agree with this statement. It's worth noting that
no database has an optimizer so good that you can ever completely ignore
this kind of tweaking, but IB's certainly could be better.

Robert Kindred

unread,
Jun 16, 2000, 3:00:00 AM6/16/00
to

Call Center wrote in message ...
>Thank you for your answer, Craig.
>
<snip>

>No, it's no foreign key. I'm afraid it's not possible to define it as a
>foreign key (although I suppose it could help). The point is that there
>are 6 BIG_TABLES (BIG_TABLE1 ... BIGTABLE6) which contain 6 subsets of
>one "VERY_BIG" (so to speak) pool of data.
>
>So the field TABLE2.BIG_TABLE_ID could reference either a record in
>BIG_TABLE1 or (if it is not in BIG_TABLE1) in BIG_TABLE2 and so on.
<snip>
>
>Alex

Interesting. Not that it helps you at all, but Oracle has a feature called
partitioning. It allows many tables to appear as one huge table: in your SQL
statements you reference only one table name. It is used for very large data
sets, and for things which age. Aging means that occasionally you archive off
the oldest table partition, assuming the partitions are split by date or
time.

Robert Kindred

Claudio Valderrama C.

unread,
Jun 24, 2000, 3:00:00 AM6/24/00
to
set term ^;
create procedure dobig
as
declare variable dbk char(8);
declare variable field varchar(nnn); /* use your max length */
begin
for select table2.field, big_table.rdb$db_key
from table2 join big_table
on big_table.id = table2.big_table_id
into :field, :dbk
do

update big_table set field = :field
where rdb$db_key = :dbk;
end ^
set term ;^

See http://members.tripod.com/cvalde/document/mysteriousDbKey.htm
for more information.

C.
--
---------
Claudio Valderrama C.
IT Engineer - Independent consultant - CHILE
Owner of the Interbase WebRing
http://members.tripod.com/cvalde
(No support questions by e-mail, please. Thank you.)

Call Center wrote in message ...

>Ok, folks. Please have look at the following problem with BIG_TABLE:
>
>I need to run an update command against my big database (I guess you all
>know it meanwhile ;-) ).
>

>---------------------------
>Update BIG_TABLE Set FIELD =
>(Select FIELD From TABLE2 Where BIG_TABLE.ID = TABLE2.BIG_TABLE_ID);
>---------------------------
>

>That's the command. As you can see, I have prepared a table (TABLE2) that
>holds all values for the field FIELD in BIG_TABLE. To get this values
>into the BIG_TABLE, I use the above statement.
>
>I was really, really kind to InterBase, so I provided an unique index on

>the joining field, TABLE2.NR:

Claudio Valderrama C.

unread,
Jun 24, 2000, 3:00:00 AM6/24/00
to
I agree with your general assumptions.
For that matter, you can disable indexes, insert, and then re-enable the
indexes. Computing selectivity will refresh the statistics, but it won't
balance an unbalanced index.
However, indexes that come from a PRIMARY KEY or FOREIGN KEY declaration
cannot be deactivated. If you create them by means of CREATE INDEX, though,
the trick is valid.

C.
--
---------
Claudio Valderrama C.
IT Engineer - Independent consultant - CHILE
Owner of the Interbase WebRing
http://members.tripod.com/cvalde
(No support questions by e-mail, please. Thank you.)

AlisdairM <"alisdair.meredith"@NO_SPAM...@benettonformula.com> wrote in
message <39489e73@dnews>...

corey lawson

unread,
Aug 3, 2000, 3:00:00 AM8/3/00
to
What do you mean by "single user mode"? Oracle doesn't have such a
thing...

On Tue, 13 Jun 2000 10:01:50 -0400, Craig Stuntz
<cstuntz@no_spam.vertexsoftware.com> wrote:

>
>Dan Palley wrote:
>>
>> As for a single-user mode, depending on what you need to do, there are a few
>> ways to work around the limitation:
>
> Right, but the issue I was referring to is this: Most of the "speed
>tests" performed on various databases are in single user mode.
>Completely unrealistic for any real-world work, but that's a benchmark
>for you. InterBase, of course, would come out looking badly compared to
>a competing database in single user mode.
>

0 new messages