The values of SORTMEM and SORTMRG are very dependent on the relative speeds
of the processor and disk and also on available memory. In Eugene's case, he
is using Managed Flash Technology disks which are astonishingly fast.
I have just tried a range of values here when sorting a file with just under
2 million records. It looks like a SORTMEM value of 4096 gives me the best
performance, but this would probably vary depending on the data I am sorting
and what else is happening in the system at the time.
The sort system builds a simple binary tree. When this reaches the memory
usage set by SORTMEM, the tree is written to disk and we start again.
SORTMRG determines how many disk trees we merge in each pass. A value of 4
seems to work well.
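The spill-and-merge behaviour described above can be sketched roughly as follows. This is a minimal Python illustration of the general technique, not QM's actual implementation; for simplicity the SORTMEM threshold here counts records rather than bytes, and sorted lists stand in for the binary tree:

```python
import heapq
import tempfile

def spill(seq):
    """Write one sorted run of records to a temporary file."""
    f = tempfile.TemporaryFile(mode="w+")
    for rec in seq:
        f.write(rec + "\n")
    f.seek(0)
    return f

def read_run(f):
    """Lazily read a spilled run back as a stream of records."""
    f.seek(0)
    return (line.rstrip("\n") for line in f)

def external_sort(records, sortmem=4, sortmrg=4):
    """Sort an iterable of strings using bounded memory.

    Builds in-memory runs of at most `sortmem` records, spills each
    run to disk, then merges at most `sortmrg` runs per pass (the
    roles SORTMEM and SORTMRG play in the description above).
    """
    runs, buf = [], []
    for rec in records:
        buf.append(rec)
        if len(buf) >= sortmem:            # memory limit reached: spill
            runs.append(spill(sorted(buf)))
            buf = []
    if buf:
        runs.append(spill(sorted(buf)))

    # Merge no more than `sortmrg` runs per pass until one remains.
    while len(runs) > 1:
        group, runs = runs[:sortmrg], runs[sortmrg:]
        merged = heapq.merge(*(read_run(f) for f in group))
        runs.append(spill(merged))
    return list(read_run(runs[0])) if runs else []
```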
Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200
GaryW
To all:
OK, this is definitely approaching "bug" status. I can "fix" the
sort problem by adding something arbitrary to the sort key. For
example,
"SELECT DOCREC BY TERMINAL" takes 74 seconds, BUT
"SELECT DOCREC BY TERMINAL BY DOCUMENT.NO" sorts in THREE!
Of course, there is no reason that I can't ALWAYS do the second as the
results are perfectly acceptable, but I SHOULDN'T HAVE TO.
GaryW
On Aug 9, 9:31 am, Gary <gwalb...@gmail.com> wrote:
> Ashley,
>
> A couple of comments on my previous post (DAMN GOOGLE, why can't I
> edit these?)... I meant to say "TERMINAL" not "TERM" and I should
> have pointed out that the formatting is probably NOT the problem as
> the keys that sort quickly are formatted just like the keys that sort
> poorly. It REALLY seems that the determining factor is the number of
> identical keys.
>
> GaryW
> I understand the process of dumping the BTree to file, sounds fine.
> I'm not sure I follow why you're keeping the number of cached files
> down by re-merging them when you get above X number. What was
> your thinking on this?
Ultimately, the select list has to end up as a sorted list of ids. A big
select may produce many intermediate files, each up to SORTMEM in size. It
would be possible for READNEXT to look at all of these to work out which is
next every time that it fetches an id, but the performance is far better if
the merge is done a few files at a time. This technique has been a common
part of sorting for as long as I've been in the industry (too long to
mention!).
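The "which file holds the next id" lookup can be done lazily: a small heap over the open runs makes each READNEXT-style fetch cheap instead of scanning every intermediate file. A hypothetical Python sketch of that idea, with in-memory lists standing in for the intermediate files:

```python
import heapq

def readnext_merge(runs):
    """Lazily yield ids in order from several pre-sorted runs.

    The heap holds one candidate id per run, so each fetch costs
    O(log n_runs) rather than scanning every run on every fetch
    (the naive 'look at all of them' approach described above).
    """
    iters = [iter(r) for r in runs]
    heap = []
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, i))
    heapq.heapify(heap)
    while heap:
        rec, i = heapq.heappop(heap)
        yield rec                          # behaves like one READNEXT
        nxt = next(iters[i], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))
```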
I have not been following the other issues in this thread but I have noticed
a few odd points along the way....
There is already a very basic diagnostic in the query processor to show how
it will handle a query. This is activated using the DEBUGGING keyword but
also requires use of OPTION $QUERY.DEBUG first. This diagnostic is intended
for trapping errors in the query parser but does help to show how a query
will be processed.
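Based on the description above, switching the diagnostic on would look something like this at the QM command prompt (the exact output format will vary by release):

```
OPTION $QUERY.DEBUG
SELECT DOCREC BY TERMINAL DEBUGGING
```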
A sorted select where the ids are already in order will result in a very
unbalanced sort tree. This is one good reason not to take SORTMEM up too
high.
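The degenerate case is easy to demonstrate with a plain unbalanced binary search tree (a hypothetical Python illustration, not QM's actual tree code): inserting already-sorted keys turns the tree into a chain, so its depth grows linearly with the number of keys.

```python
def insert_depth(keys):
    """Insert keys into a naive unbalanced BST; return its final depth."""
    root = None
    max_depth = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]       # [value, left, right]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        max_depth = max(max_depth, depth)
    return max_depth

# Already-ordered ids degenerate to a chain of depth n; shuffled
# ids stay close to log2(n) deep on average.
```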
A query with multiple selects will process them essentially left to right
though there is some optimisation. A query that finds an active select list
when it starts uses that as the basis for its processing and will then
examine each record in that list to see if it meets any new selection
criteria.
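That re-examination step amounts to filtering the active list through the new criteria. A hypothetical sketch, where `refine_select` and `meets_criteria` are illustrative names standing in for whatever the new selection clauses compile to:

```python
def refine_select(active_list, records, meets_criteria):
    """Given an active select list of ids, keep only the ids whose
    records satisfy the new selection criteria."""
    return [rid for rid in active_list if meets_criteria(records[rid])]

records = {"1": {"CITY": "London"},
           "2": {"CITY": "Paris"},
           "3": {"CITY": "London"}}
# Start from an active list of ids "1" and "3", apply a new criterion.
shortlist = refine_select(["1", "3"], records,
                          lambda r: r["CITY"] == "London")
```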
There are several known bugs in the query processor in the GPL source, all
fixed in the commercial product. Such are the penalties of open source.
Just out of curiosity, how do you justify calling your product OPENQM now?
... since it is closed source. Are you planning to go back to calling it just
QM?
If I were to decide to use OPENQM, during an IT audit people would be asking
the same thing and at the moment I can't think of a decent answer.
Best Regards, Steve
Let's not start this argument again.
There is an open source version. You and I agreed that it would be treated
as a sandbox for developers who want to play with the possibility of
contributing things back to the mainstream source. We also agreed that it
would not be updated unless there was a massive shift in how it is being
treated by the open source community.
> What should I say to an IT department/auditor who questions why
> the product I use is called OPENQM and yet the source code FOR
> THE PRODUCT BEING USED is not available?
Why should they care what the product is called? What matters is that they
get a robust commercial grade product with outstanding support. If they
really want to know why it has "open" in the name, explain about the sandbox
version.
I have yet to come across a user who has raised the concern that you cite.
What should I say to an IT department/auditor who questions why the product
I use is called OPENQM and yet the source code FOR THE PRODUCT BEING USED is
not available?
Hi Ashley,
> However, I have been asked by a banking client if I can provide
> indemnity from the database provider OR full source code.
We do have clients for whom we enter into an escrow agreement that would give
them full access to the source if we ceased trading for any reason.
Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200
OK thanks Martin I get that.
However it isn't just what the product is called. Your website slams home
the benefits of OPENQM having an open source version for Linux but there is
no mention that the open source version is well out of date and, even worse,
is currently frozen.
Our best clients are regional members of global advertising agency networks
run out of New York City, London and Paris. These guys are smarter than we
are. They are going to take one look at your web site, ask a few questions
and then say we are dopes for trying to pull the wool over their eyes.
This is our position; I do understand that your other clients may not be in
the same position.