Oracle considering BDB for the NoSQL market?


rtweed

Jan 31, 2011, 7:43:29 AM
to Caché, Ensemble, DeepSee

rtweed

Jan 31, 2011, 7:59:31 AM
to Caché, Ensemble, DeepSee

OldMster

Jan 31, 2011, 11:30:23 AM
to intersystems...@googlegroups.com
I found it interesting that he never included a link to the GT.M results, nor did he ever quantify how his results compared.  At one point he indicates his results were a bit better, but never gives the numbers to show it.  He spent more time trying to spread some FUD about the ACIDity of the transactions, after admitting that he doesn't know or understand GT.M.  Looking at the results published for GT.M, it appears to me that the worst result for GT.M was significantly better than the best result he was able to push BDB to.

I haven't run this specific benchmark on Cache, but in my testing of the two (GT.M and Cache), their speed has been broadly comparable.  In some situations GT.M is faster, in others Cache is faster.  If I get some time, I'll try running both on my server.  I have both GT.M on Ubuntu and Cache for Windows on the same ESXi server, so the hardware will be equivalent.

Mark

jimmy

Jan 31, 2011, 3:02:07 PM
to Caché, Ensemble, DeepSee
I'm not sure if he has a point about the threads starting prior to the
clock starting or not.

I've never really worked with GTM, but Cache has always been an honest
product, and therefore I would be happier with a benchmark on Cache
where the start time is set by either the first jobbed process or even
at the first line of code of the parent process. This way the
honesty of the results cannot be in question.

The huge increase in speed with the BDB benchmark was mainly due to an
increase in cache, so I cannot work out why the write daemon
should be in question. It would be interesting to perform
benchmarks on different systems, including Cache, with no
caching at all, i.e. write daemon size set to 0 (assuming that's
possible). This would potentially give a very clear indication of each
language's strengths without taking into account any of the systems'
abilities to keep data in memory.

I'm going to run a slightly modified version of the GTM code to
see what sort of results I get. Hopefully I will be able to run it
with my daemon on, then set my daemon size to 0 and run it again
(remembering to turn my machine off and on before re-running it).

I will let you know the results.

Jim

OldMster

Jan 31, 2011, 9:42:31 PM
to intersystems...@googlegroups.com
I've done what is probably a very poor job of making the benchmark work on both my GT.M and Cache VMs.

GT.M is running on Ubuntu Server 10.04 64 bit, running the latest release of GT.M (64 bit).  The block size of my database is 32k, and I have 4096 buffers allocated, for a total of about 132 megabytes of cache.  6 gigabytes of memory are allocated to this VM.

Cache is running on Windows Server 2003 (32 bit) running Cache 2008 (32 bit), with 255 megabytes of global Cache allocated.  4 gigabytes of memory are allocated to this VM.

Both VMs have 2 CPUs allocated to them.

The server is home built, using an Intel i7 920 processor and 12 gigabytes of memory, running VMware ESXi as the hypervisor.  There are 3 gigabyte 7200 RPM SATA drives in the system.  Both the GT.M and the Cache VMs are using only one of the drives; no RAID, etc. is in use.

I am of the opinion that 64 bit vs 32 bit isn't terribly significant for this benchmark, since the primary use of this for both GTM and Cache is to extend memory available for global buffers, and both are configured so that they easily fit into the 32 bit limits.

The only modifications to Bhaskar's published routine for Cache are changing the JOB command arguments to Cache-compatible ones, and changing the 'input' from a text file to values in a for loop.  Since this all happens 'outside' the timing, it should not affect the results.  I also had to change TStart () to TStart for Cache.

As for the timing, the way it was implemented (by Bhaskar) is that all the 'threads' are started and brought to the starting block, and when they all indicate they are 'ready', the start time is recorded, and then a lock is released to 'open the gate'.  No benchmark processing is done by any of the threads until after the start time has been recorded and the lock is released.  I don't see this as a problem - the goal is to determine database performance, not the performance of starting the threads. In any case, it is done the same for both Cache and GT.M.  From my observations, less than a second was spent in the 'invisible time' for both systems.  If I was running this on VMS, I might be more concerned, since VMS is notoriously slow at starting new processes.
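
For readers who don't have the routine in front of them, the "starting block" scheme described above can be sketched roughly as follows. This is only an illustration in Python, not Bhaskar's M code: `run_gated` and `work` are hypothetical names, threads stand in for the jobbed processes, and an event stands in for the lock that is released to open the gate.

```python
import threading
import time


def run_gated(worker_count, work):
    """Start all workers, wait until each reports ready, then time only the work."""
    ready = threading.Barrier(worker_count + 1)  # all workers plus the coordinator
    gate = threading.Event()                     # stands in for the released lock
    results = [None] * worker_count

    def worker(index):
        ready.wait()   # report "ready" at the starting block
        gate.wait()    # do no benchmark work until the gate opens
        results[index] = work(index)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(worker_count)]
    for t in threads:
        t.start()

    ready.wait()                    # returns once every worker has checked in
    start = time.perf_counter()     # start time is recorded before the gate opens
    gate.set()                      # open the gate
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    elapsed, results = run_gated(4, lambda i: sum(range(100_000)))
    print(f"{len(results)} workers finished in {elapsed:.3f} s")
```

The time spent creating the workers falls before `start`, which is the 'invisible time' mentioned above; everything the workers do for the benchmark falls after it.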

Here are the results I got, Cache first, since this is the Cache forum :-)

USER>d ^threen1e
1 100,000 4 8 379 106,358,020 2 108,308 908,308 54,154 454,154
1 100,000 8 8 416 106,358,020 2 107,274 907,274 53,637 453,637
1 1,000,000 4 8 503 24,414,590,536 28 956,053 8,956,053 34,145 319,859
1 1,000,000 8 8 524 24,414,590,536 19 929,709 8,929,709 48,932 469,985
 
USER>d ^threen1e
1 100,000 4 8 337 106,358,020 2 110,258 910,258 55,129 455,129
1 100,000 8 8 379 106,358,020 2 109,081 909,081 54,541 454,541
1 1,000,000 4 8 503 24,414,590,536 20 924,975 8,924,975 46,249 446,249
1 1,000,000 8 8 524 24,414,590,536 21 933,164 8,933,164 44,436 425,389
 
USER>d ^threen1e
1 100,000 4 8 299 106,358,020 2 106,731 906,731 53,366 453,366
1 100,000 8 8 379 106,358,020 2 107,393 907,393 53,697 453,697
1 1,000,000 4 8 524 24,414,590,536 20 971,049 8,971,049 48,552 448,552
1 1,000,000 8 8 514 24,414,590,536 20 933,325 8,933,325 46,666 446,666
 
And the GT.M results:
me@ZZTOP:~$ mumps -run threen1e <$HOME/threen1.dat
1 100,000 4 8 294 106,358,020 2 101,670 901,670 50,835 450,835
1 100,000 8 8 300 106,358,020 3 101,667 901,667 33,889 300,556
1 1,000,000 4 8 417 24,414,590,536 22 889,654 8,889,654 40,439 404,075
1 1,000,000 8 8 452 24,414,590,536 23 891,271 8,891,271 38,751 386,577
me@ZZTOP:~$ mumps -run threen1e <$HOME/threen1.dat
1 100,000 4 8 301 106,358,020 2 101,481 901,481 50,741 450,741
1 100,000 8 8 349 106,358,020 2 101,697 901,697 50,849 450,849
1 1,000,000 4 8 452 24,414,590,536 22 889,825 8,889,825 40,447 404,083
1 1,000,000 8 8 452 24,414,590,536 23 890,947 8,890,947 38,737 386,563
me@ZZTOP:~$ mumps -run threen1e <$HOME/threen1.dat
1 100,000 4 8 294 106,358,020 3 101,617 901,617 33,872 300,539
1 100,000 8 8 294 106,358,020 2 101,957 901,957 50,979 450,979
1 1,000,000 4 8 417 24,414,590,536 23 890,427 8,890,427 38,714 386,540
1 1,000,000 8 8 481 24,414,590,536 23 891,024 8,891,024 38,740 386,566
me@ZZTOP:~$ mumps -run threen1e <$HOME/threen1.dat
1 100,000 4 8 294 106,358,020 2 102,127 902,127 51,064 451,064
1 100,000 8 8 339 106,358,020 2 101,671 901,671 50,836 450,836
1 1,000,000 4 8 452 24,414,590,536 23 890,282 8,890,282 38,708 386,534
1 1,000,000 8 8 453 24,414,590,536 22 890,143 8,890,143 40,461 404,097

Both are remarkably fast....


Mark


rtweed

Feb 1, 2011, 2:47:16 AM
to Caché, Ensemble, DeepSee
Mark

An explanation of the results for those who don't know the 3n+1
benchmark would be helpful. What's the significance of those numbers
and do we know how they compared to the BDB ones?

Rob

OldMster

Feb 1, 2011, 12:05:28 PM
to intersystems...@googlegroups.com
Rob,
I could use that explanation myself.  I didn't even delve deep into the actual benchmark, I just focused on making the changes necessary to get it to run on both Cache and GT.M without actually touching the core benchmark code itself.

Frankly, I think the most exciting part of the whole thing is that GT.M (any mumps for that matter) is mentioned on an Oracle site!

Mark

OldMster

Feb 1, 2011, 12:10:41 PM
to intersystems...@googlegroups.com
One data point that might help in judging how relevant the benchmark is to real-world performance: the 3 runs generated about 500 megabytes of journal files on both Cache and GT.M.

Mark

jimmy

Feb 1, 2011, 1:52:33 PM
to Caché, Ensemble, DeepSee
Hi Mark,

Thanks for clearing up my misunderstandings about the benchmark with
regards to the start time.

Could you send me both the GTM and Cache versions of your code? I
will run them both on the same machine to remove any doubt about
machine performance.

Thanks

Jim

Lars

Feb 2, 2011, 1:51:45 PM
to Caché, Ensemble, DeepSee
The fields (separated by spaces) are:

- starting number
- ending number
- number of threads requested
- number of execution threads (the algorithm starts at least 4 per
CPU, so it may be more than requested)
- largest number of steps for the defined range
- highest number reached
- elapsed time
- number of updates
- number of reads
- update rate per second
- read rate per second
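
As a rough illustration of the field list above, one of the result lines posted earlier in the thread can be split into named fields and the two rate columns rechecked against the counts and the elapsed time. The `Run` class and `parse_run` function are hypothetical names made up for this sketch; the assumption that the rates are the counts divided by elapsed seconds, rounded, is inferred from the posted numbers rather than from the benchmark source.

```python
from dataclasses import dataclass


@dataclass
class Run:
    start: int
    end: int
    threads_requested: int
    threads_used: int
    max_steps: int
    max_value: int
    elapsed_s: int
    updates: int
    reads: int
    updates_per_s: int
    reads_per_s: int


def parse_run(line):
    """Parse one space-separated result line; commas are thousands separators."""
    values = [int(token.replace(",", "")) for token in line.split()]
    if len(values) != 11:
        raise ValueError(f"expected 11 fields, got {len(values)}")
    return Run(*values)


# One of the Cache result lines from earlier in this thread.
run = parse_run(
    "1 1,000,000 4 8 503 24,414,590,536 28 956,053 8,956,053 34,145 319,859"
)
print(round(run.updates / run.elapsed_s), run.updates_per_s)  # 34145 34145
print(round(run.reads / run.elapsed_s), run.reads_per_s)      # 319859 319859
```

Because the elapsed time is reported in whole seconds, a one-second difference on the short 2 to 3 second runs changes the reported rates by a third, so the 1 to 1,000,000 lines are the more meaningful ones to compare.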

Jirka

VN

Mar 29, 2011, 9:57:19 PM
to Caché, Ensemble, DeepSee
Hi Rob,

I ran the 3n+1 benchmark posted on libdb. Note, however, that I have not
attempted to run a modified version based on Bhaskar's comments and
Don's subsequent post re. a "fixed" version of the benchmark
[http://libdb.wordpress.com/2011/02/07/i-broke-the-rules-on-the-3n1-benchmark-again/].

The point is, I was able to compare the "broken" version between BDB
and InterSystems Caché.

Don's results were reported for BDB as follows: 2 core Duo Mac at
2.2 GHz with 4GB memory and 300MB of database cache showed the
following: 72 sec for the 1-1000000 number range; he ran 3 jobs.

My results for Caché are: 2.66 GHz Intel i7 Mac with 8GB memory and
256MB of database cache showed the following: 37 sec for 1-1000000
numbers range running 3 jobs.

Based on the 3 job result (with no tuning), it appears that Caché in
my test case was almost twice as fast as Don's BDB case.

However, I pushed forward with the test and found the following:
Jobs   Time (sec)
----   ----------
3      37
4      27
5      34 (seems like some sort of anomaly)
6      26
7      27
8      28

I stopped at 8 because it was diminishing returns after that (actually
around 7 things started dropping off).

Now, the results that I saw at 8 were:
- GREFs/sec: 507,000
- Block writes/sec: 1,038
- Journal entries: 73,025

If I get some free time, I'll try to re-run the test with the
corrections that Bhaskar pointed out to Don - if I do so, I'll post
the results here.

Cheers,

Vik

On Jan 31, 7:59 am, rtweed <rob.tw...@gmail.com> wrote:
> and I wonder how Cache would fare against the 3n+1 benchmark:
>
> http://libdb.wordpress.com/2011/01/31/revving-up-a-benchmark-from-626...

VN

Mar 30, 2011, 8:37:35 PM
to Caché, Ensemble, DeepSee
Hi Rob (et al),

I was further intrigued by Bhaskar's post
[http://ksbhaskar.blogspot.com/2011/02/from-44-seconds-to-27-seconds-simple.html]
and downloaded a copy of the GT.M routine that he had
used, modified it slightly to fit Caché ObjectScript syntax, and ran
it on my laptop.

In summary, a direct comparison between Bhaskar's GT.M test and my
Caché test indicates:
- For 4 worker processes: my test completed roughly 22% faster with
~25% more updates/sec and ~27% more reads/sec.
- For 8 worker processes: my test completed roughly 36% faster with
~34% more updates/sec and ~6% more reads/sec

The configuration I used was the same as before: 2.66 GHz Intel i7 Mac
running OS X 10.6.6 with 8GB memory, 500GB internal 5200RPM (SAS) drive,
with an HFS file system. Caché was configured with 256MB of database
cache on a Caché 2011.1 Field Test 1 version instance.

The results that I achieved were actually a little better than my
original tests with Don's code:

In each case, I ran 1 - 1000000, varying the number of threads or
worker processes. The results are tabulated below:
From  To       Jobs  Duration (sec)  Updates/sec  Reads/sec
1     1000000  4     21              103,305      150,924
1     1000000  5     20              108,528      158,528
1     1000000  6     19              114,185      166,817
1     1000000  7     19              114,200      166,831
1     1000000  8     19              114,209      166,840

Bhaskar had posted:
<quote>
For four worker processes:
from 44 to 27 seconds
from 72,026 to 109,311 reads/second
from 49,298 to 74,828 updates/second
For eight worker processes:
from 248 to 30 seconds
from 12,782 to 109,285 reads/second
from 8,750 to 74,802 updates/second
</quote>
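
The "roughly 22% faster" and "roughly 36% faster" figures in the summary can be reproduced from the two sets of elapsed times, taking Bhaskar's GT.M time as the baseline. The snippet below is just that arithmetic; `pct_less_time` is a hypothetical helper name, and since the two tests ran on different hardware the percentages say nothing precise about the products themselves.

```python
def pct_less_time(baseline_s, new_s):
    """Percentage reduction in elapsed time relative to the baseline."""
    return 100.0 * (baseline_s - new_s) / baseline_s


# (workers, Bhaskar's GT.M seconds, Caché seconds) as posted in this thread.
comparisons = [(4, 27, 21), (8, 30, 19)]

for workers, gtm_s, cache_s in comparisons:
    print(f"{workers} workers: {pct_less_time(gtm_s, cache_s):.1f}% less elapsed time")
# 4 workers: 22.2% less elapsed time
# 8 workers: 36.7% less elapsed time
```

The same helper can be pointed at the rate columns, but the answer then depends on which side is taken as the baseline, so it is worth stating that choice alongside any percentage.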

Bhaskar's configuration (using GT.M) was:
<quote>
CPU: Intel Core2 Duo T7500 at 2.20GHz
RAM: 4GB 667MHz DDR2
Disk: Seagate ST9200420AS 200GB 7200rpm with SATA interface
OS: 64-bit Ubuntu 10.10 with Linux kernel 2.6.35-25-generic
File system: jfs & ext4
Database: GT.M V5.4-001 64-bit version
</quote>

Note that my system was by no means otherwise idle - this was a quick
test on my laptop, which had other apps running. This may or may not
explain the plateau that I observed around 6 jobs (going from 6 to 8
didn't reduce the time at all).

Journaling was enabled, and the system used transactions (TSTART/
TCOMMIT) just like Bhaskar's test.

I did absolutely no tuning, and have not tried the test with larger
global buffers.

These results are pretty good given that they were achieved without
any tweaking or turning of knobs whatsoever.

Cheers,

Vik

rtweed

Apr 1, 2011, 2:46:26 AM
to Caché, Ensemble, DeepSee
Those are very impressive figures!

Rob