Database speed comparison

kdtop

Jul 18, 2021, 9:34:07 PM

I came across a post on Hackernews (https://news.ycombinator.com/item?id=27872575) about the time needed to insert "1 billion" rows into a relational database (SQLite) -- although it seems 1 billion was too ambitious and the article actually only gives results for 0.1 billion (100 M) writes.

Here is the actual article: https://avi.im/blag/2021/fast-sqlite-inserts/

The data stored is:
key -- integer
s -- 6 character string
age -- integer // 5, 10, or 15
active -- integer // 0 or 1

The author used various languages with widely ranging speed results. It looks like his best speed was achieved by preparing 50 rows and posting them all at once, achieving 100 million rows in 34.3 seconds. I didn't scrutinize his code, so I may have some details wrong here. His machine was: MacBook Pro, 2019 (2.4 GHz Quad Core i5, 8GB, 256GB SSD, Big Sur 11.1)

I am wondering how fast yottadb could achieve this? Perhaps this is a fool's errand, but it interests me.

I would think this code could be used. Since the author reported using "prepared rows", I am assuming he was not generating the random values for each row each time. That would really be only a CPU test, not a database test.

 new startH set startH=$h
 new arr,ct,j,st
 ;//Set up array with 100 lines of random data
 for ct=1:1:100 do
 . set st=""
 . for j=1:1:6 set st=st_$char($r(26)+65) ;//random letter A-Z
 . ;//M evaluates strictly left to right, so the age needs its own parentheses
 . set arr(ct)=st_"^"_(($r(3)*5)+5)_"^"_$r(2) ;//I think this would be ~11 bytes
 ;//arr should be ~11 bytes x 100 = 1,100 bytes
 ;
 ;//merge 1 million instances of 100 lines each (100 million nodes in total)
 for ct=1:1:1000000 merge ^TMP(ct)=arr
 write "Time= ",startH," --> ",$h,!

Anyone interested in trying this on a test system? I don't have one I am willing to run this in right now.

I think the total size would be 100 million x 11 bytes = 1,100,000,000 bytes, or ~1.1 GB, unless I have my math wrong.
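The size estimate can be checked with a few lines of arithmetic. This is only a rough sketch in Python: it assumes each value is stored as six letters, "^", the age, "^", and the active flag (the layout used in the M snippet above), and it ignores the subscripts and any database block overhead, so the real file would be larger.

```python
# Rough payload-size estimate for the proposed test data.
# Assumption: each value looks like "ABCDEF^15^1"; keys (subscripts) and
# database overhead are NOT counted.

ROWS = 100_000_000


def value_bytes(age: int, active: int) -> int:
    """Length in bytes of one stored value for a given age and active flag."""
    return len(f"ABCDEF^{age}^{active}".encode("ascii"))


def average_value_bytes() -> float:
    """Average over the equally likely ages (5, 10, 15) and flags (0, 1)."""
    sizes = [value_bytes(age, active) for age in (5, 10, 15) for active in (0, 1)]
    return sum(sizes) / len(sizes)


if __name__ == "__main__":
    avg = average_value_bytes()
    print(f"average value size: {avg:.2f} bytes")
    print(f"payload for {ROWS:,} rows: {avg * ROWS / 1e9:.2f} GB")
```

Values with age 5 are one byte shorter than the others, so the average comes out slightly under 11 bytes, which keeps the ~1.1 GB figure as a reasonable upper bound for the payload alone.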

My gut feeling is that the limiting factor is going to be the speed at which the operating system is able to put data out to the filesystem. The difference between SQLite, with all the optimizations the author could find, and yottadb would come down to CPU cycles. And I suspect that is not the bottleneck.

Any thoughts?

Kevin T

rtweed

Jul 19, 2021, 6:45:18 AM

Please see this: https://github.com/robtweed/global_storage/blob/master/Performance.md#basic-global-set-write-performance-test

Read the full blog for context: https://github.com/robtweed/global_storage

Note that performance using raw M code will be even faster: 1 million key/value pair sets per second should be obtainable from YottaDB on even relatively modest hardware.

A key conclusion of the blog article is that such performance significantly exceeds even that of the better-known embedded databases that are designed and considered to be ultra-fast.

Whether anyone in the mainstream of IT knows or cares about this is more difficult to assess. I've seen passing interest so far, but that's about as far as it goes. I guess most people don't worry about the performance of the databases they use, at least not to the extent that they'll look beyond the usual culprits?

Still, we can only try to wake them up.

Rob

kdtop

Jul 19, 2021, 1:03:08 PM

This is a valuable write-up. Thanks for making it!

Kevin

K.S. Bhaskar

Jul 22, 2021, 5:33:33 PM

Since programming can be therapeutic, and I felt like therapy, I decided to play a little. See https://gitlab.com/ksbhaskar/fastinsert/-/blob/main/fastinsert.m

Regards
– Bhaskar

rtweed

Jul 23, 2021, 10:40:14 AM

That's a wee bit fast, Bhaskar!! :-) Are you going to publish some of those figures anywhere? They pretty much blow any of the "mainstream" databases clean out of the water.

Rob

K.S. Bhaskar

Jul 23, 2021, 11:28:27 AM

On Friday, July 23, 2021 at 10:40:14 AM UTC-4, rtweed wrote:
> That's a wee bit fast, Bhaskar!! :-) Are you going to publish some of those figures anywhere? They pretty much blow any of the "mainstream" databases clean out of the water.
>
> Rob

Thanks Rob. I too was pleasantly surprised by the numbers, especially on the Raspberry Pi Zero. I would like to publish benchmarks somewhere, but this is not a realistic benchmark, and one that does not lend itself to apples-to-apples comparisons. I'd like to find a nice key-value NoSQL benchmark, and make that run on YottaDB.

In any case, suggestions welcome. I did Tweet them.

Regards
– Bhaskar

Akabouncue

Jul 23, 2021, 10:19:37 PM

kdtop

Jul 27, 2021, 10:17:49 PM

Bhaskar,

These are very impressive numbers. I'd love to see them highlighted on Hackernews. I would post them, but I think to get traction, there would need to be a write-up. Here is another post I found of someone else jumping on the speed-test bandwagon.
https://blog.metaobject.com/2021/07/inserting-130m-sqlite-rows-per.html

I don't have the skill or forum to write up such an evaluation of yottadb. Any takers?

And again, Bhaskar, thanks for working on this. I love that a Raspberry Pi holds its own in terms of speed!

Kevin

K.S. Bhaskar

Jul 27, 2021, 10:50:38 PM

Kevin –

I'm working on a YottaDB blog post about it. And I'm hoping to actually set a billion nodes (i.e., insert a billion rows) in under one minute, albeit on an x86_64 PC. I suppose I could set a billion nodes on a Raspberry Pi Zero if I had the patience!

Regards
– Bhaskar

pahihu

Jul 28, 2021, 6:59:39 AM

K.S. Bhaskar wrote (Thursday, July 22, 2021, 23:33:33 UTC+2):
> Since programming can be therapeutic, and I felt like therapy, I decided to play a little. See https://gitlab.com/ksbhaskar/fastinsert/-/blob/main/fastinsert.m
>

Hi,

The linked M routine does not run processes in parallel:

for i=1:1:nproc do setdata(i)

The time reported is for the last setdata() call only, so increasing
the number of processes decreases the reported elapsed time.

The corrected code:

set start=$zut,end=0
for i=1:1:nproc do
. set:^ctrl(i,"start")<start start=^ctrl(i,"start")
. set:end<^ctrl(i,"end") end=^ctrl(i,"end")
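To illustrate why this matters, here is a small sketch in Python (not M) that uses made-up timestamps. It assumes, as in the corrected code above, that every worker records its own start and end time in microseconds; the function names and the sample numbers are hypothetical and only serve to show the difference between the two ways of measuring.

```python
# Elapsed time for a multi-worker run: earliest start to latest end.
# The timestamps below are invented for illustration; in the M routine they
# would come from ^ctrl(i,"start") and ^ctrl(i,"end").

def overall_seconds(timings_us):
    """Wall-clock span covering all workers: min(start) to max(end)."""
    start = min(s for s, _ in timings_us)
    end = max(e for _, e in timings_us)
    return (end - start) / 1_000_000


def last_worker_seconds(timings_us):
    """What the original routine effectively reported: the last call only."""
    s, e = timings_us[-1]
    return (e - s) / 1_000_000


if __name__ == "__main__":
    # Four workers that ran one after another, 10 seconds each.
    sequential = [(0, 10_000_000), (10_000_000, 20_000_000),
                  (20_000_000, 30_000_000), (30_000_000, 40_000_000)]
    print("last worker only:", last_worker_seconds(sequential), "s")
    print("overall span:    ", overall_seconds(sequential), "s")
```

With sequential workers the last-worker figure shrinks as the work is split across more calls, while the overall span stays the same; when the workers genuinely overlap, the two figures converge.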

Regards,
pahihu

K.S. Bhaskar

Jul 28, 2021, 2:47:39 PM

Pahihu –

Thanks for the correction. You are right, although the “starter's pistol” of the M lock release means that all child processes start at essentially the same time, so in typical cases the difference is likely to be in the millisecond range.

I will fix it in the next iteration. Thanks again.

Regards
– Bhaskar

K.S. Bhaskar

Jul 28, 2021, 10:05:46 PM

OK, I have a considerable amount of egg on my face. Not only was the program wrong, but the time I reported for a Raspberry Pi Zero W was actually on a Raspberry Pi 3. I have corrected the program and uploaded it. Here are the current numbers, which are still respectable, but not knock-your-socks-off numbers.

On a Raspberry Pi Zero W (32-bit Debian Bullseye):

$ yottadb -run fastinsert 1E6
Set 1,000,000 nodes in 75.457866 seconds using 1 processes at 13,252 nodes/second
$

On a Raspberry Pi 3 (64-bit Debian Bullseye):

$ yottadb -run fastinsert 1E7
Set 10,000,000 nodes in 46.736418 seconds using 4 processes at 213,966 nodes/second
$

On the home-brew (not overclocked) AMD Ryzen-7 3700X (64-bit Ubuntu 21.04):

$ yottadb -run fastinsert 1E8
Set 100,000,000 nodes in 51.999496 seconds using 16 processes at 1,923,096 nodes/second
$
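As a quick sanity check, the nodes/second figures above can be recomputed from the node counts and elapsed times, and set beside the SQLite result quoted at the start of the thread (100 million rows in 34.3 seconds). This is a rough Python sketch for the arithmetic only; the hardware, data and durability settings differ, so it is not an apples-to-apples comparison.

```python
# Recompute throughput from the reported counts and elapsed times.
# The SQLite entry uses the figure quoted from the article earlier in the
# thread; the machines and workloads are not directly comparable.

RESULTS = [
    ("Raspberry Pi Zero W, 1 process", 1_000_000, 75.457866),
    ("Raspberry Pi 3, 4 processes", 10_000_000, 46.736418),
    ("AMD Ryzen-7 3700X, 16 processes", 100_000_000, 51.999496),
    ("SQLite article, MacBook Pro 2019", 100_000_000, 34.3),
]


def per_second(count: int, seconds: float) -> float:
    """Average number of nodes (or rows) written per second."""
    return count / seconds


if __name__ == "__main__":
    for label, count, seconds in RESULTS:
        print(f"{label}: {per_second(count, seconds):,.0f} per second")
```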

I have no excuses to offer. Thank you for keeping me honest Pahihu. Now back to wiping the egg off my face!

Regards
– Bhaskar