I came across a post on Hacker News (https://news.ycombinator.com/item?id=27872575) about the time needed to insert "1 billion" rows into a relational database (SQLite) -- although it seems 1 billion was too ambitious, and the article actually only gives results for 0.1 billion (100 million) writes.
Here is the actual article:
https://avi.im/blag/2021/fast-sqlite-inserts/
The data stored is:
key -- integer
s -- 6-character string
age -- integer // 5, 10, or 15
active -- integer // 0 or 1
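For reference, here is roughly how one row of that schema would land in a YottaDB global with the M code further down: the batch and row counters become subscripts, and the three non-key fields sit "^"-delimited in the node value (the global name ^TMP and the "^" delimiter are just the choices made in that code):
set ^TMP(1,1)="QWERTY^10^1" ; batch 1, row 1: s="QWERTY", age=10, active=1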
The author used various languages with widely ranging speed results. It looks like his best speed was achieved by preparing 50 rows and posting them all at once, reaching 100 million rows in 34.3 seconds. I didn't scrutinize his code, so I may have some details wrong here. His machine was a MacBook Pro, 2019 (2.4 GHz quad-core i5, 8 GB RAM, 256 GB SSD, Big Sur 11.1).
I am wondering how fast YottaDB could manage this. Perhaps this is a fool's errand, but it interests me.
I would think the code below could be used. Since the author reported using "prepared rows", I assume he was not generating fresh random values for every row; that would really be a CPU test, not a database test.
new startH set startH=$h
new arr
new ct set ct=0
; Set up a local array holding 100 rows of random data
for  quit:(ct>99)  do
. set ct=ct+1
. new st set st=""
. new j for j=1:1:6 set st=st_$char($r(26)+65) ; random letter A-Z ($r(25) would never produce Z)
. set arr(ct)=st_"^"_(($r(3)*5)+5)_"^"_$r(2) ; ~11 bytes per row; inner parens needed, M evaluates strictly left to right
; arr should be ~11 bytes x 100 = ~1,100 bytes
;
; merge 1 million copies of the 100-row array = 100 million nodes
set ct=0 ; reset the counter, which the loop above left at 100
for  quit:(ct>999999)  do
. set ct=ct+1
. merge ^TMP(ct)=arr
write "Time= ",startH," --> ",$h,!
Anyone interested in trying this on a test system? I don't have one I am willing to run this on right now.
I think the total size would be 100 million x 11 bytes = 1,100,000,000 bytes, or ~1.1 GB, unless I have my math wrong.
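As a quick check of that arithmetic (raw value bytes only -- subscripts and database block overhead would make the on-disk file somewhat larger):
write 1000000*100*11,! ; prints 1100000000, i.e. ~1.1 GB of row data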
My gut feeling is that the limiting factor is going to be the speed at which the operating system can push data out to the filesystem. The difference between SQLite, with all the optimizations the author could find, and YottaDB would come down to CPU cycles, and I suspect CPU is not the bottleneck.
Any thoughts?
Kevin T