Adding a hundred thousand records in one bulk operation


Nicholas Schirmer

Mar 18, 2015, 4:09:58 PM
to dex...@googlegroups.com
Hi,

With our web app, we will allow users to store a copy of one of our MySQL database tables locally for offline use, and our initial thought was to use IndexedDB. I found Dexie and love it already, and I have already set my page up based on your Ajax Population example (almost an exact copy, so that should help you see what I'm currently working with): https://github.com/dfahlander/Dexie.js/wiki/Dexie.on.populate#ajax-populate-sample

However, the database we let them copy could have 150,000+ records in it... I ran a test with just 20,000 records and it took Dexie almost 6 minutes to add them all. I understand this is more a limitation of IndexedDB and the JavaScript engine itself, since I have to do a forEach over my JSON array of 20k items and a single .add() for each. I am doing this within a single transaction, too.

Is there anything you can recommend to optimize this? If IndexedDB is the worst choice for this sort of data storage, what are you able to recommend instead?

Thank you.

David Fahlander

Mar 18, 2015, 5:14:24 PM
to Nicholas Schirmer, dex...@googlegroups.com

Hi Nicholas,

In your case, I suppose you are working with data amounts of around 200 megabytes or more. If not, I wonder why it takes so long. I base this assumption on my own experience with my own app and on a unit test that measures the performance of adding 10,000 objects (taking 2-3 seconds in Firefox on a laptop and 6.6 seconds in Opera on an Android phone).

If so, I believe that both RAM and bandwidth could be bottlenecks in your case. It could then be wise to perform the download in chunks of around 1,000 items at a time to save RAM, as well as showing a progress bar to the user while the download is taking place to make the app more responsive. Notice that this will require you to do the data download in multiple transactions (one per chunk of 1,000 items). Using a lot of RAM can otherwise slow down the browser. The download itself could also be quite time-consuming.

For the user experience, if feasible, do the sync in the background without locking the db (unlike the populate-from-Ajax sample, where the database is locked until population completes) so the user can start using the data while it's being downloaded. If that's not feasible (the data has to be complete in order to work at all), just show the progress bar.
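The chunked approach described above might look something like the following sketch. This is not a definitive implementation: `db.records` is a hypothetical table name, the AJAX download is assumed to have already produced an array, and the chunk size of 1,000 follows the suggestion above.

```javascript
// A sketch of chunked importing: one transaction per chunk keeps RAM
// usage bounded and lets a progress callback fire between chunks.

function chunkArray(items, size) {
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

function importInChunks(db, allRecords, onProgress) {
  var chunks = chunkArray(allRecords, 1000);
  // Chain the chunks sequentially so only one transaction is open
  // at a time.
  return chunks.reduce(function (chain, chunk, index) {
    return chain
      .then(function () {
        return db.transaction('rw', db.records, function () {
          chunk.forEach(function (record) {
            db.records.add(record);
          });
        });
      })
      .then(function () {
        if (onProgress) onProgress((index + 1) / chunks.length);
      });
  }, Promise.resolve());
}
```

Each chunk commits independently, so a progress bar can be driven from `onProgress`; the trade-off is that the import as a whole is no longer atomic.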

Am I right in assuming the amount of data, or do you believe it could be something else that makes it slow in your case?

Nicholas Schirmer

Mar 18, 2015, 5:28:24 PM
to dex...@googlegroups.com, nsch...@mccarygroup.com
Hi David,

Thanks for the quick reply. My data actually isn't that big at all. It's 10,000 rows from an SQL table, and each row only has about 6 columns with very short strings in each.

In fact, if I load the JSON file (the one that is pulled in from the AJAX request before looping) myself and then save it to a file, it's only 4.8MB total.

So it definitely sounds like my issue is elsewhere, which gives me hope!

As I mentioned, I'm using almost the exact Ajax Populate example: I bind to .on('ready'), pull in the JSON object with AJAX, and then initiate a transaction that loops over the JSON object and adds each record one by one, all 10,000 of them. The transaction then commits. And it works fine, aside from the whole 5-6 minute part...

I'm on OS X, working locally. While I originally tested with Safari, I've now run a quick test in Firefox and Chrome as well.

Safari: 5 minutes 30 seconds
Firefox: 1 minute 10 seconds
Chrome: 30 seconds

Seems like my issue is primarily with Safari, then... I believe Safari's implementation of IndexedDB is still lacking, so perhaps that's why? Although my 70 seconds in Firefox is still far off from your 3 seconds, so there must still be some optimization I can do (unless this is a case of OS X Firefox vs. Windows Firefox).

Thanks again.

Nicholas Schirmer

Mar 18, 2015, 5:31:18 PM
to dex...@googlegroups.com
Sorry, that last reply should have said I'm doing 20,000 records in my tests -- not 10,000. :)

David Fahlander

Mar 18, 2015, 7:19:25 PM
to Nicholas Schirmer, dex...@googlegroups.com
I'm not surprised that the Safari version is slow, since its IndexedDB implementation is so new. Firefox used to be the slowest IDB implementation, but with the latest version (Firefox 36.0.1) it is among the fastest ones. Maybe you have an older version of FF? My unit test actually takes only 2 seconds on my hardware (an HP laptop) to add the 10,000 records on both Opera and FF (just ran it again). It has 6 "columns" including the id column, all indexed(!), but they contain very short strings. Unit test source here. To execute it, click here. It could be version diffs or hardware diffs. Notice that the entire test (adding + various ways of querying) takes about 4 seconds. To see the time for the adding part only, you need to click the test and view its log.

Nicholas Schirmer

Mar 19, 2015, 1:35:48 AM
to dex...@googlegroups.com
I figured my issue out after hours of running different tests, and it honestly feels like such a rookie mistake.

I created a wrapper object that stored the db initialization, along with functions I would call to populate the db and such. The issue was that I also stored a reference to the table in an object property. So I had Books.db = new Dexie(); and then I had Books.table = Books.db.tableName; so that I would have a shorthand reference.

My guess is you may already be shaking your head, realizing why this was such a big mistake! Removing that reference and calling Books.db.tableName directly through the db object sped it up an insane amount.

Safari went from 6 minutes to 30 seconds. Chrome went to 2 seconds.

Sorry for taking up some of your time today, but I thank you for trying to help me out! :)

David Fahlander

Mar 20, 2015, 11:34:32 AM
to dex...@googlegroups.com
I wouldn't say that was a rookie mistake, but rather missing documentation about how the table properties work ;) I am happy that Safari wasn't actually that slow, because it worried me a bit! Still, I think your post is beneficial for other users doing the same stuff. It can explain some of the magic in Dexie that is not super obvious when looking at the API.

An explanation for other people reading this post:

Each object store defined in the db.version(x).stores() method gets its own table property to access directly on db, such as db.friends.add(x). This property is a getter whose result depends on whether your Promise scope is currently running within a transaction or not. If you store the result of accessing the table getter somewhere else, that Table instance will also capture the transaction state at the moment of access. So if you accessed the property outside a transaction scope, you get a Table instance that creates a new transaction for each operation, even if you later use it from within a transaction scope.

To sum it up: always use db[tableName] or db.table(tableName) whenever you want to access your table, if you want it to adapt to the current ongoing transaction.
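As a minimal sketch of that rule (the `friends` table name is taken from the example above; the function names are hypothetical):

```javascript
// Accessing the table through `db` inside the transaction callback
// lets each add() join the surrounding transaction.
function addFriends(db, friends) {
  return db.transaction('rw', db.friends, function () {
    friends.forEach(function (friend) {
      db.friends.add(friend); // resolves against the current scope
    });
  });
}

// By contrast, a Table reference captured outside any transaction
// keeps that (non-)transaction state:
var cachedTable = null;
function cacheTable(db) {
  cachedTable = db.friends; // captured outside a transaction scope
}
// cachedTable.add(x) would then open a new transaction per call,
// even when invoked from inside db.transaction(...).
```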

I updated the following documentation:

For understanding transaction scopes, read:

David

Vali Shah

Apr 30, 2015, 9:03:40 AM
to dex...@googlegroups.com
Hi Nicholas and David,
  Thanks for the clear question and explanation. I still have a question, though: is there any way of adding bulk records through Dexie? I mean, passing a list of objects to Dexie so that it gets stored locally, or something like that? Please let me know if any further information is needed on this request.

Thanks & Regards,
Vali Shah

Nicholas Schirmer

Apr 30, 2015, 1:21:36 PM
to dex...@googlegroups.com
Hi Vali,

Unless David says otherwise, I'm going to say no. If there was a way to pass a list of objects to Dexie, it would just loop over the list internally and store them one by one, so that's what you have to do manually instead.

Start/open a Dexie transaction, loop over your list of objects, and store each object one at a time. Afterwards, the transaction will commit.

Simple example:

// DB is your Dexie object
// objectList is your list of objects
// DB is your Dexie object
// objectList is your list of objects
DB.transaction('rw', DB.table, function () {
    var numObjects = objectList.length;

    for (var i = 0; i < numObjects; i++) {
        DB.table.put(objectList[i]);
    }
});

Vali Shah

Apr 30, 2015, 3:07:05 PM
to dex...@googlegroups.com
Hi Nicholas,
Thanks for the quick response. That's exactly what I expected. It looks like activity in the Dexie community here is low. Do we have any other developer communities?

David Fahlander

May 19, 2015, 7:25:43 AM
to Vali Shah, dex...@googlegroups.com

There's a bit more activity on GitHub. Try searching closed issues at https://github.com/dfahlander/Dexie.js/issues
Regards,
David

Jeff Gochin

May 27, 2015, 7:25:39 PM
to dex...@googlegroups.com
We have a database that loads from MS-SQL into IndexedDB. We load over 70 tables of various sizes. The largest is currently around 20K records. What we have noticed is that when loading the largest table, the import into IndexedDB starts to slow down once we reach 50%, or 10K records. Our approach is to download the whole batch from the server and then import. If we start to have issues with performance, our plan is to chunk the download and import process.

As for the right choice, I personally like the NoSQL database that is IndexedDB, but you may find that you start doubting the choice when you need SQL features like joins and views. Stay the course, though; in the end it is the correct choice because it is on a standards track with the W3C. The other choice is WebSQL, which at this point will never be ratified by the W3C because there is effectively only one implementation, built on SQLite.

fred.lap...@gmail.com

Aug 24, 2015, 7:43:07 PM
to Dexie.js
On Wednesday, 27 May 2015 19:25:39 UTC-4, Jeff Gochin wrote:
>We load over 70 tables of various sizes.  The largest is currently around 20K records.  

I am struggling with loading an SQL database having 21 tables (but only some 100 kB of data) via AJAX, using the tutorial ('Populate using AJAX') example as a base. I have a prototype running with a simple single-record table to work out the basics. But I am darned if I can figure out just how to do the second, third, etc. tables/data stores.

I am sure it must be done within the existing db.on('ready', function () {}) block, but it is not at all clear just where or how. I have constructed a schema object that contains all the store definitions, and I have modified the tutorial example to use db[storeName] so I can potentially loop through the schema store names. But I could use a suggestion on where to go next.

Fred LaPlante

Fred LaPlante

Aug 25, 2015, 10:58:40 AM
to Dexie.js
I have tried the following construct:

    var storeNames = ['Parameters', 'Roads'];
    db.on('ready', function () {
        for (var storeName of storeNames) {
            console.log("hdr: working storeName '" + storeName + "'");
            return db[storeName].count(function (count) {
                ......
            });
        }
    });

But it only processes 'Parameters' and never touches 'Roads' at all. Clearly I am missing something. Any advice will be appreciated.


Fred LaPlante
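The `return` in the snippet above exits the ready handler as soon as the first store's count() is kicked off, which is why 'Roads' is never touched. One way to sequence every store is to chain the promises instead. A sketch, with the store names taken from the post and the helper name being hypothetical:

```javascript
// Count (and potentially populate) each store in sequence rather
// than returning out of the loop on the first one.
function countAllStores(db, storeNames) {
  return storeNames.reduce(function (chain, storeName) {
    return chain.then(function () {
      return db[storeName].count().then(function (count) {
        console.log("hdr: '" + storeName + "' has " + count + " records");
        // ...populate this store via AJAX here if count === 0...
      });
    });
  }, Promise.resolve());
}
```

Inside db.on('ready', ...), returning the promise from countAllStores(db, storeNames) makes Dexie hold back queued db operations until the whole sequence has resolved.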