Limit on number of tables in DB


Dmitri Shubin

Feb 24, 2015, 5:11:58 AM
to wiredtig...@googlegroups.com
Hi!

Is there any limit on the number of tables I can create in a WiredTiger DB?
I can see that each table is stored in a separate file, so an implicit limit on 'open' tables is the number of available file descriptors.

I wrote a simple test that creates 1K tables and found 77 open file descriptors at the end, so there is some 'recycling'.
But I couldn't find any configuration for it (e.g., after how many open files it starts to close old ones).

Also, new table creation looks really slow due to the fdatasync() calls -- is it somehow possible to avoid them, perhaps with something like 'bulk' creation?
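The per-file sync cost behind this is easy to see outside WiredTiger. Below is a standalone sketch (not WiredTiger code; the directory, file names, and counts are arbitrary) that times creating many small files with and without a durability sync per file:

```python
# Sketch: measure the cost of creating N files with a per-file sync,
# mimicking the fdatasync() WiredTiger issues on each table create.
import os
import tempfile
import time

def create_files(n, sync=True):
    """Create n empty files in a fresh temp dir; optionally sync each one.

    Returns the elapsed wall-clock time in seconds.
    """
    d = tempfile.mkdtemp()
    start = time.monotonic()
    for i in range(n):
        fd = os.open(os.path.join(d, f"t{i}.wt"), os.O_CREAT | os.O_WRONLY)
        try:
            if sync:
                os.fsync(fd)  # portable stand-in for fdatasync()
        finally:
            os.close(fd)
    return time.monotonic() - start

if __name__ == "__main__":
    synced = create_files(100)
    unsynced = create_files(100, sync=False)
    print(f"with sync: {synced:.3f}s, without: {unsynced:.3f}s")
```

On most filesystems the synced run is dramatically slower, which is the effect being described here.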

Thanks!

Keith Bostic

Mar 10, 2015, 8:39:21 AM
to wiredtig...@googlegroups.com


On Tuesday, February 24, 2015 at 5:11:58 AM UTC-5, Dmitri Shubin wrote:
 
Is there any limit on the number of tables I can create in a WiredTiger DB?
I can see that each table is stored in a separate file, so an implicit limit on 'open' tables is the number of available file descriptors.

There's no limit inside WiredTiger itself, but of course it's possible to run into system limits like available file descriptors.
  
I wrote a simple test that creates 1K tables and found 77 open file descriptors at the end, so there is some 'recycling'.
But I couldn't find any configuration for it (e.g., after how many open files it starts to close old ones).

There's an underlying "sweep" thread that periodically closes open tables that have been idle for some period of time. Because the sweep thread acts on idle time rather than on the number of open file descriptors, WiredTiger won't react to having a lot of files open at any particular moment.
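Later WiredTiger releases expose the sweep's timing through a `file_manager` option to `wiredtiger_open`; whether these knobs exist in the release under discussion should be verified against that release's documentation. A sketch of the configuration string:

```
# wiredtiger_open configuration sketch (verify option names and defaults
# against your release's documentation):
#   close_idle_time      - seconds a handle must sit idle before it is closed
#   close_scan_interval  - seconds between sweep passes
#   close_handle_minimum - number of handles kept open regardless of idle time
create,file_manager=(close_idle_time=30,close_scan_interval=10,close_handle_minimum=250)
```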

Also, new table creation looks really slow due to the fdatasync() calls -- is it somehow possible to avoid them, perhaps with something like 'bulk' creation?

Table creation in WiredTiger is slower than we'd like, mostly because of the synchronous operations required for recoverability.

Is that an issue for you? Can you tell me more about what you're trying to do and your application requirements?

--keith

Dmitri Shubin

Mar 11, 2015, 5:20:00 AM
to wiredtig...@googlegroups.com
Hi Keith,

Thank you for the reply!
We want to store temporal market data for financial instruments in a WiredTiger DB.
For each instrument we want to store around 30 fields, most of which do not change on every update.
So a column-store table per instrument looks like the way to go, thanks to its run-length encoding of column values.
But since we may want to capture data for several thousand instruments, we'd need to create tables for all of them,
which can take quite a lot of time.
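As a concrete sketch of this layout (the table name, column names, and value formats below are hypothetical, not from the thread), a per-instrument column-store table would be declared through `session->create()` with `key_format=r`, which selects record-number keys, i.e. column store, letting run-length encoding collapse repeated values within each column:

```
# session->create() URI and configuration for one instrument (all names
# are illustrative):
table:instrument_0001
key_format=r,value_format=qqq,columns=(id,bid,ask,last)
```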

Another problem is that we're bounded by the number of available file descriptors.
Is it possible to have several tables in one file?

Thanks!

Keith Bostic

Mar 11, 2015, 8:13:19 AM
to wiredtig...@googlegroups.com


On Wednesday, March 11, 2015 at 5:20:00 AM UTC-4, Dmitri Shubin wrote:
 
But since we may want to capture data for several thousand instruments, we'd need to create tables for all of them,
which can take quite a lot of time.

Do you create them all at the same time, or can the tables be created on demand?
 
Another problem is that we're bounded by the number of available file descriptors.
Is it possible to have several tables in one file?

No; that's a Berkeley DB feature we chose not to carry over into WiredTiger.

Once you have a good sense of exactly where WiredTiger isn't meeting your needs (whether it's the file descriptors, the file create times, or something else), could you please create an issue on our GitHub site? Then we can formally consider how to address the problem.

 

Yuri Finkelstein

Mar 12, 2015, 12:46:26 AM
to wiredtig...@googlegroups.com
Just curious: why not put an instrument id into the row key? I.e., the row key can be instrument_id:timestamp.
Then you can keep multiple instrument time series in a single table. They will be nicely clustered by instrument, so a scan for an instrument id will be efficient. Also, WT has row-key prefix compression, so the space overhead is minimal.
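This layout can be sketched independently of WiredTiger: pack the two fields into one byte-string key so that bytewise ordering, which is what a row-store B-tree uses to compare raw keys, clusters rows first by instrument and then by time. The ids and timestamps below are made up for illustration:

```python
# Sketch of a composite instrument_id:timestamp row key.
import struct

def make_key(instrument_id: int, timestamp: int) -> bytes:
    # Big-endian packing of unsigned 64-bit ints preserves numeric
    # order under bytewise comparison.
    return struct.pack(">QQ", instrument_id, timestamp)

rows = [(2, 100), (1, 300), (2, 50), (1, 100)]
ordered = sorted(rows, key=lambda r: make_key(*r))
print(ordered)  # → [(1, 100), (1, 300), (2, 50), (2, 100)]
```

Sorted this way, all rows for instrument 1 precede all rows for instrument 2, and within each instrument the rows are in timestamp order, which is exactly the clustering described above.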



Dmitri Shubin

Mar 12, 2015, 4:06:04 AM
to wiredtig...@googlegroups.com
Hi Yuri,


On Thursday, March 12, 2015 at 7:46:26 AM UTC+3, Yuri Finkelstein wrote:
Just curious: why not put an instrument id into the row key? I.e., the row key can be instrument_id:timestamp.
Then you can keep multiple instrument time series in a single table. They will be nicely clustered by instrument, so a scan for an instrument id will be efficient. Also, WT has row-key prefix compression, so the space overhead is minimal.

Thanks for the suggestion!

I'm currently evaluating different possible layouts.
With your approach there will be inserts into the middle of the table, which IIUC are not as efficient (in both space and time) as appends at the end.