Table Limits of one Database

42 views
Skip to first unread message

Nitai @ Razuna

unread,
Dec 29, 2009, 3:10:50 PM12/29/09
to h2-da...@googlegroups.com
Hi Thomas,

We really like H2 and it comes as the default database for our own
open source project. Now, due to the nature of our web application we
separate each created host with its own tables with a prefix in one
database.

We have seen that MySQL can not scope with 50000 tables in one
database and crashed on us. Oracle so far is the only database that
holds up well. Thus, I'm wondering if there is any limitation with H2
when it comes to many (over 50000) tables in a database.

Thank you and keep up the good work.

Kind Regards,
Nitai

Thomas Kellerer

unread,
Dec 29, 2009, 7:19:04 PM12/29/09
to H2 Database
On Dec 29, 9:10 pm, "Nitai @ Razuna" <ni...@razuna.com> wrote:

> Oracle so far is the only database that holds up well.

Really?

What about PostgreSQL or Firebird (OpenSource and really free - as
opposed to MySQL)
Or DB2 Express, SQL Server (when it comes to commercial products)

Nitai @ Razuna

unread,
Dec 29, 2009, 7:51:33 PM12/29/09
to h2-da...@googlegroups.com
Well, I have to admit that when MySQL crashed we just moved to
Oracle.... So we really didn't look any further.

But the question remains, is H2 able to scope with those tables in one
database? :-)

On Wed, Dec 30, 2009 at 1:19 AM, Thomas Kellerer
<google...@sql-workbench.net> wrote:
> Really?
>
> What about PostgreSQL or Firebird (OpenSource and really free - as
> opposed to MySQL)
> Or DB2 Express, SQL Server (when it comes to commercial products)

--
See for yourself how easy it is to manage files today. Join the revolution!

Razuna SaaS On-Demand - Hosted Digital Asset Management Solution
http://www.razuna.com/

Razuna - Open Source Digital Asset Management
http://www.razuna.org/

Follow us on Twitter
http://twitter.com/razunahq

abc6587

unread,
Dec 29, 2009, 7:59:49 PM12/29/09
to h2-da...@googlegroups.com
Just curious as to the reasons for such design, do all these tables from different hosts have different columns?


--

You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.



Nitai @ Razuna

unread,
Dec 29, 2009, 8:21:56 PM12/29/09
to h2-da...@googlegroups.com
It's a multiple host system and users don't need to worry about
databases at all. Plus we did not want to share tables among clients
and separate their records with a host key.

For such systems there are either the option to use one database with
prefixes, to have for each host its own database or to separate
records with a customer key. At the time we build the system we opted
for the one database with prefixes.

The last time I checked if you wanted to do the same with Wordpress
they are also using the same technique with prefixes to use it in a
multi host environment.

Kind Regards,
Nitai

On Wed, Dec 30, 2009 at 1:59 AM, abc6587 <abc...@gmail.com> wrote:
> Just curious as to the reasons for such design, do all these tables from
> different hosts have different columns?

--

Alexander Hartner

unread,
Dec 29, 2009, 8:33:53 PM12/29/09
to h2-da...@googlegroups.com
Oracle, SQL Server and DB2 I believe support creating tables in different schema, which is similar to using a prefix, but subtly different.

Personally I really don't like using a prefix at all at it addresses neither true separation, nor consolidates the data. Yes, the data is separate, but nothing is preventing the user from accessing the different tables at the SQL layer. If the tables were in different schema / databases they could be protected at SQL user level giving complete separation. Using different tables makes any form of comparison / analysis, not to speak of upgrading the schema very difficult. Just think of the complexities involved with adding a new index or column. To me this seam like a very bad idea and goes against the core idea of databases, there one database has a limited set of tables, which contains many rows of records. In your case you have many thousands of tables with fewer rows then you would end up with if you had selected a single table including a host key.

For this reason I doubt would will get many people who have stretched their databases to such extend.

Nitai @ Razuna

unread,
Dec 29, 2009, 8:44:42 PM12/29/09
to h2-da...@googlegroups.com
Yes, we already use schemas and I'm very familiar with database
design. When we started it was a couple of years ago and as you might
know changing a database system is not a one day job :-)

Today I would go a different route. But honestly, I see some
enterprise systems using the same "prefix" philosophy. I guess it all
grows over the use of time....

Ryan How

unread,
Dec 29, 2009, 10:36:51 PM12/29/09
to h2-da...@googlegroups.com
>
>> For this reason I doubt would will get many people who have stretched their databases to such extend.
>>
If not well tested, You could always put together a quick test case. From what I've seen usually the limits in H2 are on memory usage.


Ryan

abc6587

unread,
Dec 30, 2009, 12:12:21 AM12/30/09
to h2-da...@googlegroups.com
Even if you find a database to support this, I doubt it will be optimized for this particular scenario; even if it is optimized, will it scale? I understand you've already run into scalability problems once. Is there a guarantee that it won't happen again in the future? If the code is only two years old, it's not legacy yet. Maybe it's just me, but I'd consider investing into refactoring away an antipattern that is only 2 years old.

Sam Van Oort

unread,
Dec 30, 2009, 2:10:57 PM12/30/09
to H2 Database
Hi Nitai,

The answer is "so many your computer will fail before H2 does." Go
crazy with tables if you want to! Millions of tables should work
without problems.

For an exact answer, I've consulted the H2 databse source
(org.h2.engine.Database and org.h2.schema.Schema) to answer your
question. The only limit I can find for the number of tables and
schema is the size of a Java array (over 2 BILLION objects). This
limit may also apply to the total number of database objects -- I'd
have to poke into the innards of database object storage to find out,
and it probably depends on the filesystem used.

Why is this limit unimportant? Well, there are a lot of sub-objects
created for things like schemas, tables, etc and Java objects use a
lot more memory than C structs. By the time you hit the hard limit
on database objects, you'll *probably* need terabytes of memory just
to hold the database schema/table/metadata.

Extra bonus points for H2: internally, H2 uses HashMaps to store
schemas, tables, and columns, so lookup by schema/table/column name is
constant-time regardless of the number of objects. Of course, opening
and closing the database will take longer with more objects. You have
to give Thomas Mueller points for removing any and all arbitrary DBMS
constraints (row size, string size, number of tables/users, etc).

That said, the other people are right about that many tables being a
bad design move. You'll waste a LOT of memory holding schema/table
info, and it is much more efficient to use a single large table.
It'll be much harder to maintain your system with so many tables --
you give up any hope of using the console tool effectively, and what
happens if you end up with conflicts in table name or schema names?
Normalization and proper use of indexes/joins are the appropriate
design choices.

One final catch: H2 is not tested with obscenely huge giga-scale
databases, because it is too difficult, slow, and not a development
priority. There's no reason for it to fail, but no guarantee of
stability either. If you do find bugs, please report them though!

Regards,
Sam Van Oort
(junior Committer for H2 project)

Nitai @ Razuna

unread,
Dec 30, 2009, 3:30:49 PM12/30/09
to h2-da...@googlegroups.com
Hi Sam,

Thanks a lot for your detailed answer. This is sufficient for our use
and I have been using H2 over a year now and see how far it got. Great
work. On the database design I have to differ a bit. Initially I might
have not explained it very well, so I try again and please feel free
(anybody) to respond.

See the system is build up to be used for many hosts/domains, but all
with the same code base. The only difference is really only the data.
Thus we build a system that can host many domains within one
administration.

So, what we have is one database. Within that we have some default
tables and for each domain separate tables differentiated by a prefix,
like host1_images, host2_images, etc. As I mentioned in an earlier
email, one could also do "one" table and differentiate the records
with a customer-id or domain-id.

The other option is to create for each host one database and thus have
a database-1, database-2, etc. each with its own tables. Guess this
approach would give the best security and separation between the
hosts. The only problem I see with this approach is that a general
administrator and its user data, who needs access to all hosts data,
would have to be created in each user table. Also updating tables
would be quite cumbersome since one would have to alter each database
with its tables.

So, as I see it, there are only 3 options in this scenario (one system
/ many hosts):

1. One database with prefix tables for each host.
2. One database with tables and records are separated with a host id.
3. Create a database for each host.

Keen to hear your opinions ;-)

Kind Regards,
Nitai

Sam Van Oort

unread,
Dec 30, 2009, 8:38:44 PM12/30/09
to H2 Database
Hi (again) Nitai,

The standard approach is to use keys for hosts and/or domains in your
tables. IE. table "hosts" and "domains" (with host key for each),
images (with domain key), etc.

This reduces the number of table tables, and means you can re-use the
same queries (since the table names are constant).

It is also easier to change the table structures if you want to add
functionality.

Finally, there is some storage overhead for each table and index, and
if you have a lot of small tables, a few bigger ones will run quicker
for insert/select/update. This translates to faster service.

Cheers,
Sam

> > For more options, visit this group athttp://groups.google.com/group/h2-database?hl=en.


>
> --
> See for yourself how easy it is to manage files today. Join the revolution!
>

> Razuna SaaS On-Demand - Hosted Digital Asset Management Solutionhttp://www.razuna.com/

Tsvetozar

unread,
Jan 2, 2010, 11:40:41 AM1/2/10
to h2-da...@googlegroups.com
> As I mentioned in an earlier email, one could also do "one" table and differentiate the records
> with a customer-id or domain-id.

From my experience I really would suggest to put a customer-id (domain-id, system-id, whatever key-id) to identify who's the record is and store all in a few tables instead of the other possibilities that you describe.

I've developed in my freetime similar web based SaaS solution (using MySQL) and used separate database for each subdomain (each user registered its own subdomain and got a separate database with empty tables in it).

This I though was perfect from a security point of view, but as I understood lately, is very hard to maintain database structure as new features are introduced in, checks for problems or inconsistencies, and even it's a problem if you have some background processes that would have to regularly run db queries, send notifications, do updates, etc.

Instead of maintaining only one DB structure and a few SQL queries for your updates or checks, you'll have to multiply them by the number of users (no matter prefixed tables or databases) you have.

Nitai @ Razuna

unread,
Jan 3, 2010, 3:21:06 PM1/3/10
to h2-da...@googlegroups.com
On Sat, Jan 2, 2010 at 5:40 PM, Tsvetozar <tsve...@email.bg> wrote:
> This I though was perfect from a security point of view, but as I understood lately, is very hard to maintain database structure as new features are introduced in, checks for problems or inconsistencies, and even it's a problem if you have some background processes that would have to regularly run db queries, send notifications, do updates, etc.

Right, but as you just mention, from a security standpoint it is
"perfect". For some customers keeping millions of records in a
database mixed with all all customers just differentiated by a hostkey
is simply not justifiable.

In my experience, maintaining many tables is not a huge headache if
you have some scripts in place. I have done this for many projects in
the past and it all works great. Sure you have to concentrate, but you
should do that anyway for any customer :-)

Kind Regards,
Nitai

Thomas Mueller

unread,
Jan 6, 2010, 1:08:37 PM1/6/10
to h2-da...@googlegroups.com
Hi,

> Right, but as you just mention, from a security standpoint it is
> "perfect". For some customers keeping millions of records in a
> database mixed with all all customers just differentiated by a hostkey
> is simply not justifiable.

There are multiple ways to secure data:

A) One database per customer, relying on file system or database
security (you could use encrypted databases). And you probably need
more RAM, because each database would need it's own process (not if
you also rely on JVM 'sandbox' security).

B) Using a separate 'namespace' for each customer (schema or table
name prefix). You could use one database user per customer, and
grant/revoke the rights. This will result in a lot of tables, which is
potentially slow (specially opening and closing the database in the
case of H2; you would need to run your own tests to find out exactly
how much of a problem this is).

C) Rely on 'application level security', by using a special
'customerId' column per table. In that case you need to ensure the
application is written in a secure way. You can't rely on grant/revoke
here, but you could use views of the form CREATE VIEW INVOICES AS
SELECT * FROM ALL_INVOICE WHERE CUST_ID = @CUST_ID. This is a bit
tricky. Once I saw this pattern was used in an Oracle database, I have
never seen this pattern used for H2 so far. One problem is that H2
doesn't support updatable views. The Oracle solution used stored
procedures, but in general it's a bit complicated.

If security is very important, A) would be best. My advice is to do
something between A) and B): group customers in multiple databases. In
any case, if security is important, I would protect against SQL
injection, for example by disabling literals:
http://www.h2database.com/html/advanced.html#sql_injection

Regards,
Thomas

McKinley

unread,
Jan 6, 2010, 1:54:08 PM1/6/10
to H2 Database
In reality this is a one day job.

1. Make a copy of the per customer schema and dedicate it for use by
all customers.
2. Add key columns of customer ID to each table in the unified schema
if they do not already exist.
3. Create a SQL statement template in your favorite template language
(Velocity for example). The SQL statement will generate a SELECT INTO
command from each customer specific table into each table in the
unified schema. You use a template language so that you can neatly
swap out the table name prefix on the old tables with the customer in
question. The template will let you neatly apply the new customer ID
as a constant in the insert statements.
4. Create a different template that will make a bunch of statements
like CREATE VIEW {$customer.getPrefix()}_NAME_OF_OLD_TABLE_RETIRED
AS... SELECT * FROM NAME_OF_NEW_TABLE WHERE CUSTOMER_ID = $
{customer.getID()}. Of course, just before you create the views you
need to rename the old tables for the customer to PREFIX_X_RETIRED or
some such. Put that in the template too.
5. Test the template and once working you can move everyone to a
unified schema in about the time it takes you to backup you database.
6. You now have a database with one set of tables and 5000 views. You
only have to take each user offline for seconds likely.
7. Once you feel confident that your conversion was successful, drop
the RETIRED tables. You have all the time in the world to update your
application to use the tables directly. They will happily use the
views that match the old schema forever. When you update the
application you can drop all 5000 sets of views and you are free.

Kind regards,

McKinley

Nitai @ Razuna

unread,
Jan 6, 2010, 3:16:00 PM1/6/10
to h2-da...@googlegroups.com
Hi Thomas,

Thanks a lot for your time and suggestions. This discussion came upon
since some people did not like the already in place "solution" (we use
your "B" one). So far, only MySQL was not able to scope with this one.
Oracle and H2 seam to hold up quite well. But eventually we might have
to move to "C" if we see that 180000 tables are a problem...

Kind Regards,
Nitai

On Wed, Jan 6, 2010 at 7:08 PM, Thomas Mueller
<thomas.to...@gmail.com> wrote:
> There are multiple ways to secure data:

--

Thomas Mueller

unread,
Jan 8, 2010, 9:57:18 AM1/8/10
to h2-da...@googlegroups.com
Hi,

I didn't test recently with 180000 tables, but I think it's
problematic for H2, specially opening and closing the database, and
memory consumption. Currently H2 keeps all database meta data in
memory, and reads all tables when starting, and at least loops over
all tables when closing the database.

I guess you could create a test case quite easily.

Regards,
Thomas

Nitai @ Razuna

unread,
Jan 8, 2010, 5:24:23 PM1/8/10
to h2-da...@googlegroups.com
Hi,

I see. Then would 5 tables each with around 900000 records (and
growing) be much better in this case or, if you know, for Oracle,
MySQL, etc. ?

Kind Regards,
Nitai

> --
> You received this message because you are subscribed to the Google Groups "H2 Database" group.
> To post to this group, send email to h2-da...@googlegroups.com.
> To unsubscribe from this group, send email to h2-database...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
>
>
>
>

--

Tatu Saloranta

unread,
Jan 8, 2010, 7:12:05 PM1/8/10
to h2-da...@googlegroups.com
On Fri, Jan 8, 2010 at 2:24 PM, Nitai @ Razuna <ni...@razuna.com> wrote:
> Hi,
>
> I see. Then would 5 tables each with around 900000 records (and
> growing) be much better in this case or, if you know, for Oracle,
> MySQL, etc. ?
>

As per Thomas' answer, I would assume yes. Only table _metadata_ has
to reside in memory, not actual data (except for in-memory-only DBs).
Thus amount of obligatory memory usage is lower.

-+ Tatu +-

Reply all
Reply to author
Forward
0 new messages