There are several ways to store strings in a database. I compared two:
1. Store them inline in the same table (i.e., the string replaces a reference).
2. Store them in a separate table (and look them up when needed).
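For illustration, here is a minimal sketch of the two layouts (table and
column names are hypothetical, not the project's actual schema):

  require "sqlite3"

  db = SQLite3::Database.new(":memory:")

  # Variant 1: the string lives inline in the row that uses it.
  db.execute("CREATE TABLE entries_inline (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

  # Variant 2: strings live in their own table; rows reference them by id.
  db.execute("CREATE TABLE strings (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE)")
  db.execute("CREATE TABLE entries_lookup (id INTEGER PRIMARY KEY, " \
             "string_id INTEGER NOT NULL REFERENCES strings(id))")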
I tried to measure the speed. The benchmark script will appear soon in my
master branch (or the benchmarks branch).
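The tables below have the shape of Ruby Benchmark output; a harness along
these lines produces that format (the workloads here are only stand-ins for
the script's real export/search/import calls):

  require "benchmark"

  # Stand-in workloads; the real script exercises each storage variant.
  inline_variant   = -> { 50_000.times { "sample".upcase } }
  separate_variant = -> { 50_000.times { { "sample" => 1 }.fetch("sample") } }

  Benchmark.bm(24) do |x|
    x.report("With string inside:")    { inline_variant.call }
    x.report("With separate strings:") { separate_variant.call }
  end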
SQLite in memory (Celeron M 1.5 GHz, 2 GiB of RAM):
Export
                            user     system      total        real
Original 0.0.3:          0.270000   0.010000   0.280000 (  0.418400)
With string inside:      0.270000   0.020000   0.290000 (  0.420386)
With separate strings:   0.440000   0.020000   0.460000 (  0.680190)

Searching
                            user     system      total        real
Original 0.0.3:          3.860000   0.160000   4.020000 (  5.926988)
With string inside:      3.630000   0.130000   3.760000 (  5.419317)
With separate strings:   3.070000   0.120000   3.190000 (  3.960245)

Import
                            user     system      total        real
With string inside:      0.360000   0.020000   0.380000 (  0.478908)
With separate strings:   1.790000   0.080000   1.870000 (  2.463469)
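The import gap is most likely explained by the extra per-row work: with a
separate table, every imported row needs a find-or-create of the string id
before the row itself can be inserted. A minimal sketch, reusing the
hypothetical schema from above:

  # Variant 1: one statement per imported row.
  def import_inline(db, name)
    db.execute("INSERT INTO entries_inline (name) VALUES (?)", [name])
  end

  # Variant 2: find-or-create the string, then insert the referencing row.
  def import_lookup(db, name)
    row = db.get_first_row("SELECT id FROM strings WHERE value = ?", [name])
    string_id = row ? row[0] : begin
      db.execute("INSERT INTO strings (value) VALUES (?)", [name])
      db.last_insert_row_id
    end
    db.execute("INSERT INTO entries_lookup (string_id) VALUES (?)", [string_id])
  end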
These are only sample runs; however, the following seems to hold for this
configuration (based on more than 10 runs):
- Searching with the strings in the table tends to be less costly than in
  the original (the two query shapes are sketched below).
- Searching with the separate table is on the same level as the original.
- Importing with the strings in the table is faster than with the separate
  table by a large factor (roughly 5x in these runs).
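For reference, the two search paths differ mainly in whether a join through
the strings table is needed (hypothetical schema as above):

  # Variant 1: match directly on the inline text column.
  db.execute("SELECT id FROM entries_inline WHERE name = ?", ["sample"])

  # Variant 2: resolve the string through a join; after the id is known,
  # comparisons happen on a small integer column.
  db.execute("SELECT e.id FROM entries_lookup e " \
             "JOIN strings s ON s.id = e.string_id WHERE s.value = ?", ["sample"])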
I propose storing the strings in the same table. What do you think?
Regards
PS.
Run on a Celeron M 1.5 GHz with 2.0 GiB of RAM, connecting to PostgreSQL 8.0
on a P3 512 MHz with 256 MiB of RAM:
Export
                            user     system      total        real
Original 0.0.3:          0.280000   0.050000   0.330000 (  2.649211)
With string inside:      0.270000   0.050000   0.320000 (  2.681522)
With separate strings:   0.540000   0.080000   0.620000 (  8.102009)

Searching
                            user     system      total        real
Original 0.0.3:          2.130000   0.220000   2.350000 ( 10.467809)
With string inside:      2.270000   0.230000   2.500000 ( 10.887029)
With separate strings:   2.220000   0.220000   2.440000 ( 10.809432)

Import
                            user     system      total        real
With string inside:      0.260000   0.010000   0.270000 (  0.479773)
With separate strings:   1.340000   0.080000   1.420000 (  5.306536)
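One plausible reading of the PostgreSQL numbers: user time barely changes,
but the separate-strings variant loses heavily on real time, which is what
you would expect if it issues an extra statement per row, since every
synchronous statement pays a network round trip to the P3 box. A sketch with
the pg gem (connection parameters are hypothetical):

  require "pg"

  conn = PG.connect(host: "p3-server", dbname: "benchdb")

  # Each per-row lookup like this costs a full client/server round trip;
  # thousands of them dominate wall-clock time on a remote database.
  conn.exec_params("SELECT id FROM strings WHERE value = $1", ["sample"])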