Directory names

Kevin Carlson

Jan 15, 2015, 6:46:47 PM
to tm...@googlegroups.com
Hi,

I have been trying out tmsu over the last few days, and I am quite impressed. I do have a few concerns that may be worth mentioning, and a suggestion.

Schema:
  Your schema for the 'File' table stores the directory in a 'text' field named 'directory'. This means the directory component of a file's path is not normalized in the database at all. As an example, let's say I have a folder called /raid/sharedMedia/pictures/Family/vacation/2014/ containing 300 pictures. This results in:

Data waste:
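- That 48-character path is stored 300 times, once per file, so 14,352 characters (299 redundant copies) are wasted. Were I to have 100,000 files in there, the waste would be 4,799,952 characters. These are ASCII characters, which take one byte each in UTF-8, so that's about 4.8 MB; if every character took three bytes, as some non-ASCII UTF-8 characters do, it would be about 14.4 MB. This may not be significant with today's storage, but it does add up.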
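The figures above can be reproduced with a short Python sketch. It assumes, as the list does, that one stored copy of the path is necessary and every further copy is waste, and it measures bytes in UTF-8, SQLite's default text encoding. The function names are hypothetical helpers for illustration.

```python
# Redundant storage when the same directory string is stored once per file.
# One copy is needed; the remaining n_files - 1 copies are waste.

def redundant_chars(directory: str, n_files: int) -> int:
    """Characters wasted by repeating `directory` for every file after the first."""
    return len(directory) * (n_files - 1)


def redundant_bytes(directory: str, n_files: int) -> int:
    """The same waste measured in UTF-8 bytes."""
    return len(directory.encode("utf-8")) * (n_files - 1)


path = "/raid/sharedMedia/pictures/Family/vacation/2014/"
print(len(path))                       # 48
print(redundant_chars(path, 300))      # 14352
print(redundant_chars(path, 100_000))  # 4799952
print(redundant_bytes(path, 100_000))  # 4799952 (ASCII: one byte per character)
```

Because this path is plain ASCII, the UTF-8 byte count equals the character count, so the 100,000-file case comes to roughly 4.8 MB.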

Performance:
- This also overcomplicates tagging individual directories: it requires updating 300 entries, rather than a single one in a normalized database.

- A similar complication arises when moving a directory. Renaming the folder I mentioned requires updating all 300 entries (done with your 'tmsu repair' command).
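To make the update cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and columns (flat_file, directory, norm_file, directory_id) are hypothetical and chosen for illustration; they are not TMSU's actual schema. It renames "vacation" to "holiday" in a flat layout and in a normalized one and counts the rows each update touches.

```python
import sqlite3

# Hypothetical tables for illustration only; not TMSU's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flat_file (id INTEGER PRIMARY KEY, directory TEXT, name TEXT);
    CREATE TABLE directory (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    CREATE TABLE norm_file (id INTEGER PRIMARY KEY, directory_id INTEGER, name TEXT);
""")

OLD = "/raid/sharedMedia/pictures/Family/vacation/2014/"
NEW = "/raid/sharedMedia/pictures/Family/holiday/2014/"
names = [f"img{i:03}.jpg" for i in range(300)]

# Flat layout: every file row carries its own copy of the directory string.
conn.executemany("INSERT INTO flat_file (directory, name) VALUES (?, ?)",
                 [(OLD, n) for n in names])

# Normalized layout: one row per path component; files point at the leaf row.
conn.executemany("INSERT INTO directory (id, parent, name) VALUES (?, ?, ?)", [
    (0, None, "/raid"), (1, 0, "sharedMedia"), (2, 1, "pictures"),
    (3, 2, "Family"), (4, 3, "vacation"), (5, 4, "2014"),
])
conn.executemany("INSERT INTO norm_file (directory_id, name) VALUES (?, ?)",
                 [(5, n) for n in names])

# Rename "vacation" to "holiday" in both layouts.
# (With nested subdirectories the flat layout would also need a prefix match.)
flat_rows = conn.execute("UPDATE flat_file SET directory = ? WHERE directory = ?",
                         (NEW, OLD)).rowcount
norm_rows = conn.execute("UPDATE directory SET name = ? WHERE id = ?",
                         ("holiday", 4)).rowcount
print(flat_rows, norm_rows)  # 300 1
```

The flat layout's cost grows with the number of files under the renamed folder, while the normalized layout touches one row regardless.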

I suggest a new schema (ignore the column types; you get the idea):

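+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| parent    | int(10) unsigned | NO   | MUL | NULL    |                |
| directory | varchar(100)     | YES  |     | NULL    |                |
| lft       | int(10) unsigned | NO   | MUL | NULL    |                |
| rgt       | int(10) unsigned | NO   |     | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+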
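For reference, the table above could be written in SQLite (which TMSU uses) roughly as follows. This is a sketch under assumptions: the file table and its directory_id column are hypothetical, parent is left nullable so the root row can have no parent, lft and rgt are left empty here, and the recursive query shows one way to turn a directory id back into a full path using only the parent column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The proposed directory table in SQLite terms. parent is nullable
    -- because the root row (id = 0) has no parent.
    CREATE TABLE directory (
        id        INTEGER PRIMARY KEY,
        parent    INTEGER REFERENCES directory(id),
        directory TEXT,
        lft       INTEGER,
        rgt       INTEGER
    );
    CREATE INDEX directory_parent ON directory(parent);
    CREATE INDEX directory_lft ON directory(lft);

    -- Hypothetical file table: a directory id replaces the path string.
    CREATE TABLE file (
        id           INTEGER PRIMARY KEY,
        directory_id INTEGER NOT NULL REFERENCES directory(id),
        name         TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO directory (id, parent, directory) VALUES (?, ?, ?)",
    [(0, None, "/raid"), (1, 0, "sharedMedia"), (2, 1, "pictures"),
     (3, 2, "Family"), (4, 3, "vacation"), (5, 4, "2014")],
)
conn.execute("INSERT INTO file (id, directory_id, name) VALUES (1, 5, 'img001.jpg')")

# Walk parent links from a directory up to the root, prepending each name.
PATH_SQL = """
    WITH RECURSIVE chain(id, parent, path) AS (
        SELECT id, parent, directory FROM directory WHERE id = ?
        UNION ALL
        SELECT d.id, d.parent, d.directory || '/' || chain.path
        FROM directory AS d JOIN chain ON d.id = chain.parent
    )
    SELECT path FROM chain WHERE parent IS NULL
"""

def file_path(conn, file_id):
    """Rebuild a file's absolute path from the normalized tables."""
    dir_id, name = conn.execute(
        "SELECT directory_id, name FROM file WHERE id = ?", (file_id,)
    ).fetchone()
    (dir_path,) = conn.execute(PATH_SQL, (dir_id,)).fetchone()
    return f"{dir_path}/{name}"

print(file_path(conn, 1))
# /raid/sharedMedia/pictures/Family/vacation/2014/img001.jpg
```

With paths rebuilt this way, pointing the whole tree at a different mount point would only mean updating the root row (id = 0).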

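Queries over the hierarchy, such as fetching every directory below a given one, are simplified using Modified Preorder Tree Traversal: each directory's lft and rgt values enclose the lft and rgt values of all its descendants. It is explained here: http://www.sitepoint.com/hierarchical-data-database-2/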
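A minimal sketch of how the lft and rgt numbers could be assigned and then used, again in Python with sqlite3. It assumes the directory table layout above with made-up sample rows (including hypothetical 2013 and music folders); it follows the technique described in the linked article but is not code taken from it or from TMSU.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (id INTEGER PRIMARY KEY, parent INTEGER,"
             " directory TEXT, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO directory (id, parent, directory) VALUES (?, ?, ?)",
    [(0, None, "/raid"), (1, 0, "sharedMedia"), (2, 1, "pictures"),
     (3, 2, "Family"), (4, 3, "vacation"), (5, 4, "2013"),
     (6, 4, "2014"), (7, 1, "music")],
)

def number_tree(conn, root_id=0):
    """Assign lft/rgt with a depth-first walk (modified preorder tree traversal)."""
    children = defaultdict(list)
    for node_id, parent in conn.execute("SELECT id, parent FROM directory ORDER BY id"):
        if parent is not None:
            children[parent].append(node_id)

    counter = 1
    numbers = {}

    def visit(node_id):
        nonlocal counter
        lft = counter          # number on the way down
        counter += 1
        for child in children[node_id]:
            visit(child)
        numbers[node_id] = (lft, counter)  # number on the way back up
        counter += 1

    visit(root_id)
    conn.executemany("UPDATE directory SET lft = ?, rgt = ? WHERE id = ?",
                     [(l, r, i) for i, (l, r) in numbers.items()])

def descendants(conn, dir_id):
    """All directories strictly below dir_id, found with a single range query."""
    rows = conn.execute(
        """SELECT d.directory FROM directory AS d, directory AS p
           WHERE p.id = ? AND d.lft > p.lft AND d.rgt < p.rgt
           ORDER BY d.lft""", (dir_id,))
    return [name for (name,) in rows]

number_tree(conn)
print(descendants(conn, 2))  # ['Family', 'vacation', '2013', '2014']
```

The trade-off is that adding or moving a directory shifts the lft/rgt values of every node after it in the traversal, so reading whole subtrees becomes cheap while changes to the tree cost more.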

I wrote something similar in PHP for tagging pictures on the filesystem for an online photo album. It worked well and may be beneficial here. This may also tie in with your issue #15, 'Root paths relative to database path rather than root directory'. Every database would ultimately have a root directory (id=0, no parent), which can be any mount point in the filesystem. This makes relocation easy: only the single top-level directory / mount point with id=0 needs to change.

Nice work, by the way!

- Kevin

Paul Ruane

Jan 15, 2015, 6:56:11 PM
to tm...@googlegroups.com

Hi,

You're right, of course. Paths are not normalised; they're stored once for each file. The change to store them relative to the database's parent directory reduces this, but doesn't eliminate it. It's not something I had thought about before. I'll take a look, thanks.

Paul
