I'd like to start a discussion about tagging web links in applications
that support OpenMeta, because I think it'd be in the best interests of
users if different applications were compatible with each other in
this respect as well.
I just released version 0.9 of Tagger, which adds this feature. For
every tagged URL, I create a .webloc file in
"~/Library/Metadata/Tagger/Web Links/" and add the tags to that file.
I use the title of the tagged page as the name of the file (replacing
forward slashes with hyphens) because I think it'll be intuitive for
the users to see the page titles (instead of, say, their URLs) when
they do Spotlight searches. Whenever two different pages with the same
title are tagged, I append " (URL)" to the file name, replacing
forward slashes with colons in the URL.
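To make the above concrete: a .webloc file is a small property list with a single "URL" key, so creating one can be sketched in a few lines of Python. This is only an illustration and not Tagger's actual code; the function name `write_webloc` and its arguments are made up for the example, and applying the OpenMeta tags to the file is left out.

```python
import plistlib
from pathlib import Path


def write_webloc(directory, title, url):
    """Write a .webloc file named after the page title and return its path."""
    # Forward slashes cannot appear in a file name, so swap them for hyphens.
    safe_title = title.replace("/", "-")
    folder = Path(directory).expanduser()
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{safe_title}.webloc"
    # A .webloc file is a property list with a single "URL" key.
    with open(path, "wb") as fh:
        plistlib.dump({"URL": url}, fh)
    return path
```

For example, `write_webloc("~/Library/Metadata/Tagger/Web Links", "Tagger", "http://hasseg.org/tagger/")` would create "Tagger.webloc" in that folder.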
I noticed that Gravity Apps' Tags has this feature, but it does it a
little differently: it uses
"~/Library/Caches/Metadata/Tags/Bookmarks/" for the .webloc storage,
also replaces colons and backslashes with hyphens in the page title
when using it as the .webloc filename, and deals with page title
collisions by appending incrementing numbers to the filenames.
If different apps were to standardize on this feature, here are some
of the issues I think are important to think about:
Location for storing the .webloc files
-----------------------------------------------------------
I propose some path under "~/Library/Metadata/" because this path is
both indexed by Spotlight and backed up by Time Machine by default.
"~/Library/Caches/Metadata" is indexed by Spotlight but not backed up
by Time Machine.
How does "~/Library/Metadata/OpenMeta/Web Links/" sound?
Naming of the .webloc files
-----------------------------------------------------------
I think using page titles for naming the files is intuitive and clear.
The only possible issue that I can think of is that tagging
applications would always need to have the title as well, and not just
the URL, but I don't see this as a big problem.
The second thing is how to 'clean up' the page title so that it'd be
suitable for use as a file name. What needs to be taken into account
here are the restrictions imposed by different file systems. The most
common one is of course HFS+, but I suppose one can also use a
different file system on the volume where their home folder is
located. An optimal solution would probably replace every character
that is forbidden on any possible Mac filesystem (or at least on the
most common ones) with a permitted character, and restrict the length
of the filename to the lowest maximum filename length among those
filesystems. Taking the restrictions of filesystems common on other
platforms (e.g. FAT32, NTFS, ext2/3/4) into account would be nice as
well (although not imperative), since users might want to copy
the .webloc files onto Windows or Linux systems and this could make
that a bit easier.
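Here is a rough Python sketch of what such a clean-up step could look like. The forbidden character set, the 255-byte limit and the "Untitled" fallback are my assumptions for the sake of illustration, not an agreed-upon standard, and `filename_for_title` is a made-up name.

```python
import re

# Assumed union of characters rejected by HFS+, FAT32, NTFS and ext2/3/4,
# plus ASCII control characters. Adjust as needed.
FORBIDDEN = re.compile(r'[/\\:*?"<>|\x00-\x1f]')
MAX_BYTES = 255  # assumed lowest common limit (ext2/3/4 count bytes)
EXTENSION = ".webloc"


def filename_for_title(title, replacement="-"):
    """Turn a page title into a file name that is safe on common filesystems."""
    stem = FORBIDDEN.sub(replacement, title)
    budget = MAX_BYTES - len(EXTENSION.encode("utf-8"))
    # Truncate by UTF-8 bytes without cutting a multi-byte character in half.
    stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    # Drop surrounding whitespace and trailing dots, which Windows mishandles.
    stem = stem.strip().rstrip(". ")
    return (stem or "Untitled") + EXTENSION
```

Measuring the limit in UTF-8 bytes is the conservative choice here, since a limit counted in characters or UTF-16 units would allow at least as many characters.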
Then there is the issue of page title collisions: if the user wants to
tag two different web links that have the same title, there needs to
be a predictable system for making the filenames unique. I can think
of three different options here:
(1) Appending incrementing numbers to the filenames (this is what Tags
does). This is a simple solution for making the filenames unique and
doesn't make them 'ugly' (in my opinion). The only problem I can think
of is that it won't be obvious to the user which link points to which
page if there are several with the same name, differing only by the
number at the end.
(2) Appending the page URLs to the filenames (this is what Tagger 0.9
does). This is an even simpler (?) solution for making the filenames
unique but it kind of clutters them up (i.e. makes them 'ugly'). The
problem with #1 is solved here, though, since the URL is in the
filename. One problem with this is that it requires a lot of space for
additional characters in the filename (especially if the URL is long)
in order to guarantee uniqueness, as opposed to #1. This might become
a problem if the maximum filename length is low.
(3) Creating a folder tree that follows the structures of the URLs and
saving the .webloc files at the leaf folders in this tree (see below
for example). This allows one to keep using the page titles as
filenames without modifying them further and still avoid collisions,
but the problem with #1 still applies here and this one requires more
work (you'd need to make sure that empty parts of the folder tree are
removed whenever .webloc files are deleted). File system folder name
length limits (and path length limits? total file number limits?) will
probably cause problems here as well. I'm not rooting for this one at
the moment.
    https/
        www.google.com/dogs/cats/?q=dogs%20cats/Dogs and Cats.webloc
    http/
        hasseg.org/tagger/Tagger.webloc
    ftp/
        hasseg.org/tagger/Tagger.webloc
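To compare the first two options side by side, here is a minimal Python sketch. The function names are hypothetical, and the exact number format (" 2", " 3", ...) is my assumption; I haven't checked what format Tags actually uses.

```python
from pathlib import Path


def unique_by_number(folder, stem, ext=".webloc"):
    """Option 1: append " 2", " 3", ... until the name is free."""
    candidate = Path(folder) / f"{stem}{ext}"
    counter = 2
    while candidate.exists():
        candidate = Path(folder) / f"{stem} {counter}{ext}"
        counter += 1
    return candidate


def name_with_url(stem, url, ext=".webloc"):
    """Option 2: append " (URL)" with slashes replaced by colons."""
    return f"{stem} ({url.replace('/', ':')}){ext}"
```

Note that option 1 has to look at the files already in the folder, whereas option 2 can compute the name from the title and URL alone (although the result may still need to be shortened to fit the filename length limit).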
Handling of anchors in page URLs
-----------------------------------------------------------
In Tagger I decided to remove the 'anchor' part from all URLs before
doing anything else with them since I want to apply tags to full
pages, not parts thereof. I think this is a reasonable approach but I
wanted to mention this as well in case there are compelling arguments
to the contrary.
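For what it's worth, removing the anchor is a one-liner in most languages. A small Python sketch using the standard library (the wrapper name `strip_anchor` is just for illustration):

```python
from urllib.parse import urldefrag


def strip_anchor(url):
    """Return the URL without its '#fragment' part."""
    return urldefrag(url).url
```

With this, "http://hasseg.org/tagger/#install" and "http://hasseg.org/tagger/" both map to the same .webloc file.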
Cleaning up
-----------------------------------------------------------
This is not a big issue but Tags seems to delete the .webloc files
whenever they don't have any tags assigned to them anymore. I made
Tagger do the same. I think this is a good idea.
Thoughts?