For some reason or other, Habari just now started complaining about key constraint issues on the tags table during my most recent upgrade.
michaeltwofish and I hashed through this issue on IRC last night and came up with the following 2 SQL queries:
update tag2post, (select t1.id as from_id, t2.id as to_id from tags t1, tags t2 where t1.tag_slug = t2.tag_slug and t1.id > t2.id) t set tag2post.tag_id = t.to_id where t.from_id = tag2post.tag_id;
delete tags from tags left join tag2post on tag2post.tag_id = tags.id where isnull(tag2post.tag_id);
The first query updates all posts to use only one of the two duplicate tags. The second query removes the unused tags from the tags table.
Be sure to account for the table prefix if you apply this fix.
I'm not sure whether this is required for all installs, or even what changed that caused this to happen, but at least three people in IRC last night were reporting this issue, so perhaps we should consider including it as part of the install process?
Also, for some reason, there is a single unique key on the tags table and it's on the tag_text field. This makes no sense. It seems to me that this index should be on the tag_slug table. Can anyone explain why things are the way they are, or should we update things? ("update things" can also mean "convert all of this tag stuff into a taxonomy" if anyone feels up to it.)
Scott Merrill wrote: > Owen said: >> Can anyone explain why things are the way they are, or should we ... >> convert all of this tag stuff into a taxonomy
> What are the long-term benefits of using a full taxonomy system?
The long-term benefits of a full taxonomy system are manyfold.
First, we'd be able to re-implement and provide API access (which we don't currently have) to the current "tags" vocabulary.
Second, we'd be able to implement hierarchical categories separately from tags by enabling parent-child relationships between vocabulary terms.
Third, we'd be able to associate meaningful metadata to posts outside of generic tagging. So in the cases where we'd tried to do things like add tags of "displayon:homepage" to have certain posts appear only on the homepage, we could create a vocabulary that was used for that purpose and had specific values that a user could choose from.
> What are the implementation challenges for migrating Habari to use > tags this late in the game?
I think it will not be too terrible.
We'll need to consider a taxonomy schema that works - a set of tables to hold the taxonomy data. I suggest that we drop the tags-related tables altogether as part of the conversion (after converting them), so as to make a clean break from "tags" altogether. If things are done properly, we should still be able to access all of the tag data via the Post object, using an installed-by-default "tags" vocabulary.
As for plugins that do tag-cloud things, we'll need to do some things: First, provide a taxonomy API that will allow access to "tags" and other vocabularies so that accessing the tables directly is not part of the standard procedure. Second, identify the plugins that will be affected and help convert them to use the new API. These things should really happen side-by-side.
> Do you have some body of code already prepared to make this happen? > If not, how much work do you expect it will take from the community?
I do not personally have code to make this happen, but I suspect that a dedicated weekend could get the basic API functional enough to replace the tag system we have now, especially if there were people around to code and test.
update habari__tag2post, (select t1.id as from_id, t2.id as to_id from habari__tags t1, habari__tags t2 where t1.tag_slug = t2.tag_slug and t1.id > t2.id) t set habari__tag2post.tag_id = t.to_id where t.from_id = habari__tag2post.tag_id;
gives me #1062 - Duplicate entry '3-78' for key 1 ?
> -----Original Message----- > From: habari-dev@googlegroups.com [mailto:habari-dev@googlegroups.com] > On Behalf Of Owen Winkler > Sent: 2. mai 2008 01:24 > To: habari-dev@googlegroups.com > Subject: [habari-dev] Key problems on tags table in latest svn
> For some reason or other, Habari just now started complaining about key > constraint issues on the tags table during my most recent upgrade.
> michaeltwofish and I hashed through this issue on IRC last night and > came up with the following 2 SQL queries:
> update tag2post, (select t1.id as from_id, t2.id as to_id from tags t1, > tags t2 where t1.tag_slug = t2.tag_slug and t1.id > t2.id) t set > tag2post.tag_id = t.to_id where t.from_id = tag2post.tag_id;
> delete tags from tags left join tag2post on tag2post.tag_id = tags.id > where isnull(tag2post.tag_id);
> The first query updates all posts to use only one of the two duplicate > tags. The second query removes the unused tags from the tags table.
> Be sure to account for the table prefix if you apply this fix.
> I'm not sure whether this is required for all installs, or even what > changed that caused this to happen, but at least three people in IRC > last night were reporting this issue, so perhaps we should consider > including it as part of the install process?
> Also, for some reason, there is a single unique key on the tags table > and it's on the tag_text field. This makes no sense. It seems to me > that this index should be on the tag_slug table. Can anyone explain > why > things are the way they are, or should we update things? ("update > things" can also mean "convert all of this tag stuff into a taxonomy" > if > anyone feels up to it.)
I'm glad this has come up. It's something I've been thinking about
since I first starting looking at Habari (which, granted, was about a
week ago).
Specifically, I think the current taxonomy API in WordPress is pretty
good. I've spent a good amount of time with it, so I feel like I have
a good idea of its strengths but also where it could use some
improvement. I've been thinking about the necessary schema, and I
could probably post a draft one tonight or at least this weekend
sometime. I'd also be willing to advise on a specification for the API
but I don't have any time to do any actual coding. I could probably be
talked into testing pretty easily.
On May 2, 6:58 am, Rich Bowen <rbo...@rcbowen.com> wrote:
> Specifically, I think the current taxonomy API in WordPress is pretty > good. I've spent a good amount of time with it, so I feel like I have > a good idea of its strengths but also where it could use some > improvement. I've been thinking about the necessary schema, and I > could probably post a draft one tonight or at least this weekend > sometime. I'd also be willing to advise on a specification for the API > but I don't have any time to do any actual coding. I could probably be > talked into testing pretty easily.
I'm interested in hearing your ideas.
Idly, I was thinking about this today, and here are my embryonic schema thoughts. Note that this is not a final concept and is open to discussion and revision, even though I'm presenting it as "here's what we'll do..."
We'd have three tables for taxonomy support: Vocabularies, Terms, and TermObject
Vobabularies would be a table naming the vocabularies that are available on the system and associating a numeric id to them. It would include a type parameter that would allow Habari to distinguish "free" vocabularies (one where new terms can be added at runtime), from "set" vocabularies (one where terms are predefined). It would include a size parameter that would specify the maximum number of terms that could be associated to a single object, which could be null for unlimited. An additional optional field would allow a Vocabulary to be applied to specific things, like posts, entries, or comments.
The Terms table would associate a term, a term's slug, any parent term it might have to a vocabulary. A term (and its slug) might appear more than once in this table, but only once per vocabulary.
The TermsObjects table would associate terms to objects numerically. The table would consist of two integer values corresponding to the term id and the object id.
Using this system, we could reimplement the existing tag system by creating a new vocabulary called "tags". The "tags" vocabulary would be a "free" vocabulary with no limit that applies to posts.
The existing tags table would be migrated to the terms table, with the only changes being to add a null value for the additional parent field, and the id of the "tags" vocabulary.
The existing tags2post table would be migrated to the TermsObjects table, pretty much intact. We'd still need to write some kind of mapping system since we shouldn't assume that all of the id's will be the same.
Some might prefer an additional separation of the Terms table, where its split into a VocabulariesTerms table that holds two integers (the vocabulary id, and a term id), and a Terms table that describes the terms alone. This would prevent duplicate term values in the Terms table, but add an additional level of complexity to querying for posts by vocabulary terms. It also may be desirable to keep a single Terms table for the purpose of allowing different slugs/terms for different vocabularies. Not sure on this point. I'm interested in the opinion of our resident database gurus on this topic.
It might also be worthwhile to add a flag to the vocabularies rows that defines a vocabulary as directly user-editable. For example, if a vocabulary is marked this way, then it could appear (in the case that it applies to posts) in the additional fields area of the post editing page, and automatically display appropriate options (a dropdown in the case of a "set" type vocabulary, or a text box in the case of a "free" type vocabulary). If this is done, the "tags" vocabulary would simply be another of these user-editable types. For non-editable types, plugins would either provide an interface for taxonomy access, or manipulate the taxonomy data behind the scenes.
For those folks once again asking what all of this is for, I'll present a more concrete example.
Assume that you are creating a site with four distinct areas to which posts can be distributed. One might in the current system add specific tags to a post that make it appear in a single area by virtue of a call to Posts::get() and passing that special tag. But then those special tags "pollute" the tag pool, potentially showing up in unexpected places.
With the taxonomy system, you'd do virtually the same thing, except you'd define a vocabulary that represents the categorization of posts into one of the four areas. Call that vocabulary "category" and define it as one where each object in it can only be defined by one term among four that are preset. That is, you will be allowed only to assign one of four terms to any single post.
A call to Posts::get() that includes this specific taxonomy information will return posts that qualify for that category just as with the tag setup, but the categories will not pollute the tag system.
Also, you might define an additional vocabulary that could be used to query for pages that make up the menu for each area. Posts without a term defined in this vocabulary would not appear in the menu for that category of posts.
Applying the taxonomy system would yield many benefits such as this, which seem to provide features that people have been asking about.
On May 3, 9:11 pm, Owen Winkler <epit...@gmail.com> wrote:
> Idly, I was thinking about this today, and here are my embryonic schema
> thoughts. Note that this is not a final concept and is open to
> discussion and revision, even though I'm presenting it as "here's what
> we'll do..."
I think we independently arrived at similar positions. Here's my rough
schema, or at least a list of tables and column names:
Table: Vocabularies
* id
* name
* slug
* type - "free" or "set" as per Owen
* limit - maximum number of terms (NULL for unlimited)
* object_type - limit vocabulary to posts, comments, etc.
* active - boolean to allow user to deactivate vocabulary without
deleting/losing data
Table: terms
* id
* name
* slug
* description
* vocab_id
* parent - id of another term for hierarchical vocabularies
Table: term_info
* term_id
* key
* value
Table: term2object
* term_id
* object_id
I tried to include descriptions where the point of a column didn't
seem to be self-evident, but let me know if I need to give a better
explanation. It follows Owen's description pretty closely. The biggest
difference is the term_info table. (I think that's an appropriate name
based on the schema in the wiki. In WP, they'd call it term_meta.)
This table allows the database to capture additional information about
a term. I shall illustrate its usefulness with an example (one which
also shows why a taxonomy system is more useful than the current tags
setup).
A system like this can be used to organize a series of posts on a
topic (i.e. a multi-part post). You could do it with a tag or
category, but then you have to worry about where that tag is going to
show up. In this case, "series" is the vocabulary and the term might
be "civil war battles." We also have a plugin that displays a blurb at
the beginning of any post in a series term. There's a default blurb,
but we want to have a different one for this series on civil war
battles. So we just create an entry in term_info that specifies the
different template.
Another example might be if you had a vocabulary called "people" and
you used it to tag your friends in a photo gallery. You could then
associate additional information about each person (e.g. email
address, birthday) with that person's term.
I also think it's important for the end-user to be able to define new
vocabularies so that they can put it to any use they can envision. My
philosophy is that the more extensible your platform is, the better.
Ok, I hope this makes sense. It's been a long day. I look forward to
getting feedback from the group.
> I think we independently arrived at similar positions. Here's my rough > schema, or at least a list of tables and column names:
Obviously, this looks good to me, too. ;)
To add to your initial take on the terminfo table, we'll need an additional `type` field to differentiate serialized data from regular data. A strange type of injection we're trying to avoid would involve a plugin that stores user-entered values into info data, and a user injects a serialized value. If we just checked to see if unserialize() fails, then we run the risk of unserializing injected data, which is bad. That's why we have that extra field. FYI.
As an aside to all of this: I'm looking now curiously at the table names in the existing database, wondering why some have underscores and some not. But I am opposed to table names with "2" in them used instead of the word "to". I'd much prefer just the names of the two things (in their respective plurality, depending on whether the relationship was one-to-one, one-to-many, or many-to-many) that are being linked, ala "groups_permissions". Call it a pet peeve.
> I also think it's important for the end-user to be able to define new > vocabularies so that they can put it to any use they can envision. My > philosophy is that the more extensible your platform is, the better.
I'm not opposed to this happening in the future. Right away though, I'd simply like to build a functional schema and API. We can consider how to build the UI for adding end-user custom data to this as we continue.
On Sun, May 04, 2008 at 01:24:45AM -0400, Owen Winkler wrote: > As an aside to all of this: I'm looking now curiously at the table > names in the existing database, wondering why some have underscores and > some not. But I am opposed to table names with "2" in them used instead > of the word "to". I'd much prefer just the names of the two things (in > their respective plurality, depending on whether the relationship was > one-to-one, one-to-many, or many-to-many) that are being linked, ala > "groups_permissions". Call it a pet peeve.
>> I also think it's important for the end-user to be able to define new >> vocabularies so that they can put it to any use they can envision. My >> philosophy is that the more extensible your platform is, the better.
> I'm not opposed to this happening in the future. Right away though, > I'd > simply like to build a functional schema and API. We can consider how > to build the UI for adding end-user custom data to this as we > continue.
Yes, flexible is good, but the manner in which we expose this to the end-user (as opposed to the plugin developer) can either be useful, or monstrously confusing. Hence my earlier remark about Drupal. It's ok now, but in the earlier days, the terminology used in the UI was so completely baffling that it was unusable.
So, let's make sure that things are robust and well-documented before we start telling the end-user to do it themselves.
-- "That's what being alive IS ... It's being badly prepared for everything! Because you only get one chance ... You only get one chance and then you die and they don't let you go round again after you've got the hang of it!" _The Bromeliad Trilogy_, Terry Pratchett