categorization

Eric Moritz

Feb 6, 2009, 1:13:51 PM
to Penny Press developers
I want to start working on the categorization stuff. Here's what I
propose; let's discuss it.

1. All content models will have categories.
2. Categories are a tool for classification, not for site structure.
3. Categories are hierarchical.
3.1 Django-mptt will be used to accomplish this.
4. Content models will have a manager with a by_category_list method
that returns a filtered queryset.
4.1 by_category_list will have two parameters: union and
intersection.
4.1.1 union: by_category_list(union=category_list) will select
models that have any of the categories in category_list.
4.1.2 intersection: by_category_list(intersection=category_list)
will select models that have all of the categories in
category_list.
5. The admin interface for category association will act like
Delicious's tag interface.
5.1 When a user enters "news/local/sports/curling" and curling is not
an existing category, curling will be created as a child of
sports. If sports doesn't exist either, it will be created before
curling, and so on up the tree.
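A minimal sketch of the Delicious-style path handling in 5.1. It assumes an in-memory tree stands in for the mptt-backed Category model. The Category class and get_or_create_path are hypothetical names, not existing Penny Press code. In the real app, each step would presumably be a get_or_create against the database rather than a dict lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Category:
    """Hypothetical in-memory stand-in for an mptt-backed category."""
    slug: str
    parent: Optional["Category"] = field(default=None, repr=False)
    children: Dict[str, "Category"] = field(default_factory=dict, repr=False)

    @property
    def path(self) -> str:
        # Collect slugs up to (but not including) the unnamed root node.
        parts: List[str] = []
        node = self
        while node.parent is not None:
            parts.append(node.slug)
            node = node.parent
        return "/" + "".join(p + "/" for p in reversed(parts))


def get_or_create_path(root: Category, path: str) -> Category:
    """Walk "news/local/sports/curling" from the root, creating each
    missing category as a child of the one before it."""
    node = root
    for slug in path.strip("/").split("/"):
        if not slug:
            continue  # tolerate doubled or stray slashes
        if slug not in node.children:
            node.children[slug] = Category(slug=slug, parent=node)
        node = node.children[slug]
    return node


root = Category(slug="")
local = get_or_create_path(root, "news/local")
curling = get_or_create_path(root, "news/local/sports/curling")
print(curling.path)  # /news/local/sports/curling/
```

Because existing nodes are reused, the second call only creates sports and curling under the news/local branch that is already there.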


Anything else anyone wants to add?

Eric Moritz

Feb 6, 2009, 1:42:40 PM
to Penny Press developers
A union of categories will select models that have any of the
categories.

Consider the following:

story1.category_list = ['/news/local/sports/high-school', '/events/sports/high-school/games', '/places/high-schools']

story1 is a story about high school sports. It also talks about a
game at the high school, and it talks about the high school itself, so
it obviously belongs in the /places/high-schools category.

A union would select the story if the category list was this:

by_category_list(union=['/news/local/sports/high-school', '/news/local/high-school/'])

because one of its categories appears in the list passed to
by_category_list.

An intersection will select the object only if all the categories
passed to by_category_list are used by the model.

This would select that story:

by_category_list(intersection=['/news/local/sports/high-school/', '/places/high-schools'])

This would select that story as well, since /news/local/sports/ is the
parent of /news/local/sports/high-school:

by_category_list(intersection=['/news/local/sports/', '/places/high-schools'])

Eric Moritz

Feb 6, 2009, 2:08:49 PM
to Penny Press developers
We may want to think about how the preferred category structure should
work for Sections.


For the section:
/local/sports/high-schools/

Should the categories be an intersection like this:

/places/high-schools/
/local/
/sports/

That way when we want a section pertaining to high-schools we simply
use:

/places/high-schools/

If we want a section of stories that are local high schools, we do an
intersection of:

/places/high-schools/
/local/



The alternative is using a union:

The high school sports section would be this:
/local/sports/high-school/


For a union to work you'd need one category to match all the criteria:

/places/local/high-school/sports/

For a section pertaining to just high schools, we'd need a new category:

/places/high-schools/

If we want a section of stories that are local high schools, we'd need
another new category:

/places/local/high-schools/
or
/places/high-schools/local/

In my opinion, intersections make more sense for sections.
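A sketch of sections defined as intersections. It assumes flat category paths matched exactly, with no descendant matching. The Story and Section classes are hypothetical, and the story titles are made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Story:
    title: str
    categories: FrozenSet[str]


@dataclass(frozen=True)
class Section:
    name: str
    required: FrozenSet[str]  # intersection: a story needs every one of these

    def select(self, stories: List[Story]) -> List[Story]:
        return [s for s in stories if self.required <= s.categories]


HS, LOCAL, SPORTS = "/places/high-schools/", "/local/", "/sports/"

stories = [
    Story("local game recap", frozenset({HS, LOCAL, SPORTS})),
    Story("local graduation", frozenset({HS, LOCAL})),
    Story("state tournament", frozenset({HS, SPORTS})),
]

high_schools = Section("high-schools", frozenset({HS}))
local_high_schools = Section("local high schools", frozenset({HS, LOCAL}))
local_hs_sports = Section("/local/sports/high-schools/", frozenset({HS, LOCAL, SPORTS}))

for section in (high_schools, local_high_schools, local_hs_sports):
    print(section.name, [s.title for s in section.select(stories)])
```

With this approach, adding a section only means choosing a new set of existing categories. No story has to be re-categorized.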

What do you all think?

Eric.

Michael

Feb 6, 2009, 2:15:59 PM
to django-pennypr...@googlegroups.com
Sounds too much like tagging and could get really ridiculous. I don't have a solution, just saying that this is looking more and more like tagging.  

Eric Moritz

Feb 6, 2009, 2:28:21 PM
to Penny Press developers
Yeah, I was just thinking that. I think using unions ends up being
unmaintainable from the story side.

Say we had a section:

/sports/local/high-school/

and have been mapping stories to the category:

/sports/local/high-school/

Then we determine we need a high-schools section and want to target
all stories that pertain to high schools. If we didn't have the
foresight to create a /high-school/ category and map the stories to it
before the /high-school/ section was created, all the stories
categorized as /sports/local/high-school/ will need the new
/high-school/ category added to them so they show up in the new
/high-school/ section.

Either that, or we map every category with high-school in it to the
/high-school/ section, so the section's categories will have to look
like:
/sports/local/high-school/
/high-school/graduation/
/events/high-school/
/crime/high-school/
/etc/

When a new high-school category is added we have to remember to add
that category to all the sections that reference high-schools.

With a union, the new high-school section would just need to have the
category "/high-school/".

A crime story would be mapped as:
/crime/
/high-school/

A high school event would be mapped as:
/events/
/high-school/



On Feb 6, 2:15 pm, Michael <newmani...@gmail.com> wrote:

Eric Moritz

Feb 6, 2009, 2:30:20 PM
to Penny Press developers
I think we had this conversation before :) You know I think tag
unions are better than hierarchical categories :)

Eric Moritz

Feb 6, 2009, 2:32:25 PM
to Penny Press developers
@michael What's your beef with tagging again? I think if we don't use
a free-form tag field in the story admin, and use a select box for the
tags instead, many of the problems with tagging will go away.

Michael

Feb 6, 2009, 2:42:39 PM
to django-pennypr...@googlegroups.com
On Fri, Feb 6, 2009 at 2:30 PM, Eric Moritz <eric....@gmail.com> wrote:

> I think we had this conversation before :) You know I think tag
> unions are better than hierarchal categories :)

I was feeling a strange sense of deja vu.  

My major thing is that I think hierarchical data is easier to keep track of. People who read papers tend to focus on a single section. We need to maintain those sections on the internet side. I am less ambitious about keeping everything in folder-type structures than I was when we originally had this conversation.

I think your current ideas can hold that idea, so I am not too concerned about that. I am nervous, however, that any given story might end up in its own category. I believe there shouldn't be a category until there are multiple stories, or interest from the viewing public or an advertiser, for that category.

Also, database calls always make me nervous. MPTT is intensive and so is django-tagging. Whatever we decide needs to be optimized and streamlined, and it needs to keep database calls down regardless of caching.

I am still trying to think of the perfect idea and, to be honest, I can't lay it out in my mind. Every time I implement a category system I spend a ton of time on it and it doesn't quite work like I want it to. So I am open to seeing how you lay it out, as long as the three items above are considered. The first two seem to be up to the editorial staff, and the last one can be done post-design. So go for it. Keep us up to date, and thanks. I look forward to seeing it. Mn

Glenn Franxman

Feb 6, 2009, 5:12:30 PM
to django-pennypr...@googlegroups.com
Rough notes from my Tagonomy project, but you'll get the idea:

Taxonomy-tagging hybrid


As our taxonomies get more varied and more robust, they become way too
big for traditional drop-downs or even just loading into a page, but
tagging results in too much variety because of misspellings and
inconsistent use of tags. One solution might be a hybrid. In this
solution, the user interface behaves largely like Delicious tagging,
but the tags have structure to them that helps disambiguate their
meaning.

For example, consider the following definition of a tag:

Tag:
name
parent
implies
metadata
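The tag definition above could be sketched as a Python dataclass like this. It is an in-memory stand-in, not a Django model. The path property and the string-to-string metadata dict are assumptions about how a nested tag would be displayed and annotated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Tag:
    name: str
    parent: Optional["Tag"] = field(default=None, repr=False)
    implies: List["Tag"] = field(default_factory=list, repr=False)
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def path(self) -> str:
        # Nested tags display as slash-separated paths from their root.
        if self.parent is None:
            return self.name
        return f"{self.parent.path}/{self.name}"


org = Tag("org")
business = Tag("business", parent=org)
scripps = Tag("scripps", parent=business, metadata={"stocks": "SSP"})

education = Tag("education")
school = Tag("school", parent=education)
highschool = Tag("highschool", parent=school)
bearden_high = Tag("bearden_high", implies=[highschool])

print(scripps.path)                            # org/business/scripps
print([t.path for t in bearden_high.implies])  # ['education/school/highschool']
```

Here parent gives the nesting and implies links across taxonomies. The two are kept separate, so a tag can sit in one hierarchy while pulling in tags from others.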


With this structure, tags can be nested according to the existing
taxonomies with which we normally deal, but the tags can also imply
other tags. For instance, we often get into difficulties trying to
mix subject matter tags in the same hierarchy as geographic tags. We
may want to tag a story as news/local/bearden and
sports/football/highschool/bearden_high, but these structures mix
taxonomies in a way that makes them difficult to manage. Should it be
sports/football/college/myteam or college/football/myteam or what?
And a lot of times, you want to make sure that certain items get
multiple tags in a consistent manner.

Ideally, we'd tag the story as bearden_high, which would imply
geo/tn/east/knox/bearden, education/school/highschool and
scope/city. I'm throwing the scope taxonomy out there as a
replacement for the 'local' tag, which generally means that the story
is of use to a small community that we serve, but 'local' doesn't do a
very good job of defining the scale of that community.
scope/
    city/
    state/
    national/

Also note that I'm rooting the taxonomies under abstract terms like
scope, subject, topic, person, geo. We can have roots for yahoo, ap,
etc and the implication mechanism can be used for mapping between them
when needed. For instance, classifieds/automotive might imply
yahoo/transportation.
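The implication mechanism amounts to a transitive expansion over an "implies" table. Here is a minimal sketch, assuming tags are keyed by path and the table holds only the two examples from these notes. The IMPLIES dict and the expand function are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical implication table built from the examples in these notes.
IMPLIES: Dict[str, List[str]] = {
    "bearden_high": [
        "geo/tn/east/knox/bearden",
        "education/school/highschool",
        "scope/city",
    ],
    "classifieds/automotive": ["yahoo/transportation"],
}


def expand(tags: Iterable[str], implies: Dict[str, List[str]]) -> Set[str]:
    """Return the given tags plus everything they imply, transitively."""
    result: Set[str] = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in result:
            continue  # already expanded; this also stops implication cycles
        result.add(tag)
        stack.extend(implies.get(tag, []))
    return result


print(sorted(expand(["bearden_high"], IMPLIES)))
print(sorted(expand(["classifieds/automotive"], IMPLIES)))
```

Using a visited set rather than recursion means an accidental cycle in the table (a implies b, b implies a) cannot loop forever.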

We will also need to keep track of the source of each tag: it can be
assigned, implied, mined or discovered.
Assigned tags are tags that have been added by staff. These are
the official tags.
Implied tags are tags that have been added by the system based on
other tags.
Mined tags are tags that are mechanically determined by data
mining the content. A better term might be 'derived'.
Discovered tags are tags that have been given by the community.
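Tracking the source could look something like the sketch below, which uses a hypothetical TagSource enum and tag_item helper. It only produces assigned and implied tags. Mined and discovered tags would presumably come from separate processes. An explicitly assigned tag keeps its ASSIGNED source even when another tag also implies it.

```python
from enum import Enum
from typing import Dict, Iterable, List


class TagSource(Enum):
    ASSIGNED = "assigned"      # added by staff; the official tags
    IMPLIED = "implied"        # added by the system based on other tags
    MINED = "mined"            # derived mechanically from the content
    DISCOVERED = "discovered"  # given by the community


def tag_item(assigned: Iterable[str],
             implies: Dict[str, List[str]]) -> Dict[str, TagSource]:
    """Record each tag with its source, following implications transitively."""
    sources: Dict[str, TagSource] = {t: TagSource.ASSIGNED for t in assigned}
    stack = list(sources)
    while stack:
        for implied in implies.get(stack.pop(), []):
            if implied not in sources:  # never downgrade an assigned tag
                sources[implied] = TagSource.IMPLIED
                stack.append(implied)
    return sources


implies = {"bearden_high": ["scope/city", "education/school/highschool"]}
tags = tag_item(["bearden_high", "scope/city"], implies)
for name, source in sorted(tags.items()):
    print(name, source.value)
```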


news/
sports/
government/
entertainment/
technology/
community/
person/
place/
event/
organization/
school/
church/
charity/
business/
ad/
banner/
classifieds/
employment/
automotive/
realestate/
pets/
general/



The metadata of a tag is used to provide information that lies outside
of the tagonomy, like ties to actual content items. Perhaps the
org/business/scripps tag might have metadata along the lines of
places:<place_id>, stocks:SSP, etc.


When adding these tags, we wouldn't want to drill down all of the time.
Instead, you'd want to type bare words, and if they were unique, the
system should expand them for you. If you were a power user, you might
use some syntax to help, like double slashes to mean 'at any depth'.
I.e., if Ford is a person, a business, a car make and a county, and you
wanted the county, you might type place//ford.
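Here is a rough sketch of the bare-word expansion and the place//ford syntax, assuming a hypothetical flat list of known tag paths. Apart from the Bearden path, the paths below are made up for illustration. A single match could be expanded automatically; several matches would need the user to pick one.

```python
from typing import List

# Hypothetical known tag paths; "ford" is a person, a business,
# a car make and a county.
KNOWN: List[str] = [
    "person/ford",
    "organization/business/ford",
    "classifieds/automotive/ford",
    "place/county/ford",
    "geo/tn/east/knox/bearden",
]


def leaf(path: str) -> str:
    return path.rsplit("/", 1)[-1]


def resolve(query: str, known: List[str]) -> List[str]:
    """Expand what the user typed into candidate full tag paths."""
    if "//" in query:
        # "place//ford": under the "place" root, ending in "ford" at any depth.
        prefix, _, name = query.partition("//")
        root = prefix.strip("/") + "/"
        return [p for p in known if p.startswith(root) and leaf(p) == name]
    if "/" in query:
        # A full path is accepted only if it already exists.
        return [query] if query in known else []
    # Bare word: every known path whose last segment matches.
    return [p for p in known if leaf(p) == query]


print(resolve("bearden", KNOWN))      # unique, so it could be expanded
print(resolve("ford", KNOWN))         # ambiguous: four candidates
print(resolve("place//ford", KNOWN))  # just the county
```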



root/
/inprint/frontpage
/inprint/lifestyle
/site-nav/lifestyle
/people/
/place/
/subjectmatter/
/event/