Taxonomy management tools


Rahel Anne Bailie

Nov 19, 2012, 5:25:53 AM
to content...@googlegroups.com
I have a general question about taxonomy tools. The last time I had to look at the market for a taxonomy tool, there were tools at either extreme (small tools with limited functionality and behemoth tools) but few in the middle, and the clients I worked with at the time just managed their taxonomies in Excel and built what they needed.

Now, I want to look at something robust that is open source or based on open standards. I don't want to go into a lot of detail on the list, but I also don't want to give the impression that I haven't thought about this any more deeply - I have, believe me.

If anyone has time for a chat, I'm open to that, too. 

Rahel Bailie

---

Rahel Anne Bailie, Content Strategy / Content Management / Content Design
Intentional Design Inc. - Content strategies for business impact 
Contact: http://about.me/rahel.bailie 
Co-producer: Content Strategy Workshops
Co-author: Content Strategy for Decision Makers - in stores Dec 2012


Matt Moore

Nov 20, 2012, 5:01:55 PM
to content...@googlegroups.com, content...@googlegroups.com
Rahel,

I would suggest posting this to the Taxocop email list - you'll be talking to people that specialize in this kind of thing.

A few points:
- You need to be clear on the functionality you want - are you just looking for a simple standalone vocabulary management tool, or do you need something that plugs into a content management system, or even something that does automated classification?
- If you are just using one CMS then some of them (e.g. SharePoint, Drupal) now have OKish taxonomy capabilities of their own.
- When you say "open source or open standards" that's a bit vague. Protege is an open source ontology manager (but ain't for beginners). Many taxonomy tools will export to SKOS - which is a W3C standard.
- In the last few years, this space has moved from having a small number of players to a much larger number of players of different sorts. There are the taxonomy/ontology management tools but, as mentioned, CMS vendors offer capability in this space. So do search engines. There are entity extraction engines and textual analysis tools. There have even been some attempts to create enterprise-strength folksonomy tools (with decidedly mixed results). All of these tools tend to have overlapping functionalities and many vendors are not even aware of the existence of the others.

I only dabble in this space these days so I struggle to get my head round the details. Heather Hedden did a neat overview of the space a couple of years ago when she was at Earley & Dan Keldsen did a Findability report (I think for the AIIM) but you've probably seen those & they'd both be way out of date now.

Cheers,

Matt Moore
--
You received this message because you are subscribed to the Google Groups "Content Strategy" group.
To post to this group, send email to content...@googlegroups.com.
To unsubscribe from this group, send email to contentstrate...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/contentstrategy?hl=en.

Suz Bednarz - Kish

Nov 20, 2012, 5:45:04 PM
to content...@googlegroups.com
Rahel, I would love to be part of this discussion if even just as an observer. Taxonomy is something my organization has struggled with in a major way (I am a content strategist and site search product lead).

So, if you start an email thread or go to another list, please feel free to add my email to the list.

bluestokking @ gmail. com

Cheers,

Suz Bednarz-Kish

Shop: http://www.chloeandisabel.com/boutique/suzbednarz
Follow: https://twitter.com/SuzIsChloe
Facebook: https://www.facebook.com/ChloeIsabelBySuz


=============
I've learned that people will forget what you said, people will forget what you did, but people will never forget how you made them feel. - Maya Angelou


Rahel Anne Bailie

Nov 20, 2012, 6:07:55 PM
to content...@googlegroups.com
Thanks, Matt! I am talking with a taxonomist, as well, but I thought I would ask the community, too.

Susan T. Rector

Nov 20, 2012, 7:11:07 PM
to content...@googlegroups.com, content...@googlegroups.com
I would love to be part of the discussion too. I'm currently planning to use SharePoint, as that's our CMS, but I am curious about other tools.
Thx!
Susan.tea...@ucdenver.edu


Charlie Morris

Nov 21, 2012, 8:24:29 AM
to content...@googlegroups.com
This probably is a bit too much tool, but Protege is an open source project that allows you to create ontologies.  Might be of interest.

-Charlie

Paola Roccuzzo

Nov 21, 2012, 8:46:17 AM
to content...@googlegroups.com
Another vote for Protege--I would also discuss its adoption with Development first (they could give valuable insights into how to use it properly and get the most out of it).

Paola

Zahoor Hussain

Nov 21, 2012, 3:47:10 AM
to content...@googlegroups.com
Hi All

I have recently completed a tools evaluation of the major players for an enterprise-wide taxonomy management tool. It is becoming a crowded and fast-changing market.

We looked at a few key areas and how well the tools matched our requirements:
- standards - what standards did the tools use to store, input and output vocabs? The key one is the ability to interface with Excel! :)
- governance - how can you manage and version taxonomies?
- integration - what APIs are available to integrate with existing systems? There will be many!
- visualisation - how are taxonomies presented?
- reporting - who did what, and when?
- training - what level of training was needed?
- usability - could taxonomists actually perform regular tasks without assistance or help? This included importing existing taxonomies in SKOS, relating taxonomies, localisation, disambiguation, synonyms etc.

Adopting a taxonomy tool is the first step of introducing semantic technology into an organisation!

Happy to share more on or off list. 

@Rahel I'll try to contact you if you still need to talk?

Thanks

Zahoor


--
Zahoor Hussain
Annotation Ltd
 
+447764613986
www.annotation.co.uk
www.twitter.com/izahoor
www.linkedin.com/in/zahoor
contenttype.wordpress.com
www.klout.com/izahoor
www.peerindex.net/izahoor
www.jboye.com/blogpost/10-online-professionals-to-watch-on-twitter-in-2010/

Matt Moore

Nov 21, 2012, 1:49:12 PM
to content...@googlegroups.com
Zahoor - Sounds interesting. Which tools did you cover?

Zahoor Hussain

Nov 21, 2012, 4:09:45 PM
to content...@googlegroups.com
Hi Matt

Two key decisions we made early on helped to narrow the field:
1. Taxonomy management functions within CMS tools, though more than adequate for simpler use cases, were ruled out as not being appropriate.
2. We needed to separate taxonomy and instance metadata from the content.

The list of tools we looked at, for the management of localised taxonomies, included in alphabetical order:

- Mondeca ITM
- Semaphore
- Synaptica

Hope that helps. 

As readers here will be aware, tool selection is an important, yet comparatively easy, part of adopting taxonomies. The effort required to standardize taxonomies across business units and geographies should *not* be underestimated!

Caveat: this list is not an endorsement of any particular solution.

Thanks

Zahoor




Rahel Anne Bailie

Nov 21, 2012, 4:33:18 PM
to content...@googlegroups.com
A friend of mine is an information management consultant with a specialty in enterprise taxonomies, and she named many of the same tools. However, one caveat she gave is that choosing the right taxonomy tool for the job is more volatile than choosing a CMS, and thoroughly understanding the requirements is even more important. So now I'm nervous.

Zahoor, I'd love to chat with you. I'll figure out the phone extension at my desk at the office and send you the number.

Malcolm Davison

Nov 26, 2012, 12:29:02 PM
to content...@googlegroups.com

I idly typed the words ‘taxonomy’ and ‘history’ into Google. The third item listed was a page from the website of the UK’s Natural History Museum. I was surprised to learn that:

‘Taxonomy is arguably the world's oldest profession ...’

My mind wandered for a moment to contest this, but I won’t bore you with what I came up with! It then convinced me that the origins of taxonomy could be traced back to Swedish botanist Carl Linnaeus in the eighteenth century. 

Taxonomy and stuffed woolly mammoths seem to me to be a good fit. Are we not seeing the decline in the relevance of taxonomy?

Do you not associate taxonomy with long drop down menus - long lists of unnecessary visually distracting navigation down the side of the screen? Complex structures created to pigeonhole information - especially loved by government and local authority websites? 

Creating these is an impossibility. You do your best to set a structure up - and then catalogue a snippet of knowledge. For you it may be the most obvious and logical home but from someone else’s perspective and interests it will be totally illogical.

In these days of mobile device access we have no space on the screen to tell readers where in our vast web archives we have plucked their information from. And do they need to know? Navigational structures need to 'evaporate' or be totally invisible.

As long as the system knows where to get it - that’s all that matters.

It's great to ask Siri on the iPhone a question such as 'What's the population of Australia' and it comes up with the answer. Where did the answer come from - for the most part I couldn't care less. On other occasions it's useful to assess the information's credibility.

It’s a bit like the vast sea container ports where a crane can grab the container it needs - and only its computer knows where it must retrieve it from. Then there are the vast retailers that have warehouses that are automated to pluck our favourite paperback 90 feet in the air and a quarter of a mile away. I really wouldn’t fancy my chances trying to find it manually.

In just the same way, are we not, at long last, seeing these complex structures disappear? Are the Victorian shackles being removed?

Simple basic structures backed by intelligent searching and in-context linking have to be the way forward. But maybe flexible outline taxonomies that adapt for the user would be a helpful approach too.

Time for me to put my crystal ball away and learn from someone from the world's oldest profession ... ( a taxonomist of course).

Malcolm Davison

Tony Chung

Nov 26, 2012, 1:17:58 PM
to content...@googlegroups.com
That's an interesting perspective, Malcolm. Sure, taxonomy has long been equated with information architecture or information hierarchy, but its usage extends beyond the visible scope. In our recent project, our taxonomy specialist also led us to create a synonyms engine, to build a related search feature as a first step towards relevance. Google invests several million dollars in behind-the-scenes algorithms to ask you "you asked for 'X', but did you mean 'XX', 'XY', or 'Q'?" So when taxonomy is done well, and extrapolated beyond the visible, you get the new world order you speak of, where navigation structures are meaningless, and "Earl Grey--Hot" becomes the norm.

Just because the user doesn't see the complexities of the underlying information structure doesn't mean that there isn't one. It only means the creator spent a lot more time building the back end so that the front end didn't have to work so hard. Rahel was asking the group for ways to manage this level of information in order to produce the user-friendly result you're talking about. Fortunately, she is in a position where this type of thinking is encouraged.

It's difficult to convince small to medium sized enterprises of the benefits of even thinking about how to set up their information so that it can be found through related searches. They are so interested in telling the world what they do, using their arcane vocabulary, that they miss the point of what people are looking for.

-Tony

Suz Bednarz - Kish

Nov 26, 2012, 1:57:47 PM
to content...@googlegroups.com
I would build on Tony's statement and offer that this is not limited to "... difficult to convince small to medium sized enterprises of the benefits of even thinking about how to set up their information so that it can be found through related searches."

I work in a very large enterprise and it is very difficult there as well. Who owns it? Where does it belong? Who maintains it? How do you get every aspect, every document, etc. involved (or do you?). Who funds it?

I might even suggest it is more difficult in some large enterprises.


Cheers,

Suz Bednarz-Kish




Matt Moore

Nov 26, 2012, 2:37:09 PM
to content...@googlegroups.com
Malcolm - You've made an argument for hiding the complexity of taxonomic structures from users - which is actually a valid one. You haven't argued that taxonomies are "extinct" because information still needs to be managed in the back end for this seamless presentation to occur.

Tony & Suz - Organisations of all sizes struggle with this stuff. Bigger organisations have more stuff to organize but they generally have more resources to do so (e.g. have a full-time taxonomist on board).

The organisations that do it best are the ones whose business depends on it. The BBC's use of ontology structures is world-class. Some online retailers do it well. Everybody else just muddles along.

Malcolm Davison

Nov 26, 2012, 3:37:21 PM
to content...@googlegroups.com
Well Matt, in one scenario taxonomy could be made totally extinct.
This would employ metadata, intelligent search and 'expert' programmed
questioning to home in on the content area (although you might argue
this is a crude taxonomy). A structure is just not needed. The
computer would simply use its text database to locate the information.
But this probably would not work for large quantities of content.

But more practically, a hybrid system could be built using a very basic
taxonomy supported by Wikipedia-like in-context embedded linking and
related links supplied by the web content writer. Metadata and
intelligent searching would also support this.

A taxonomy is not the only way to access content.

Malcolm Davison
www.writingfortheweb.co.uk

Tony Chung

Nov 26, 2012, 3:41:54 PM
to content...@googlegroups.com
I see taxonomy as the underlying structure, and metadata (with server-side scripting) as the vehicle.



Ruth Kaufman

Nov 26, 2012, 4:01:40 PM
to content...@googlegroups.com
There does have to be some sort of knowledge representation behind the scenes. Relevance needs to be interpreted against the backdrop of *something*, and this could take a variety of forms, but can typically be represented by a graph of some sort. A taxonomy is typically a hierarchical graph, but there could be other types of relationships among terms/concepts/data elements, moving into the realm of semantics and ontologies. A lot of folks, rightly or wrongly, use the term taxonomy very generically to indicate structured knowledge.
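To make the "hierarchical graph" idea concrete, here is a minimal sketch in Python. It is only an illustration: the term names and the helper functions `ancestors` and `narrower` are invented for this example and do not come from any tool mentioned in the thread. Each term simply points at its broader term.

```python
# Hypothetical example: a tiny taxonomy stored as child -> parent ("broader term") links.
BROADER = {
    "Footwear": None,          # root concept, has no broader term
    "Shoes": "Footwear",
    "Boots": "Footwear",
    "Men's shoes": "Shoes",
    "Women's shoes": "Shoes",
}


def ancestors(term):
    """Return the chain of broader terms from the given term up to the root."""
    chain = []
    parent = BROADER[term]
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain


def narrower(term):
    """Return the direct children (narrower terms) of a term, sorted by name."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


print(ancestors("Men's shoes"))  # ['Shoes', 'Footwear']
print(narrower("Footwear"))      # ['Boots', 'Shoes']
```

An ontology, in this picture, would allow other kinds of links between terms in addition to the single broader/narrower relationship used here.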



Matt Moore

Nov 26, 2012, 4:29:19 PM
to content...@googlegroups.com, content...@googlegroups.com
Malcolm,

1. If you're talking about "metadata" then you are often talking about taxonomic structures (controlled vocabularies, thesauri, hierarchies of terms). For example, recommended best practice with the "coverage" term in DCMI is to use a controlled vocabulary such as TGN.

2. Not sure exactly what you mean by "intelligent search". Sophisticated enterprise search engines use faceted search - which is dependent on, yip, taxonomic structures.

3. Expert programmed questioning? Again not sure what exactly you mean by that. If something's "programmed" then it generally has some machine-readable structure that it's working to.

I can come up with scenarios where I might not need anything more than a very simple structure but that's a long way from saying taxonomies are now unnecessary.

It's important to note that taxonomies are about more than just navigation & access. They are also about effectively managing the information resources that you have. They are sense-making tools.

You seem to have a particular idea in your head of what taxonomy work is that I'm not sure corresponds to the truth.

Cheers,

Matt 

Malcolm Davison

Nov 26, 2012, 5:20:02 PM
to content...@googlegroups.com, malcolmdavison Davison
Thanks for this interesting debate.

It would worry me if all webbies believed that a taxonomy is the only
solution to access information in a web environment. It’s time to come
up with better more sophisticated ways to access information - because
hierarchical structures through necessity will become out-moded. Both
for screen presentational reasons and the increasing pressure of web
content volume, and over-complexity and fundamental weaknesses of a
single globally defined plan.

‘Expert systems’ have been around over 40 years - and are a structured
questioning system that hone into an answer for the user. These can
also be used to identify a broad area of interest. Then the local
navigation will be taken over and created by the content creator for
that theme or 'microsite' if you wish.

Metadata can record keywords without formalising them into lists of
accepted terms. The search engine can still extract keywords from the
text to supplement them.

The overall website would not need to have an overall formally-defined
taxonomy. Even locally the structure would be so straightforward that
you would be hard-pressed to call it a taxonomy.

I would totally separate content access and information management and
creation. That is a matter for the content creators to do in the way
that makes sense to them.

Not all search systems are based on taxonomies, they can
‘intelligently’ refine their approach through use. Just as some
software can adapt menu systems to prioritise features that the user
most often uses. Search engines can increase their effectiveness over
time by learning what people most often want to access.

I think, Ruth, you are right that techies read too much into the term
'taxonomy', which most people accept as meaning a defined
hierarchical classification. In fact it comes from the ancient Greek
‘taxis’, meaning arrangement, and ‘nomia’, meaning method.

Malcolm

Rahel Anne Bailie

Nov 26, 2012, 6:13:36 PM
to content...@googlegroups.com
I was going to jump in and reply to Malcolm but while I was at the CS meetup, it seems many other people did!

I was defining taxonomy as structure+controlled vocabulary. That means a lot more than a hierarchical "site-map-like" structure (though menus are still important - if you read usability studies, people search to a certain point, then browse. And they also browse when they don't know which search term to use to get good results). It's the difference between searching for Keith Richards and the system knowing that Keith Richards should be associated with The Rolling Stones. It's the difference between searching for "plush toys" and a site delivering you the hottest Tickle Me Elmo as the top result on your favourite site. It's the ability to disambiguate between flag, flag, flag and flag (in four different contexts). It's all that and more.
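As a rough illustration of "structure+controlled vocabulary", the sketch below models a few concepts with preferred labels, synonyms and related concepts. Everything in it is an assumption made up for this example: the concept IDs, the synonym lists and the two "flag" senses are hypothetical, and a real tool would hold far more detail.

```python
# Hypothetical controlled vocabulary: every ID, synonym and relation is made up.
VOCAB = {
    "keith-richards": {
        "label": "Keith Richards",
        "synonyms": ["keef"],
        "related": ["the-rolling-stones"],
    },
    "the-rolling-stones": {
        "label": "The Rolling Stones",
        "synonyms": ["rolling stones", "the stones"],
        "related": ["keith-richards"],
    },
    "flag-banner": {"label": "Flag (banner)", "synonyms": ["flag"], "related": []},
    "flag-marker": {"label": "Flag (status marker)", "synonyms": ["flag"], "related": []},
}


def lookup(query):
    """Return the IDs of all concepts whose label or synonyms match the query."""
    q = query.strip().lower()
    return sorted(
        concept_id
        for concept_id, concept in VOCAB.items()
        if q == concept["label"].lower() or q in concept["synonyms"]
    )


def related_labels(concept_id):
    """Return the preferred labels of the concepts related to the given one."""
    return [VOCAB[other]["label"] for other in VOCAB[concept_id]["related"]]


print(lookup("Keef"))                    # ['keith-richards']
print(related_labels("keith-richards"))  # ['The Rolling Stones']
print(lookup("flag"))                    # ['flag-banner', 'flag-marker']
```

A query that returns more than one concept, as "flag" does here, is the point at which a system needs extra context to disambiguate.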

Rachel Lovinger, the queen of taxonomy and metadata, discusses taxonomies at length all over the internet, and this is one of my favourite presentations: http://www.slideshare.net/rlovinger/the-rise-and-fall-of-topics

Matt Moore

Nov 26, 2012, 7:39:47 PM
to content...@googlegroups.com
Malcolm,

I suppose my position is ultimately empirical rather than theoretical.
I have seen lots of organisations saying, “we don’t need to worry
about this taxonomy nonsense, we’ll just bung in a search engine” and
it’s come back to bite them. Our semantic technologies (for want of a
better term) are promising but still don’t cut it yet.

"It would worry me if all webbies believed that a taxonomy is the only
solution to access information in a web environment."

I don't know about all webbies (I haven't had a chance to ask them
all) but pretty much everyone I rate who works with information
organisation structures (be they taxonomies, thesauri, controlled
vocabularies, ontologies or even folksonomies) does not believe that
these systems are the only issue when talking about accessibility for
users in digital environments.

What I would say is that many web designers are ignorant (even people
with IA backgrounds) as to how these tools work, when to use what and
how best to implement them to achieve outcomes (e.g. enhanced
usability).

Which means that they are missing a trick and users are suffering.
Just because the general public have an unsophisticated view of what
taxonomies are, that doesn’t mean information and content
professionals should.

"Both for screen presentational reasons and the increasing pressure of
web content volume, and over-complexity and fundamental weaknesses of
a single globally defined plan."

If you are saying that there are limitations with static, hierarchical
site plans then, yes, I would agree with that. But a static,
hierarchical site plan is only one form of information organisation
structure. There are many other forms (e.g. Ranganathan's colon
classification - which came out in 1933). The use of faceted
taxonomies within metadata allows you to dynamically serve content to
users ("OK, you need black leather men's shoes in this price range...
sure"). But there's a lot of back end work needed here.
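The shoe example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the product records, the facet names (`colour`, `material`, `audience`) and the `facet_search` function are all hypothetical.

```python
# Hypothetical product records, each tagged with facet values from controlled lists.
PRODUCTS = [
    {"name": "Oxford", "colour": "black", "material": "leather", "audience": "men", "price": 120},
    {"name": "Loafer", "colour": "brown", "material": "leather", "audience": "men", "price": 95},
    {"name": "Trainer", "colour": "black", "material": "canvas", "audience": "men", "price": 60},
    {"name": "Court shoe", "colour": "black", "material": "leather", "audience": "women", "price": 110},
]


def facet_search(products, max_price=None, **facets):
    """Keep products that match every requested facet value and the optional price cap."""
    results = []
    for product in products:
        # Reject the product if any requested facet has a different value.
        if any(product.get(facet) != value for facet, value in facets.items()):
            continue
        if max_price is not None and product["price"] > max_price:
            continue
        results.append(product["name"])
    return results


print(facet_search(PRODUCTS, colour="black", material="leather", audience="men", max_price=150))
# ['Oxford']
```

The filtering itself is trivial; the back end work is in agreeing the facet values and tagging every item consistently with them.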

"I would totally separate content access and information management
and creation. That is a matter for the content creators to do in the
way that makes sense to them."

If content creators, content managers and content users are different
groups then it makes sense that they may need different structures -
no question. But that wasn't the point that I was making. To clarify:
many people treat these structures as though they are purely about
navigation and I was making clear that there are other roles for
taxonomies to play.

"‘Expert systems’ have been around over 40 years - and are a
structured questioning system that hone into an answer for the user."

In some contexts expert systems work well - when you have a workflow
with clearly defined decision points and outcomes. If you do x then y
will happen. In the context of websites, you are often talking about
wizards - which are fine. Wizards can be very powerful. However most
wizards currently depend on pre-created structures which are often
based on some kind of taxonomy (e.g. you select a country from a list
to take you through to a particular version of a site - that list is
generally a controlled vocabulary).

"Metadata can record keywords without formalising them into lists of
accepted terms.”

So freetext keyword-based systems (a.k.a. folksonomies) have had a bit
of a mixed run. Current practice seems to be that they can supplement
more structured vocabularies but not wholly replace them. I think they
are very interesting but we’re still working out exactly how to use
them for optimal benefit.

“The search engine can still extract keywords from the text to supplement them."

That’s a problem if a key term that’s needed to describe the document
doesn’t occur in the text. Our entity extraction tools are definitely
getting better and more affordable but we still cannot rely on them
totally.

“Not all search systems are based on taxonomies, they can
‘intelligently’ refine their approach through use. Just as some
software can adapt menu systems to prioritise features that the user
most often uses. Search engines can increase their effectiveness over
time by learning what people most often want to access.”

However many search systems are supplemented by faceted taxonomies (or
even fancy, schmancy ontologies). Probabilistic reasoning and machine
learning are simply not good enough yet.

You might say: "Yes but in the future they will be. The machines will
do everything that I want them to, as if by magic". In which case, I
can't argue with you.

If you’re based in the UK, then the ISKO UK group are a good place to
start with this domain – although some of their members are academics
and have the requisite academic approach to this.

Cheers,

Matt

Malcolm Davison

Nov 27, 2012, 6:48:34 AM
to content...@googlegroups.com
Thanks for that interesting development of the discussion Matt, Rahel, Ruth and Tony.

Of course we mustn’t forget the Dewey Decimal System  -  the library classification system created by Melvil Dewey in 1876 which spawned other book indexing systems (Library of Congress Classification, etc) around the world.

How frustrating is this method of classification! Clearly it was a massive move forward from having no logical structure or having multiple competing or ad hoc systems in operation.

You have only to visit the local library to see the problem. To read up on aspects of ‘training’ you might look on one shelf for the subject you are delivering, another for use of voice and delivery techniques, another for course structuring and teaching methods, another for cross-cultural communication, elsewhere for the psychology of training, another part of the library for PowerPoint tricks, etc, etc. This suggests that adaptive content structuring might be beneficial - to bring all the relevant material together for one type of user. So that's classifying the user's needs as well as the content. These constantly change with the advances in the subject area.

A single formal taxonomy structure is an inadequate outdated method of content access.

“So freetext keyword-based systems (a.k.a. folksonomies) have had a bit of a mixed run.”

The problem with controlled vocabularies of search terms is that people don’t like to use them, as they can get tediously long and rely on a working knowledge of subject terminology. A successful system is going to have to be low key and not involve user training or become onerous to the content writer.

Equally it mustn’t rely on a single corporate expert trying to create structures in subject areas that they are unfamiliar with, nor depend on their availability to update and amend. But there does need to be a corporate expert to mastermind the central interlinking of departments. So the solution is, for the most part, going to need to be automated, or at least computer-assisted rather than wholly reliant on human intervention.

In this posting exchange you have all been happy to discuss some of the various options and have identified some problem areas - but what are your predictions for the way forward?

Could the semantic web actually become a practical solution - or part solution, and will that come soon enough? Whatever it is, radical thinking is needed and more urgently than ever. In my view the pressures are such that we will see some of the greatest strides in the next two years in content access systems that we have ever seen - but just what will they be?

Malcolm Davison

Marcia Riefer Johnston

Nov 27, 2012, 5:05:14 PM
to content...@googlegroups.com
Excellent examples, Rahel--yours and Rachel Lovinger's. 

Zahoor Hussain

Nov 27, 2012, 5:56:04 AM
to content...@googlegroups.com
Hi All

Wow, what a healthy discussion.

I think we are moving to a future where we will need different skills and/or people to manage and discover content. As William Gibson put it, "The future is already here - it's just not very evenly distributed."

This is a very exciting area and is transforming publishing. Some of this is thanks to the excellent work coming out from the BBC; to read more take a look @ http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html
I think that it has taken the beeb years and several iterations to get this right!

As Rahel points out, the system needs to be able to differentiate between "things and strings". Google Knowledge Graph is an example of content discovery combined with search; more @

Most organisations are not ready for anything as complex as this, however a key building block and a good place to start would be with taxonomies.

Zahoor



Malcolm Davison

Nov 28, 2012, 6:12:30 AM
to content...@googlegroups.com

That’s fascinating Zahoor - so a solution has been found! And it’s been staring me in the face all the time. I look at the BBC website every day! The fundamental thinking behind the solution was simple too.

Now let’s see if I can paraphrase Jem Rayfield’s article and cut through the technobabble - techies please correct me if I’m wrong. So essentially the website has a basic taxonomy hierarchy - but the individual ‘departments or website areas’ draw in tailored selections of material from a central database of stories which have been tagged (with story type identifiers, etc).

So it's now possible to deliver customised batches of information for audience types under the control of the editors. There is a very basic fixed local taxonomy and content is constantly being updated. This means that user groups/types get navigation tailored by the editors and fresh content from a wide range of sources. Now people are far more likely to get a better user experience and find the material that they are interested in.

The article relates that the tagging for the story writers has been kept deliberately simple, matching a point I made in an earlier posting.

That covers nearly all the things I outlined in my wish list! And a complex global taxonomy has been eliminated.

The BBC is no doubt refining access for mobile users to eliminate complex navigation and has already introduced horizontal navigation to simplify things.

In my view Zahoor, ‘things and strings’ is better managed by people. Although Rayfield suggests they are working towards a ‘fully dynamic semantic publishing (DSP) architecture’.

Editors are needed to select relevant content and the writers assist by tagging the items. This subject knowledge would still be needed even if an automated semantic web system was devised. It’s important to discern between Hollywood in Worcestershire, England and Hollywood in California or to distinguish real from copycat or fake. Automation may be one step too far for a news channel anxious to protect (or should we say in today’s context ‘recover’) its credibility.

Thanks Zahoor that was a really useful contribution.

Malcolm

Matt Moore

Nov 28, 2012, 6:58:57 AM11/28/12
to content...@googlegroups.com
Malcolm,

"Now let’s see if I can paraphrase Jem Rayfield’s article and cut through the technobabble - techies please correct me if I’m wrong. So essentially the website has a basic taxonomy hierarchy - but the individual ‘departments or website areas’ draw in tailored selections of material from a central database of stories which have been tagged (with story type identifiers, etc)."

I think you'll need to read & listen to the presentation below if you want to get a better sense of what was done. My understanding is that the BBC team did not build a "basic taxonomy hierarchy", they built a sporting event ontology. Which is a lot of work & pretty impressive.

This appears simple to the site user (& to an extent it's simple for the contributor because it can infer all kinds of relationships from a single tag) but the back end is actually very complex.
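
To make the "infer all kinds of relationships from a single tag" point a bit more concrete, here is a minimal Python sketch. The entities, the predicate names (`plays_for`, `competes_in`, `is_competition_in`) and the `infer_tags` helper are all invented for illustration; this is not the BBC's ontology or code, just one way such inference could work.

```python
# Hypothetical mini-ontology stored as (subject, predicate, object) triples.
# Every name below is a placeholder, not taken from the BBC's model.
TRIPLES = [
    ("Player A", "plays_for", "Team X"),
    ("Team X", "competes_in", "League Y"),
    ("League Y", "is_competition_in", "Sport Z"),
]

def infer_tags(tag):
    """Walk outgoing relations breadth-first and return every entity reachable from `tag`."""
    inferred = []
    queue = [tag]
    seen = {tag}
    while queue:
        current = queue.pop(0)
        for subj, _pred, obj in TRIPLES:
            if subj == current and obj not in seen:
                seen.add(obj)
                inferred.append(obj)
                queue.append(obj)
    return inferred

# The contributor applies one tag; the rest is derived from the graph.
print(infer_tags("Player A"))
```

The writer only ever types one tag, which is the simple part; somebody still has to build and maintain the triples behind it, which is the complex part.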


Cheers,

Matt Moore

Malcolm Davison

Nov 28, 2012, 7:36:39 AM11/28/12
to content...@googlegroups.com
Thanks Matt that's a helpful link to support the Rayfield article and adds visual context.

I was trying to simplify the concept, perhaps over-simplify, so that it might be applied in other work environments.

They have created a sporting ontology, you are right. I see the tagging tool their writers use is called 'Graffiti'; this must add the simplicity that writers need and that Rayfield refers to.

The pages are also published within a basic hierarchy (that's my observation and not in the article). So the overall taxonomy has been simplified and devolved.

It seems to me that it would not be too difficult to emulate the BBC's approach as most work applications for such a system would be a lot simpler. If you check out the rest of the BBC website you'll see there is a similar approach so I suspect the techniques have been rolled out beyond the sports department.

Malcolm


Matt Moore

Nov 28, 2012, 2:53:18 PM11/28/12
to content...@googlegroups.com
Malcolm,

1. You have to stop thinking of taxonomies as simply being page hierarchies in a website. A page hierarchy is one form of taxonomic structure. It's like saying "all pigs are mammals" therefore "all mammals are pigs".

2. Don't assume that all BBC websites are now ontology driven.

Matt Moore
+61 423 784 504
ma...@innotecture.com.au

Matt Moore

Nov 28, 2012, 3:01:39 PM11/28/12
to content...@googlegroups.com
Email got sent before completion:

3. Building & maintaining ontologies is not simple. That's why the sports event ontology was a big achievement. If it's well-designed then much of the complexity is hidden from users.

Cheers,

Matt

Zahoor Hussain

Nov 28, 2012, 6:40:27 PM11/28/12
to content...@googlegroups.com
Hi All

@Malcolm
I would agree that content curation by subject matter experts or editors is still required to disambiguate automated metadata enrichment of content. I would say, though, that the BBC Sport system is fairly mature because it is domain specific. Scaling up manual metadata tagging is just not commercially possible for most businesses, and most people hate tagging!

I would suggest that taxonomies and ontologies are at opposite ends of the same scale. Most ontologies need not be complex, and in the world of agile taxonomy development you would start with simple vocabularies that everyone can agree on and increase breadth and/or depth through iterations. For example, we have a market ontology which has Markets, Regions, Competitors, Products, Audiences etc., and tagging content using just these vocabularies across business/regions would significantly increase the findability of content.
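
As a rough illustration of how a handful of simple vocabularies can improve findability, here is a small Python sketch of faceted tagging. The facet names loosely follow the list above, but the content items, tag values and the `find` helper are made up for the example.

```python
# Hypothetical facets and content items; all values are invented for illustration.
FACETS = {"market", "region", "competitor", "product", "audience"}

CONTENT = [
    {"title": "Q3 pricing update", "tags": {"market": "Retail", "region": "EMEA"}},
    {"title": "Launch brief", "tags": {"market": "Retail", "region": "APAC", "product": "Widget"}},
    {"title": "Partner FAQ", "tags": {"audience": "Resellers", "region": "EMEA"}},
]

def find(items, **wanted):
    """Return the titles of items whose tags match every requested facet value."""
    unknown = set(wanted) - FACETS
    if unknown:
        raise ValueError(f"Unknown facets: {sorted(unknown)}")
    return [
        item["title"]
        for item in items
        if all(item["tags"].get(facet) == value for facet, value in wanted.items())
    ]

print(find(CONTENT, region="EMEA"))
print(find(CONTENT, market="Retail", region="EMEA"))
```

Each facet is a small, independent vocabulary, so one can be added or extended in a later iteration without reworking the others.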

@Matt
Agree, taxonomies can be much more than hierarchical vocabularies. Their uses could include:
- Managing descriptive metadata
- Aiding content discovery
- Enabling narrower and broader term matches
- Supporting synonyms and language variants
- Assisting with disambiguation (see Rahel's flag example)
- Providing a basis for navigation
- Enriching content
- Localising content for different geographies
- Providing a basis for ontologies
- Assisting with entity extraction (people, places, companies)
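
Two of the uses listed above, synonym support and narrower term matching, can be sketched as a simple search-term expansion. This is only a sketch in Python: the vocabulary, its terms and the `expand` helper are hypothetical, and it assumes the narrower-term links contain no cycles.

```python
# Hypothetical vocabulary: preferred term -> its synonyms and narrower terms.
VOCAB = {
    "vehicle": {"synonyms": ["conveyance"], "narrower": ["car", "bicycle"]},
    "car": {"synonyms": ["automobile", "motor car"], "narrower": []},
    "bicycle": {"synonyms": ["bike"], "narrower": []},
}

def expand(term):
    """Return the term, its synonyms, and all narrower terms with their synonyms."""
    entry = VOCAB.get(term)
    if entry is None:
        # Not a preferred term in this vocabulary: search for it as typed.
        return [term]
    expanded = [term] + entry["synonyms"]
    for child in entry["narrower"]:
        expanded.extend(expand(child))
    return expanded

# A search for "vehicle" also matches content that only mentions "bike".
print(expand("vehicle"))
```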

Re: the BBC approach could be adopted by content organisations, but I would start very simply and build a business case for taxonomies, e.g. as a proof of concept, and then build from there.

Hope that helps. Excuse the late reply, I've been in workshops all day.

Thanks

Zahoor



Matt Moore

Nov 28, 2012, 8:04:14 PM11/28/12
to content...@googlegroups.com
Zahoor,

Dipping in and out of your comments.

> Scaling up manual metadata tagging is just not commercially
> possible for most business, and most people hate tagging!

Mostly agree with this (which is why the results from folksonomy-based
systems have been mixed) with a few caveats:
- People tag things if doing so is beneficial to them. Many people
don't seem to mind tagging other people in photos for example.
- If your user-base is big enough, having a relatively small number of
taggers is not actually a problem.
- Some tools turn tagging into a game (I'm thinking of Luis von Ahn's
ESP game here).

> Most ontologies need not be complex, and in the world of agile
> taxonomy development you would start with simple vocabularies that everyone
> can agree on and increase breadth and/or depth through iterations e.g. we
> have a market ontology which has (Markets, Regions, Competitors, Products,
> Audiences etc.) and tagging content using just these vocabularies across
> business/regions would significantly increase findability of content.

Taxonomies and ontologies can be as simple or complex as you like.
Faceted taxonomies are not single, monolithic structures and so allow
agile approaches. Ontologies are more complex because of the variety
of predicates that you can develop (whereas in a thesaurus, you have a
small number of predicates such as: narrower term, broader term, use
for, etc).
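
A small Python sketch of that difference, under stated assumptions: the predicate identifiers and the `add_statement` helper are invented here, and real tools model this in far more detail. A thesaurus only accepts a fixed, small set of relations, while an ontology lets you define whatever predicates the domain needs.

```python
# The three thesaurus relations named above; the identifiers are illustrative.
THESAURUS_PREDICATES = {"narrower_term", "broader_term", "use_for"}

def add_statement(store, subject, predicate, obj, allowed=None):
    """Append a (subject, predicate, object) statement to `store`.

    If `allowed` is given, predicates outside that fixed set are rejected,
    as in a thesaurus. With allowed=None any predicate is accepted.
    """
    if allowed is not None and predicate not in allowed:
        raise ValueError(f"Predicate not allowed here: {predicate}")
    store.append((subject, predicate, obj))

thesaurus, ontology = [], []
add_statement(thesaurus, "car", "broader_term", "vehicle", allowed=THESAURUS_PREDICATES)
add_statement(ontology, "car", "manufactured_by", "carmaker")  # open-ended predicate
add_statement(ontology, "carmaker", "headquartered_in", "city")
```

Every extra predicate is something that has to be defined, agreed and maintained, which is where the added complexity comes from.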

The area that offers the most benefits long term is the
standardisation, reuse and interlinking of ontologies for specific
domains - and thence the data/content associated with them.

We're still a ways away from that being commonplace tho. I imagine
you've seen this before, Zahoor, but it might be interesting for
others: http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png

This comment made me smile: "simple vocabularies that everyone can
agree on". Even the simplest vocabulary can generate lots of friction
between different groups. The naming of things is inherently
political.

Cheers,

Matt

Malcolm Davison

Nov 29, 2012, 4:24:02 AM11/29/12
to content...@googlegroups.com
Thanks Zahoor for your insightful contribution.

Simplicity is the watchword here. As a former computer book author and feature writer for Knowledge Management Magazine with 45 years of programming experience, there is a phrase I use:

“There are fundamentally two states in computing ‘on’ and ‘off’ - any complexity of interpretation is entirely manmade.”

As discussed earlier, 'taxonomy' is a term that carries different meanings for different people - perhaps you have just made the case that we should avoid using it altogether! I deliberately bypassed the ‘ontology’ term as some readers might have been unfamiliar with it.

But thank you for clarifying the concepts for the benefit of others.

More generally, it's sad that so much information architecture is done badly these days. I despair at times. Complex structures are sometimes insisted on by government departments that impose overbearing navigation and interfere with readability and usability. Then, from the consultant and agency perspective, there are jargon-loaded explanations that dissuade clients from engaging with the information experts. Both of these exacerbate the situation.

It's refreshing to get an insight into imaginative projects such as the BBC's that can inspire others to emulate and develop the concepts further.

Malcolm

Rahel Anne Bailie

Nov 29, 2012, 7:23:45 AM11/29/12
to content...@googlegroups.com

Never use government as a baseline for best practices. They are not early adopters and at the current speed of change, they are always lagging behind industry.

--

Zahoor Hussain

Nov 29, 2012, 8:23:09 AM11/29/12
to content...@googlegroups.com
Thank you Malcolm!
I agree with simplification and with not using terms like taxonomy and ontology; most people outside the content domain do not know what you are talking about(!)

@Matt
I am all for using LOD and open standards as a starting point, and then being more specific for each customer's domain.
Agree with the approach for creating domain-specific ontologies; this is what we're doing.
I am scratching my head trying to solve the political problem of bringing together 1000s of terms into a single taxonomy tool.

@Rahel
re: government, this is usually true. However, there is the odd case, e.g. the UK government is opening up the possibilities of open data to a larger audience. See the work being performed by Nigel Shadbolt et al @ http://www.theodi.org/

Thanks

Zahoor

Malcolm Davison

Nov 29, 2012, 9:29:00 AM11/29/12
to content...@googlegroups.com
Yes I agree Rahel. I admire the government departments that are prepared to break ranks and ignore guidelines for the benefit of their organisations.

Zahoor - certainly there has been new thinking in UK government circles - www.gov.uk is another example. I am looking forward to attending the 'Digital by Default' Conference in London next week where I will no doubt learn more on the progress on this front. My fear, though, is that once standards have been raised, they will in turn become set in stone. Technological progress will always be more fleet of foot than the civil service. And there is a culture and expectation of obedience to codes of practice within governmental circles.

Malcolm

Matt Moore

Nov 29, 2012, 2:48:31 PM11/29/12
to content...@googlegroups.com, content...@googlegroups.com
Rahel,

This is a good rule of thumb for most things. E.g. A recent AIIM survey on internal & external social media use showed that govt agencies were way behind.

However, in the case of open data (& the information organisation structures that enable it), I've seen govt agencies be more proactive than the private sector.

- We've been discussing the awesome work the BBC does. Public broadcaster / quasi-govt entity.
- The Powerhouse Museum here in Sydney was using Flickr to crowdsource the tagging of its images & OpenCalais for entity extraction way early. Govt body.
- Many of the data sources on the Linked Data map (e.g. census info, Ordnance Survey, data.gov) are from govt agencies.

Cheers,

Matt

Rahel Anne Bailie

Nov 29, 2012, 3:45:40 PM11/29/12
to content...@googlegroups.com

Am sitting in a taxonomy session with speakers Heather Hedden (author of The Accidental Taxonomist) and Ole Gulbrandsen (CTO of Webnodes). Feel like I hit paydirt at Gilbane.

Geoff Froh

Nov 27, 2012, 8:02:30 PM11/27/12
to content...@googlegroups.com
Hello,

Long-time lurker here. Thanks to everyone for the fascinating
conversation. The BBC Sports reference was especially useful, Zahoor
-- much appreciated.

I just wanted to descend from 40,000 feet, back to the original
question and add one more software tool I'm currently evaluating,
called TemaTres (http://www.w3.org/2001/sw/wiki/TemaTres).

It is open-source with a PHP/MySQL backend, RESTful interfaces and can
export common data exchange formats like SKOS, TopicMaps and MADS.
Certainly not anywhere on the order of the BBC Sports DSP platform;
but so far seems suited for simple thesauri structures, and the user
experience on the authoring side is targeted squarely at
library/indexing professionals. There is also a WordPress plugin,
though I haven't tried it out.
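
For anyone who hasn't seen SKOS, here is a rough Python sketch of what a simple thesaurus structure looks like when written out as SKOS in Turtle syntax. This has nothing to do with how TemaTres itself produces its exports; the concepts, the `http://example.org/vocab/` base URI and the `to_skos_turtle` helper are placeholders, and labels are assumed to contain no quote characters. Only the SKOS terms themselves (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) come from the W3C standard.

```python
# Placeholder base URI and concepts, invented for this example.
BASE = "http://example.org/vocab/"

CONCEPTS = {
    "vehicle": {"label": "Vehicle", "alt": [], "broader": None},
    "car": {"label": "Car", "alt": ["Automobile"], "broader": "vehicle"},
}

def to_skos_turtle(concepts, base=BASE):
    """Serialise a dict of concepts as SKOS in Turtle syntax (no escaping of labels)."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]
    for key, concept in concepts.items():
        label = concept["label"]
        props = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        for alt in concept["alt"]:
            props.append(f'skos:altLabel "{alt}"@en')  # synonym / "use for" term
        if concept["broader"]:
            broader = concept["broader"]
            props.append(f"skos:broader ex:{broader}")  # link to the broader term
        lines.append(f"ex:{key} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)

print(to_skos_turtle(CONCEPTS))
```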

Thanks again for the stimulating discussion!

Geoff Froh

Densho.org