I was taking a look at the markup used on Guardian articles,
specifically the rel="tag" link markup and the Open Graph news tags.
Taking an example, say http://www.guardian.co.uk/world/2011/dec/09/uk-leading-role-europe-hague,
each tag is marked up in two places:
<meta property="article:tag" content="Eurozone crisis" />
and
<a href="http://www.guardian.co.uk/business/debt-crisis"
rel="tag">Eurozone crisis</a>
This seems fairly reasonable, except that the rel="tag" specification
at http://microformats.org/wiki/rel-tag clearly says "the last segment
of the path portion of the URI (after the final "/" character)
contains the tag value". In other words, according to the
specification, the meta-property markup is for the tag "Eurozone
crisis", but the rel="tag" markup is for the tag "debt-crisis".
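
To make the mismatch concrete, here is a rough sketch of how a consumer might read both kinds of markup. It uses only the Python standard library; the class name TagExtractor and its attribute names are my own invention, and it applies the rel-tag rule in its simplest form (the text after the final "/" of the path, percent-decoded), so treat it as an illustration rather than a complete rel-tag parser.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class TagExtractor(HTMLParser):
    """Hypothetical helper: collects tags from both kinds of markup."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []  # content of <meta property="article:tag">
        self.rel_tags = []   # last path segment of each rel="tag" href

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "article:tag":
            self.meta_tags.append(attrs.get("content") or "")
        elif tag == "a" and "tag" in (attrs.get("rel") or "").split():
            # rel-tag: the tag value is whatever follows the final "/"
            path = urlparse(attrs.get("href") or "").path
            self.rel_tags.append(unquote(path.rsplit("/", 1)[-1]))


sample = (
    '<meta property="article:tag" content="Eurozone crisis" />\n'
    '<a href="http://www.guardian.co.uk/business/debt-crisis"\n'
    'rel="tag">Eurozone crisis</a>'
)
extractor = TagExtractor()
extractor.feed(sample)
print(extractor.meta_tags)  # ['Eurozone crisis']
print(extractor.rel_tags)   # ['debt-crisis']
```

Run against the two snippets quoted above, the two lists disagree, which is exactly the discrepancy I am asking about.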
I wonder whether this is an intentional decision by the Guardian, an
accident, or just technically unavoidable. I also wonder which tags
Guardian staff would recommend I extract from the page.
I also wonder whether the last segment of these paths is unique for a
given tag; e.g., it looks like they might be scoped by the section of the
newspaper ("business" in the above example), so is it possible that
two different tags might be given the same final segment?
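
If I had a list of tag URLs, I could check this myself with something like the sketch below. The function name find_segment_collisions is made up, and the "/world/debt-crisis" URL in the example is purely hypothetical: I have not seen such a tag on the site, it is only there to show what a collision would look like.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find_segment_collisions(tag_urls):
    """Return final path segments that are shared by more than one tag path."""
    paths_by_segment = defaultdict(set)
    for url in tag_urls:
        path = urlparse(url).path
        paths_by_segment[path.rsplit("/", 1)[-1]].add(path)
    # Keep only segments that map to two or more distinct paths
    return {
        segment: sorted(paths)
        for segment, paths in paths_by_segment.items()
        if len(paths) > 1
    }


urls = [
    "http://www.guardian.co.uk/business/debt-crisis",
    "http://www.guardian.co.uk/world/debt-crisis",  # hypothetical URL
]
print(find_segment_collisions(urls))
```

An empty result over the full set of tag URLs would suggest the final segment is safe to use as a tag identifier on its own.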
--
Richard