Discuss Value Added News DRAFT technical specifications


Mark Ng

Jun 30, 2009, 5:36:57 PM
to valueaddednews
Discuss your thoughts on the initial Value Added News proposals here!

Daniel Bachhuber

Jul 10, 2009, 4:55:56 PM
to valueaddednews
I'm very interested that this idea has taken the form of a project...
Why a microformat over custom RDF header and XHTML namespace though?

michael

Jul 10, 2009, 9:31:39 AM
to valueaddednews
First, let me introduce myself. My name is Michael Donohoe and I've
been at The New York Times for five years or so. My first role was
working with the CMS, both internals and templating, and now I'm
focused on frontend dev work: JavaScript, HTML, XSLT, PHP and all
the fun templating and framework options they provide.

My first impression was 'oh no, not another microformat', but I'm very
pleased you're looking to extend hAtom instead of creating a competitor.

One point I would bring up for discussion, to get the ball rolling,
since I'm not sure of its benefits myself:

Would it be beneficial to also allow a limited amount of metadata
(for example: 'politics', 'gop', 'pelosi, nancy')?

I would see it as helpful in providing context to an article, and in
saying that while this article might talk about many things, this is
really what it centers around.

Obviously this could be a can of worms: SEO, whether you would want
to put a standardized taxonomy in front of it, and unknown forms of
abuse... But I'd like to throw it out there to get other people's
opinions.

Would the negatives outweigh any positive gains?

Thanks,
Michael

Mark Ng

Jul 11, 2009, 5:11:22 AM
to valueaddednews

On Jul 10, 9:55 pm, Daniel Bachhuber <danielbachhu...@gmail.com>
wrote:
> I'm very interested that this idea has taken the form of a project...
> Why a microformat over custom RDF header and XHTML namespace though?
>

A few reasons:

1) we can use existing parsers: anything already built that
understands hAtom will understand 75% of what we're proposing.
2) the process for integrating microformats (and our proposed
extensions to microformats) is a bit simpler and requires less
understanding on the part of the people implementing it.
3) what we're suggesting people do represents a very useful "first
step" in providing semantic value in news stories.
4) in our initial conversations with several news organisations there
seems to be a greater understanding of microformats and poshformats in
the news industry as a whole. (There are some notable exceptions to
this, however.)
5) most of the people working on this, frankly, know a fair bit more
about microformats and poshformats than about RDFa. That said, we did
spend a bit of time consulting with Dan Brickley, who gave us some
useful ideas and suggestions on how we might work with RDFa in the
future.
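To illustrate point 1, here is a minimal sketch of how little is needed to pull the basic hAtom fields out of a page using only Python's standard library. It is not one of the existing microformats parsers; it handles only a handful of hAtom class names, assumes one hentry per snippet and reasonably well-formed markup, and the sample article is invented.

```python
from html.parser import HTMLParser

# A subset of the hAtom 0.1 class names; real parsers handle many more cases.
HATOM_FIELDS = {"entry-title", "entry-summary", "entry-content",
                "published", "updated", "author"}
# Void elements never get an end tag, so they must not touch the element stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class HAtomEntryParser(HTMLParser):
    """Minimal sketch: collects hAtom field values from a snippet holding one hentry.

    Assumes every non-void element is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # per open element: the field names it collects text for

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        active = []
        for name in sorted(classes & HATOM_FIELDS):
            if tag == "abbr" and attr.get("title"):
                # Datetime design pattern: the machine-readable value is in @title.
                self.fields[name] = attr["title"]
            else:
                self.fields.setdefault(name, "")
                active.append(name)
        self._stack.append(active)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every field whose element is still open around it.
        for active in self._stack:
            for name in active:
                self.fields[name] += data


def parse_hatom(html):
    parser = HAtomEntryParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace left over from indentation.
    return {name: " ".join(value.split()) for name, value in parser.fields.items()}


sample = """
<div class="hentry">
  <h1 class="entry-title">Budget vote delayed</h1>
  <abbr class="published" title="2009-07-11T05:11:22Z">July 11</abbr>
  <p class="author vcard"><span class="fn">Jane Reporter</span></p>
  <p class="entry-summary">Lawmakers postponed the vote. <img src="chart.png"></p>
</div>
"""
print(parse_hatom(sample))
```

Anything Value Added News adds on top would simply be extra class names that a tool like this either understands or ignores.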

That said:

1) we're looking at using RDF and RDFa for some of the more
complicated statements we're going to encourage news organisations to
start making, where microformats just aren't that suitable. One
particular area where we're considering suggesting this is marking
up statements of principles so that they are machine readable.
2) our goal is to get news organisations to make their
content as machine readable as possible in *any* manner, and we don't
necessarily see what we're suggesting as "the final solution". We
think that one of the major problems with both microformats and
RDFa is that they haven't been marketed properly to news
organisations, and we'd like to help fix that. It should be the
exception, rather than the rule, that a news article has no
machine-readable basic attributes.

We'd certainly appreciate suggestions from, and conversations with,
people involved in using RDFa in news content. We're not religious
about this in any way.

Mark

Mark Ng

Jul 11, 2009, 6:05:26 AM
to valueaddednews
Hi Michael,

On Jul 10, 2:31 pm, michael <mich...@ifelse.org> wrote:
> First, let me introduce myself. My name is Michael Donohoe and I've
> been at The New York Times for five years or so. My first role was
> working with the CMS, both internals and templating, and now I'm
> focused on frontend dev work: JavaScript, HTML, XSLT, PHP and all
> the fun templating and framework options they provide.

We're glad to have someone from the NYT joining the discussion!

>
> My first impression was 'oh no, not another microformat', but I'm very
> pleased you're looking to extend hAtom instead of creating a competitor.

It's worth noting that we're not (currently) a microformat. A couple
of initial conversations with the microformats community indicated a
bit of a "chicken and egg" problem: something only becomes interesting
to the community once it has been built and people are starting to
use it. We're looking at putting our suggestions through the
microformats process in the next couple of weeks, but we don't need
to do so in order for these proposals to be useful. (The microformats
community would currently call us a "poshformat"; we're just calling
it Value Added News.)

> Would it be beneficial to also allow a limited amount of metadata
> (for example: 'politics', 'gop', 'pelosi, nancy')?

Potentially. For someone using data from, for example, your news site
and only your news site, having some limited access to your taxonomy
would be incredibly useful. For someone working with articles from
several publishers, access to a mixed taxonomy would be less useful
(to borrow from your example, the NYT might mark up Nancy Pelosi as
'pelosi, nancy', and another organisation may mark her up as 'nancy
pelosi').
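A consumer mixing feeds from several publishers would have to reconcile labels like these itself. The sketch below is a hypothetical normaliser, not part of any proposal: it lowercases and collapses whitespace, and swaps "surname, forename" only when the caller knows that publisher inverts names. Applying the swap blindly would mangle labels such as 'washington, d.c.', which is exactly why a mixed taxonomy is less useful.

```python
def normalize_label(label: str, inverted_names: bool = False) -> str:
    """Hypothetical normaliser for publisher taxonomy labels.

    Lowercases, collapses whitespace and, only when the publisher is known
    to use 'surname, forename' labels, reorders them to 'forename surname'.
    """
    cleaned = " ".join(label.lower().split())
    if inverted_names and cleaned.count(",") == 1:
        surname, forename = (part.strip() for part in cleaned.split(","))
        cleaned = f"{forename} {surname}"
    return cleaned


# Publisher A inverts person names; publisher B does not.
print(normalize_label("Pelosi, Nancy", inverted_names=True))
print(normalize_label("Nancy  Pelosi"))
```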

Is the category/rel-tag part of the hAtom specification
(http://microformats.org/wiki/hatom#Entry_Category), which we extend
from and which is therefore available, sufficient for what you're
talking about, or do you think there is a general requirement for a
more sophisticated way of exposing internal taxonomies?
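For reference, under rel-tag the tag value is the last path segment of the link's href, not the visible link text. A minimal standard-library sketch of reading an entry's rel-tag categories follows; the example.com URLs are invented, and tolerating a trailing slash is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote_plus, urlparse


class RelTagParser(HTMLParser):
    """Collects rel-tag values: the last path segment of each rel="tag" href."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" not in rels or not href:
            return
        # Drop query and fragment; strip a trailing slash (assumption).
        path = urlparse(href).path.rstrip("/")
        segment = path.rsplit("/", 1)[-1]
        if segment:
            # '+' and %20 both decode to a space.
            self.tags.append(unquote_plus(segment))


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


entry = """
<div class="hentry">
  <a rel="tag" href="http://www.example.com/tag/pelosi%2C+nancy">Nancy Pelosi</a>
  <a rel="tag" href="http://www.example.com/tag/politics/">Politics</a>
  <a href="http://www.example.com/about">About us</a>
</div>
"""
print(extract_rel_tags(entry))
```

Because the tag lives in the URL, a publisher's internal label (here 'pelosi, nancy') is exposed as-is, which is where the cross-publisher mismatch comes from.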

> I would see it as helpful in providing context to an article, and in
> saying that while this article might talk about many things, this is
> really what it centers around.

In my absolutely ideal world, those subjects would be marked up in
microformats or RDFa inside the article text, for example by marking
up people inside news articles with hCard. My concern is that the kind
of content enrichment required for this is very much not in the
workflow of most news organisations, and doing it automatically with
tools like Reuters Open Calais provides no way of making sure that
only important subjects are marked up (perhaps using a relevance
threshold, but something like that is only reliable to a certain
extent).
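As a rough illustration of the relevance-threshold idea, the sketch below wraps only sufficiently relevant people in minimal hCard markup (class="vcard" with an "fn" name). The entity dictionaries and the 0.5 threshold are hypothetical; this is not the Open Calais response format, and only the first mention of each name is marked up.

```python
from html import escape


def mark_up_people(text, entities, threshold=0.5):
    """Wrap the first mention of each sufficiently relevant person in hCard markup.

    `entities` is a hypothetical list of dicts with 'name', 'type' and a
    'relevance' score between 0 and 1, as an extraction tool might supply.
    """
    html = escape(text)
    for entity in entities:
        if entity["type"] != "Person" or entity["relevance"] < threshold:
            continue  # skip non-people and incidental mentions
        name = escape(entity["name"])
        card = f'<span class="vcard"><span class="fn">{name}</span></span>'
        html = html.replace(name, card, 1)
    return html


text = "Gordon Brown met Nancy Pelosi in Washington."
entities = [
    {"name": "Gordon Brown", "type": "Person", "relevance": 0.9},
    {"name": "Nancy Pelosi", "type": "Person", "relevance": 0.2},
    {"name": "Washington", "type": "City", "relevance": 0.8},
]
print(mark_up_people(text, entities))
```

The weakness Mark describes shows up directly: everything hinges on the score and the chosen cut-off, neither of which reflects editorial judgement.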

> Obviously this could be a can of worms: SEO, whether you would want
> to put a standardized taxonomy in front of it, and unknown forms of
> abuse... But I'd like to throw it out there to get other people's
> opinions.

It's still information that the search engines can spider if they
choose to, and act accordingly.

>
> Would the negatives outweigh any positive gains?

I don't think there are too many negatives from providing more useful
machine-readable categorisation and subject information, and I think
the potential gains are well worthwhile. A good example from a
presentation at NewsInnovationLondon yesterday concerned a certain
news outlet in the UK whose summary page for Gordon Brown had a
caravan review as its top entry, because the review complained that
everyone was going to have to take caravan holidays because Gordon
Brown is ruining the economy.
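The caravan example comes down to the difference between "mentions the name" and "is about the subject". This hypothetical sketch contrasts the two, assuming each article carries publisher-declared subjects (for instance from rel-tag markup); the articles are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    body: str
    subjects: list = field(default_factory=list)  # publisher-declared subjects (hypothetical)


def text_matches(articles, name):
    """Naive approach: any article whose body mentions the name."""
    needle = name.lower()
    return [a.title for a in articles if needle in a.body.lower()]


def subject_matches(articles, name):
    """Only articles whose declared subjects include the name."""
    needle = name.lower()
    return [a.title for a in articles if needle in (s.lower() for s in a.subjects)]


articles = [
    Article("Brown unveils budget",
            "Gordon Brown today set out his spending plans.",
            ["gordon brown", "economy"]),
    Article("Caravan review",
            "A roomy tourer; soon we will all be taking caravan holidays "
            "because Gordon Brown is ruining the economy.",
            ["caravans", "travel"]),
]
print(text_matches(articles, "Gordon Brown"))
print(subject_matches(articles, "Gordon Brown"))
```

A summary page built on declared subjects would not pick up the review, while one built on text matching would.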

Mark

John

Jul 11, 2009, 7:12:18 AM
to valueaddednews
I would like to see articles include the citation, even a link if the
publisher plans to keep the text online indefinitely. Years ago I
used a legal research program called Casefinder (by Geronimo
Development). One feature I loved, and miss, is that when copying
text to MS Word, Casefinder automatically appended the citation,
properly formatted for legal briefs. I would often copy a random word
just to get the citation. Including the proper citation in a tag
would at least make it available to programs and make it easier for
honest, but lazy, authors to cite their sources. Then, I guess, you
would have to persuade Microsoft, Apple, etc. to modify the cut & paste
function in their operating systems to look for and append a citation
tag.
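If the basic fields are already marked up, a program could assemble a citation without any new tag. A minimal sketch, assuming an entry dict shaped like hAtom fields (author, entry-title, published, bookmark) and a publisher name supplied separately; the output style is a hypothetical plain format, not Bluebook or any legal citation standard.

```python
from datetime import date


def format_citation(entry: dict, publisher: str) -> str:
    """Build a plain-text citation from hAtom-style fields (hypothetical style)."""
    # Keep only the YYYY-MM-DD part of the ISO 8601 'published' value.
    published = date.fromisoformat(entry["published"][:10])
    when = f"{published:%B} {published.day}, {published.year}"
    parts = [entry["author"], f'"{entry["entry-title"]}"', publisher, when]
    citation = ", ".join(parts) + "."
    if entry.get("bookmark"):
        # The permalink (hAtom 'bookmark') only helps if the publisher keeps it online.
        citation += f' {entry["bookmark"]}'
    return citation


entry = {
    "author": "Jane Reporter",
    "entry-title": "Budget vote delayed",
    "published": "2009-07-11T05:11:22Z",
    "bookmark": "http://www.example.com/2009/07/11/budget",
}
print(format_citation(entry, "Example Times"))
```

Hooking this into copy and paste would still need the operating-system or application support John mentions.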