Happy 2009 From The New York Times

Evan Sandhaus <sandhes@nytimes.com>

Jan 12, 2009, 8:10:29 AM
to The New York Times Annotated Corpus Community
Hello from The New York Times and Happy 2009,

Thank you all for joining The New York Times Annotated Corpus
Community. My colleagues and I created this group in hopes of
fostering a vibrant research community around The New York Times
Annotated Corpus, and we are confident that 2009 will be a great year
for this community.

We recently added three pages to this community site: Events, Around
The Web, and Publications.

Events:
http://groups.google.com/group/nytnlp/web/new-york-times-annotated-corpus-events

Over the last few months, we have participated in some great events
involving The NYT Annotated Corpus, including a terrific meeting of
the New York Semantic Web Meetup
(http://www.swnyc.org/index.php?title=New_York_Semantic_Web_Meetup_at_the_New_York_Times).
We created the events page to chronicle these events and to make sure
you know about upcoming events in your area. Several events are
currently in the planning stage, so make sure to check back often.

Around The Web:
http://groups.google.com/group/nytnlp/web/the-new-york-times-annotated-corpus-around-the-web

It has been great to see the response to The New York Times Annotated
Corpus in the blogosphere. We created the "Around The Web" page to
aggregate these responses, so keep them coming.

Publications:
http://groups.google.com/group/nytnlp/web/publications

Since The NYT Annotated Corpus is only a few months old, we don't
expect to see any research publications just yet. Even so, we are
very excited to learn about such publications as they arrive. To
this end, we've created the "Publications" page and look forward to
seeing it grow.

All three of these pages are editable by anybody in this community, so
if you have anything to share, please do!

As always, we are happy to help anybody in this community with
questions about The NYT Annotated Corpus.

Here's to a great 2009,

Evan Sandhaus
---
Semantic Technologist
Research & Development Operations
New York Times Company

Daniel Tunkelang

Jan 12, 2009, 10:21:23 AM
to nyt...@googlegroups.com
Evan,

What are the guidelines around using the corpus in publicly facing demonstrations? Such demos could generate massive publicity around the corpus, not to mention spurring a healthy competition among information access vendors.

Daniel
--
Daniel Tunkelang
Chief Scientist, Endeca
Blog: http://thenoisychannel.com/


Neal Richter

Jan 12, 2009, 11:45:39 PM
to The New York Times Annotated Corpus Community
Evan,

I see that the license is non-commercial. Some of the licenses in
the LDC allow creating derivative data.

Example:
b. Summaries, analyses and interpretations of the linguistic
properties of the text may be derived and published, provided it is
not possible to reconstruct the information from these summaries.

The posted license for the NYT corpus does not include this clause
explicitly. Is creating statistical summaries, etc., as above allowed
for commercial use? Can the corpus be used to train machine learning
algorithms intended for possible commercial use? In either case, the
original information could not be reconstructed.

Any plans to release the "controlled vocabulary" you use?

Thanks - Neal Richter

Evan Sandhaus <sandhes@nytimes.com>

Jan 13, 2009, 12:48:01 PM
to The New York Times Annotated Corpus Community
Dan,

The Times and I understand the value of public-facing prototypes and
will look into this issue. As soon as we have a definitive answer, we
will report it back to this forum.

All the best,

Evan

Evan Sandhaus <sandhes@nytimes.com>

Jan 13, 2009, 12:57:00 PM
to The New York Times Annotated Corpus Community
Neal,

To your questions:

>Is creating statistical summaries etc allowed as above
>for commercial use? Can it be used to train possible commercial use
>machine learning algorithms?

The kind of work you describe is not permitted under the LDC
license. That being said, we are very open to discussing a
commercial licensing arrangement for our data. Please follow up with
me at san...@nytimes.com to discuss specifics.

> Any plans to release the "controlled vocabulary" you use?

The controlled indexing vocabulary is "included" with the corpus
insofar as it can be discovered by aggregating the indexing metadata
across all of the corpus documents. I realize, however, that this is
a tedious process. As such, we are investigating the possibility of
making these kinds of resources available to the research community.
I will post back to this forum when we've decided how to proceed on
this.
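A minimal Python sketch of that aggregation step, offered only as an illustration under stated assumptions: it assumes the documents have already been unpacked into individual XML files, and that indexing terms appear as the text of `<classifier>` elements carrying `class="indexing_service"`. The element name, the attribute value, and the helper names `aggregate_vocabulary` and `print_top_terms` are hypothetical choices for this example, not an official tool or a documented schema, so check them against the actual files before relying on the counts.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def aggregate_vocabulary(paths, tag="classifier", attr_filter=None):
    """Count distinct indexing terms across a set of XML documents.

    Assumption (hypothetical schema): each indexing term is the text of an
    element named `tag`. If `attr_filter` is given, only elements whose
    attributes match every key/value pair in it are counted.
    """
    counts = Counter()
    for path in paths:
        root = ET.parse(path).getroot()
        # iter() walks the whole tree, so nesting depth does not matter.
        for elem in root.iter(tag):
            if attr_filter and any(elem.get(k) != v for k, v in attr_filter.items()):
                continue
            term = (elem.text or "").strip()
            if term:
                counts[term] += 1
    return counts


def print_top_terms(corpus_dir, n=20):
    """Print the n most frequent terms found under corpus_dir.

    Assumption: the corpus has been extracted into individual .xml files,
    and "indexing_service" is the attribute value that marks indexer terms.
    """
    files = sorted(Path(corpus_dir).rglob("*.xml"))
    vocab = aggregate_vocabulary(files, attr_filter={"class": "indexing_service"})
    for term, count in vocab.most_common(n):
        print(f"{count}\t{term}")
```

For example, `print_top_terms("/path/to/extracted/corpus")` would list the 20 most frequent terms. Counting occurrences rather than just collecting a set also gives a rough sense of how heavily each term is used.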

All the best,

Evan