Queries for Ad-hoc Retrieval Experiments on NYT Corpus

kberberi

Apr 14, 2009, 8:38:33 AM
to The New York Times Annotated Corpus Community
Hi,

I was wondering whether anyone has conducted ad-hoc retrieval
experiments using the NYT corpus.

If so, which queries did you use? Of course, it would be great if a
"real" workload (no click log needed!) were available (e.g., as
observed on the New York Times site search).

Thanks a lot and kind regards,

Klaus

Brendan O'Connor

Apr 14, 2009, 4:30:46 PM
to nyt...@googlegroups.com
I'd be interested in hearing about this too, if anyone has looked into it.

The lack of publicly available real-world query logs is a huge impediment to information retrieval research. At least in web search, it seems like private companies (Google, Yahoo, Microsoft) are making academics obsolete simply because their internal researchers have access to much better data than is available in the public sphere.

As for NYT IR, there are the AOL web search query logs out there... maybe they could be mined somehow for NYT-relevant queries.
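A rough sketch of what that mining might look like, assuming the log is tab-separated with the header columns AnonID, Query, QueryTime, ItemRank, ClickURL (the layout of the released AOL files). The function name `nyt_queries` is made up for illustration. It keeps only queries whose clicked result was on nytimes.com and counts how often each one occurs:

```python
import csv
from collections import Counter
from urllib.parse import urlparse


def nyt_queries(log_lines, domain="nytimes.com"):
    """Count queries whose clicked URL is on `domain` or one of its subdomains.

    `log_lines` is any iterable of tab-separated lines, header row first
    (assumed columns: AnonID, Query, QueryTime, ItemRank, ClickURL).
    """
    counts = Counter()
    reader = csv.DictReader(log_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        url = (row.get("ClickURL") or "").strip()
        if not url:
            continue  # query without a click; nothing ties it to a site
        if "//" not in url:
            url = "//" + url  # let urlparse treat a bare host as the netloc
        host = urlparse(url).netloc.lower()
        if host == domain or host.endswith("." + domain):
            # Normalize case so trivially different spellings are merged.
            counts[row["Query"].strip().lower()] += 1
    return counts
```

Usage would be along the lines of `nyt_queries(open(path, encoding="utf-8"))`, followed by `.most_common(50)` on the result. Note that a click on nytimes.com only suggests the query is NYT-relevant; it says nothing about whether the clicked article is in the corpus.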

If the NYT released search.nytimes.com query and/or click data, that, together with the corpus, could open up a very interesting & applicable research area.

Brendan

Daniel Tunkelang

Apr 14, 2009, 4:37:30 PM
to nyt...@googlegroups.com
For that matter, there are also the daily Google Trends queries, available as far back as May 15, 2007.

Daniel
--
Daniel Tunkelang
Chief Scientist, Endeca
Blog: http://thenoisychannel.com/

Evan Sandhaus <sandhes@nytimes.com>

Apr 14, 2009, 5:06:26 PM
to The New York Times Annotated Corpus Community
All,

I realize that this doesn't provide a whole lot of queries, but it's
better than nothing.

http://www.nytimes.com/gst/mostsearched.html

You get the top 25 daily, weekly, and monthly queries on nytimes.com
along with related queries for each query. The data is updated
hourly. I'll poke around at The Times and see if an expanded version
of this list is available.

~Evan
--
Evan Sandhaus

Semantic Technologist
Research & Development Operations
New York Times Company
(212)556-3826



On Apr 14, 4:37 pm, Daniel Tunkelang <dtunkel...@gmail.com> wrote:
> For that matter, there are also the daily Google Trends queries
> (http://www.google.com/trends/hottrends), available as far back as
> May 15, 2007 (http://www.google.com/trends/hottrends?sa=X&date=2007-5-15).
>
> Daniel