Java API for Web-1T Corpus


julien nioche

Apr 2, 2008, 12:42:14 PM
to DigitalPebble
Java API for Web-1T Corpus

The Web 1T 5-gram corpus contains n-grams from unigrams through to 5-
grams compiled from counts on a one trillion word corpus. It is
distributed by the Linguistic Data Consortium for researchers.

We have developed a Java API for querying the Web 1T corpus
(or any corpus in a similar format). Unlike Get1T, our API allows
on-the-fly queries of the full set of Web 1T n-grams, even on
a modest machine. The API also helps create n-gram corpora from
other sources (Lucene indices, the BNC corpus).
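
To give a rough idea of what querying such a corpus involves, here is a minimal sketch that is not taken from our API. It assumes the Web 1T text layout of one n-gram per line, with tokens separated by spaces and followed by a tab and the count. The class name NgramCounts, its methods, and the sample lines are made up for illustration. It loads everything into memory, so it only suits small extracts; the full corpus needs an on-disk index.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not the DigitalPebble API.
public class NgramCounts {
    // Maps the n-gram text (tokens joined by single spaces) to its count.
    private final Map<String, Long> counts = new HashMap<>();

    // Reads lines of the assumed form "w1 w2 ... wn<TAB>count".
    public void load(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a tab-separated count
            }
            String ngram = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(ngram, count);
        }
    }

    // Returns the stored count, or 0 if the n-gram was never seen.
    public long count(String ngram) {
        return counts.getOrDefault(ngram, 0L);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample lines, not real corpus counts.
        String sample = "the quick brown\t12\nquick brown fox\t7\n";
        NgramCounts ngrams = new NgramCounts();
        ngrams.load(new BufferedReader(new StringReader(sample)));
        System.out.println(ngrams.count("quick brown fox"));
        System.out.println(ngrams.count("lazy dog"));
    }
}
```

In real use the reader would wrap one of the corpus files rather than an in-memory string.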

Contact us for more details and the terms of use.

Julien Nioche

Nov 15, 2012, 4:18:13 PM
to digita...@googlegroups.com, pj.m...@gmail.com
Hi Mudda

We no longer maintain this library. I'm sure there are other alternatives available in Java.

Best regards

Julien

On Thursday, 15 November 2012 20:25:12 UTC, pj.m...@gmail.com wrote:
Hi,

I would like to know whether you still have the API that can be used to access the Google n-grams.

Thanks,
Mudda

Julien Nioche

Nov 22, 2012, 9:26:41 AM
to digita...@googlegroups.com, pj.m...@gmail.com
Having said that, the code is still available at https://github.com/DigitalPebble/ngrams-api and you are most welcome to use it.