Wikipedia and n-grams

Ian Parker

Sep 29, 2009, 4:46:47 PM
I have been taking a look at the Hutter files with a view to stripping
out the HTML and trying out LSA. I am well on my way to doing this. In
the meantime I have been looking at the headings. This is very
clearly an early version of Wikipedia, as most of the headings seem to
be empty.

http://sites.google.com/site/aitranslationproject/wikipaedia
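A rough sketch of the markup-stripping step, as a first move towards LSA. It assumes the article text has been pulled out of the Hutter file's XML, where wikitext is stored XML-escaped (`&lt;ref&gt;` rather than `<ref>`), so it un-escapes first. The regular expressions are a crude, hypothetical cleanup, not a full wikitext parser, and `strip_markup` and `term_counts` are illustrative names.

```python
import html
import re
from collections import Counter

# Crude patterns for cleaning wikitext; not a full parser.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")                  # innermost {{...}} templates
TAG_RE = re.compile(r"<[^>]+>")                               # HTML tags such as <ref> or <br/>
QUOTES_RE = re.compile(r"'{2,}")                              # ''italic'' and '''bold''' markers
LINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")   # [[target|label]] -> label

def strip_markup(wikitext: str) -> str:
    """Return the plain text of one article, keeping the visible text of links."""
    text = html.unescape(wikitext)          # &lt;ref&gt; -> <ref>
    previous = None
    while previous != text:                 # repeat so nested templates are removed inside-out
        previous = text
        text = TEMPLATE_RE.sub("", text)
    text = TAG_RE.sub(" ", text)            # drop the tags themselves, keep text between them
    text = QUOTES_RE.sub("", text)
    text = LINK_RE.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()

def term_counts(wikitext: str) -> Counter:
    """Lower-cased word counts: one column of the term-document matrix LSA starts from."""
    return Counter(re.findall(r"[a-z]+", strip_markup(wikitext).lower()))
```

Stacking one `term_counts` column per article into a matrix and reducing it with a truncated SVD would give the LSA vectors; that step is not shown here.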

One other observation: Wiki attempts to be multilingual. You will
notice Arabic translations of all the filled-in titles. This is quite
important. It implies that Wiki amounts to a polylingual dictionary
of what might be termed the most important scholarly phrases.
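One way to harvest that dictionary, assuming the translations appear in the article text as interlanguage links of the form `[[ar:...]]`, `[[fr:...]]`, as they did in older wikitext. Later versions of Wikipedia keep these links in Wikidata rather than in the article text, so this only applies to older dumps. The pattern and the names `interlanguage_links` and `build_dictionary` are hypothetical.

```python
import re

# Hypothetical pattern for interlanguage links such as [[fr:Paris]] or [[zh-min-nan:Paris]].
INTERLANG_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")

def interlanguage_links(text: str) -> dict[str, str]:
    """Map language code -> title of the same article in that language."""
    return {lang: title.strip() for lang, title in INTERLANG_RE.findall(text)}

def build_dictionary(pages: dict[str, str]) -> dict[str, dict[str, str]]:
    """Given {english_title: wikitext}, keep only pages that have at least one translation."""
    dictionary = {}
    for title, text in pages.items():
        links = interlanguage_links(text)
        if links:
            dictionary[title] = links
    return dictionary
```

Titles with no interlanguage links (such as the empty headings) simply drop out of the dictionary.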

You know I have been bellyaching about the fact that Google translates the
Stefan-Boltzmann law as four times the temperature (the law involves the
fourth power of the temperature, σT⁴) and gives the surface area of a
sphere as 8πR (it is 4πR²). This gives an opportunity to get a
number of "truth" headings.

Does anyone know how I could get an up-to-date copy of the Wikipedia
files?

Unfortunately Brainchild does not give descriptions of scholarly
bigrams and n-grams. I do not know how religion came into it.
Wikipedia, both then and now, was/is a website of scientific consensus.


- Ian Parker


Mok-Kong Shen

Sep 30, 2009, 4:34:26 AM
Ian Parker wrote:

> Does anyone know how I could get an up-to-date copy of the Wikipedia
> files?

This may be an unsatisfactory answer, if I have misunderstood you,
but if you open a web page you can save its content as a file with
a mouse click.

M. K. Shen

Ian Parker

Sep 30, 2009, 11:23:32 AM
You cannot do that for the whole of Wiki. Anyway, I am interested in
the n-gram translations, which do not appear in the text.

To reiterate, my idea is that if you have a set of principal n-grams,
these can be looked up whenever they occur in a text. What we need is
a table like this:

n-gram | Translations | LSA Vector | OWL

In a text in any language, we can spot a principal n-gram by this
method. The translation is the Wiki translation. We can also encode
some OWL. To me, this would represent the start of "understanding".
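A minimal Python sketch of the n-gram | Translations | LSA Vector | OWL table and of spotting its n-grams in running text. The names `NGramEntry` and `find_ngrams` are hypothetical; the LSA vector is assumed to be a plain list of floats, and OWL is stood in for by (subject, predicate, object) triples, with no real OWL library involved. Spotting is done here as a greedy longest-match scan over word tokens, which is one simple choice among several.

```python
import re
from dataclasses import dataclass, field

@dataclass
class NGramEntry:
    """One row of the n-gram | Translations | LSA Vector | OWL table."""
    ngram: str                                                       # lower-case, space-joined
    translations: dict[str, str] = field(default_factory=dict)       # language code -> phrase
    lsa_vector: list[float] = field(default_factory=list)            # reduced LSA coordinates
    owl: list[tuple[str, str, str]] = field(default_factory=list)    # placeholder for OWL axioms

def find_ngrams(text: str, table: dict[str, NGramEntry], max_n: int = 4) -> list[NGramEntry]:
    """Greedy longest-match scan: return table entries whose n-grams occur in text."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at token i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            entry = table.get(" ".join(tokens[i:i + n]))
            if entry is not None:
                found.append(entry)
                i += n      # continue after the matched n-gram
                break
        else:
            i += 1          # no principal n-gram starts at this token
    return found
```

Keying the table on the space-joined n-gram makes each candidate a single dictionary lookup, and `max_n` bounds the phrase length. Splitting on `\w+` is a simplification that will not suit every language.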

That this has not been done is obvious when we look at some
translations.


- Ian Parker

BTW, the text I have is the Hutter text, which I got from Matt Mahoney.

Lluc Potrony

Oct 16, 2009, 5:59:05 AM
Ian Parker wrote:
> Does anyone know how I could get an up-to-date copy of the Wikipedia
> files?

Do you know the page <http://en.wikipedia.org/wiki/Wikipedia_database>?
It gives several approaches to getting the Wikipedia info.
