Update: enwiktionary-20210619


Aard...@web.de

Jun 19, 2021, 2:06:56 PM
to aard...@googlegroups.com
The updated enwiktionary-20210619 data (English) for Aard2 in *.slob format is being uploaded to

http://ftp.halifax.rwth-aachen.de/aarddict/

Just download the dictionary from the appropriate folder.

Thank you to RWTH Aachen University for mirroring the data.

For the larger dictionaries there might be a magnet link for a torrent. If you want to see the full list of dictionaries, see
https://github.com/itkach/slob/wiki/Dictionaries


Have fun



Nikolai Yourin

Jun 20, 2021, 3:16:58 PM
to aarddict
Are you sure everything is okay with this one?
I mean, the previous version was almost twice as large: enwiktionary-20201207.slob = 1472151640 bytes.

MHBraun

Jun 22, 2021, 9:05:03 AM
to aarddict
I used a higher compression. Let me know if something is wrong. It works on my system here.
Glad you are using it.

Nikolai Yourin

Jun 23, 2021, 5:10:18 PM
to aarddict
It seems to be missing about half of the articles:

enwiktionary-20210619.slob:
    3323509 total articles
    4659539 total words

enwiktionary-20201207.slob:
    6424353 total articles
    8643945 total words
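
To put a number on "about half", here is a minimal sketch that only does arithmetic on the four counts quoted above. It does not read the .slob files; the dictionary layout and the function name `retained_fraction` are made up for illustration.

```python
# Counts copied from the two listings above (not read from the files).
counts = {
    "enwiktionary-20210619.slob": {"articles": 3323509, "words": 4659539},
    "enwiktionary-20201207.slob": {"articles": 6424353, "words": 8643945},
}


def retained_fraction(new, old):
    """Return, per metric, the share of the old count present in the new file."""
    return {metric: new[metric] / old[metric] for metric in new}


ratios = retained_fraction(
    counts["enwiktionary-20210619.slob"],
    counts["enwiktionary-20201207.slob"],
)

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1%} retained, {1 - ratio:.1%} missing")
```

By this calculation the new file keeps roughly 52% of the articles and 54% of the words of the December 2020 build, which is far more than a dictionary would normally shrink between two dumps.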

MHBraun

Jun 24, 2021, 6:40:24 PM
to aarddict
oops... will start over.

MHBraun

Jun 25, 2021, 5:43:59 PM
to aarddict
This is strange.
The article count in CouchDB matches the enwiktionary-20210619 slob file, so the slob file contains every article that was scraped into the Couch database. I am not sure why enwiktionary-20201207 carries twice as many articles.
To make sure I got all articles, I am rescraping the whole database. This will take a week or so.
Thank you for the hint.



Markus Braun

Aug 30, 2021, 5:40:02 PM
to aard...@googlegroups.com, Nikolai Yourin

You are correct, Nikolai,

I rebuilt the database and got the same error, apparently with enwiktionary only. The other wikipedias are fine. As I cannot find the mistake and do not want to set up my system from scratch right now, I will not continue to update the enwiktionary.

Perhaps somebody else is willing to try it and will have more luck.


thank you


Nikolai Yourin

Aug 31, 2021, 10:18:38 AM
to aarddict
Hi Markus,

I'm a bit confused now. Where did that most recent .slob file come from then?


That file seems to be absolutely fine, the article count is correct and I've been using it for about two weeks now without any problems.

Markus Braun

Aug 31, 2021, 5:37:14 PM
to aard...@googlegroups.com, Nikolai Yourin

Itkach supported me and tried to find the issue on his system. It works fine on his system, but not on mine. So once in a while you may nudge him to update :)

Markus
