Update: enwiktionary-20210619

Aard...@web.de

Jun 19, 2021, 2:06:56 PM
to aard...@googlegroups.com
The updated enwiktionary-20210619 data (English) for Aard2 in *.slob format is being uploaded to

http://ftp.halifax.rwth-aachen.de/aarddict/

Just download the dictionary from the appropriate folder.

Thank you to RWTH Aachen University for mirroring the data.

For the larger dictionaries there may be a magnet link for a torrent. For the full list of available dictionaries, see
https://github.com/itkach/slob/wiki/Dictionaries


Have fun



Nikolai Yourin

Jun 20, 2021, 3:16:58 PM
to aarddict
Are you sure everything is okay with this one?
I mean, the previous version was almost twice as large: enwiktionary-20201207.slob = 1472151640 bytes.

MHBraun

Jun 22, 2021, 9:05:03 AM
to aarddict
I used a higher compression level. Let me know if something is wrong. It works on my system here.
Glad you are using it.

Nikolai Yourin

Jun 23, 2021, 5:10:18 PM
to aarddict
It seems to be missing about half of the articles:

enwiktionary-20210619.slob:
    3323509 total articles
    4659539 total words

enwiktionary-20201207.slob:
    6424353 total articles
    8643945 total words
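
To put numbers on "about half", here is a small sketch that compares the counts quoted above. The figures are copied from this message; the helper name `retained_fraction` is only illustrative and not part of any slob tooling.

```python
# Counts copied from the two listings above.
counts = {
    "enwiktionary-20210619": {"articles": 3323509, "words": 4659539},
    "enwiktionary-20201207": {"articles": 6424353, "words": 8643945},
}


def retained_fraction(new, old, key):
    """Return the new count as a fraction of the old count (hypothetical helper)."""
    return new[key] / old[key]


new = counts["enwiktionary-20210619"]
old = counts["enwiktionary-20201207"]

article_ratio = retained_fraction(new, old, "articles")
word_ratio = retained_fraction(new, old, "words")
missing_articles = old["articles"] - new["articles"]

print(f"articles: {article_ratio:.1%} of the previous dump")
print(f"words:    {word_ratio:.1%} of the previous dump")
print(f"missing articles: {missing_articles}")
```

Both ratios come out a little above one half, which higher compression alone would not explain, since compression changes the file size but not the article count.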

MHBraun

Jun 24, 2021, 6:40:24 PM
to aarddict
oops... will start over.

MHBraun

Jun 25, 2021, 5:43:59 PM
to aarddict
This is strange.
The article count in the CouchDB matches the enwiktionary-20210619 slob file, so the slob file contains every article that was scraped into the Couch database. I am not sure why enwiktionary-20201207 carries twice as many articles.
To make sure I got all the articles, I am rescraping the whole database. That will take a week or so.
Thank you for the hint.

