I am creating a slob file enwiktionary-20240820.slob and will upload it to
ftp.halifax.rwth-aachen.de/aarddict/enwiki
I will let you know here when it is available.
You can then download the file from there and check whether it works for you.
Let me know if you are not interested.
Thank you
Markus
--
You received this message because you are subscribed to the Google Groups "aarddict" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aarddict+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aarddict/324fd023-7d3a-4aab-98db-c86990c399aen%40googlegroups.com.
import sys
import slob
with slob.open(sys.argv[1]) as r:
    headwords = list(r.as_dict())
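For anyone who wants to see which headwords are duplicated, here is a minimal sketch. It assumes the headwords are available as a plain list of strings; whether the list built in the snippet above already has that shape is not checked here. The helper name count_duplicates is made up for illustration and is not part of slob.

```python
from collections import Counter


def count_duplicates(headwords):
    """Return {headword: occurrences} for every headword seen more than once.

    Hypothetical helper: `headwords` is assumed to be an iterable of strings.
    """
    counts = Counter(headwords)
    return {word: n for word, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    sample = ["apple", "pear", "apple", "plum", "apple", "pear"]
    for word, n in sorted(count_duplicates(sample).items()):
        print(f"{word}: {n}")
```

The result only reports the duplicates and leaves them untouched, so it can be used to compare one dump against the next.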
Thank you for the updated code.
I would not do anything about the duplicates. They are already present in the dumps.
If they are in the annexes, I guess they are there for a reason and the duplication is just a side effect.
… and next month we are starting over anyway.
I am glad they reduced the duplicates significantly.