Please update fi.wiktionary.org


Joonatan Sjöroos

Apr 16, 2023, 5:45:53 AM
to aarddict
Hi, please update fi.wiktionary.org; it seems it hasn't been updated for two years. Thanks!

AardF...@web.de

Apr 16, 2023, 8:39:02 AM
to aarddict
Hmm, I am wondering if you checked my earlier post here.
I updated fi.wiktionary.org last month and just a couple of days ago again.
So I am not quite sure what you are referring to.
Can you elaborate?

AardF...@web.de

Apr 16, 2023, 8:41:59 AM
to aarddict
Alright, I got it. fiwiktionary is not listed in my earlier post.
However, if you check https://ftp.halifax.rwth-aachen.de/aarddict/ you will find much more...

Joonatan Sjöroos

Apr 16, 2023, 8:59:40 AM
to aarddict
Oh, very nice, thanks :)

franc

May 4, 2023, 12:14:11 PM
to aarddict
OK, F..K!!!
Something's wrong: the frwik* builds are crippled again.
In the log of the frwiki slob creation (mwscrape2slob) I see some "IncompleteRead" errors:

  Brigasque (race ovine)
  Brigate Giustizia e Libertà
  Brigate rosse per la costruzione del Partito comunista combattente
ERROR:mwscrape2slob:
Traceback (most recent call last):
  File "/home/franc/aard/env-mwscrape2slob/lib/python3.6/site-packages/mwscrape2slob/__init__.py", line 294, in run
    for title, aliases, text, error in resulti:
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
http.client.IncompleteRead: IncompleteRead(1765919 bytes read)

  Brigate rosse-Partito guerriglia del proletariato metropolitano
  Brigaud

and a bit later it ends much too early:

  Briggs Islet
  Briggsidae
  Briggs Automotive Company
  Briggs Cunningham
  Briggs (cratère)

Finished adding content in 3:22:18
Finalizing...
Sorting... sorted in 0:00:20
Resolving aliases...
Sorting... sorted in 0:00:20
Resolved aliases in 0:00:20
Finalized in 0:00:57
Traceback (most recent call last):
  File "/home/franc/aard/env-mwscrape2slob/bin/mwscrape2slob", line 8, in <module>
    sys.exit(main())
  File "/home/franc/aard/env-mwscrape2slob/lib/python3.6/site-packages/mwscrape2slob/__init__.py", line 838, in main
    article_source.run()
  File "/home/franc/aard/env-mwscrape2slob/lib/python3.6/site-packages/mwscrape2slob/__init__.py", line 294, in run
    for title, aliases, text, error in resulti:
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
http.client.IncompleteRead: IncompleteRead(1765919 bytes read)

I have no log for frwiktionary at the moment, but I guess it is the same.
The dewiktionary log has no IncompleteRead error, so I guess dewiktionary is just not big enough to trigger it.

I have to look into that and restart mwscrape2slob. In the meantime, only the last working version is in the old directory.
Sorry.

franc

May 4, 2023, 3:58:06 PM
to aarddict
I started the frwiktionary mwscrape2slob run manually and it worked without errors. This is the end of the log (I didn't find any IncompleteRead errors this time):

ADDING: '~/images/Globe.svg'
ADDING: '~/css/shared.css'
ADDING: '~/css/mediawiki_monobook.css'
ADDING: '~/css/mediawiki_shared.css'
ADDING: '~/css/night.css'
ADDING: '~/js/jquery-2.1.3.min.js'
ADDING: '~/js/styleswitcher.js'

Finished adding content in 2:57:27
Finalizing...
Sorting... sorted in 0:02:36
Resolving aliases...
Sorting... sorted in 0:02:33
Resolved aliases in 0:02:33
Finalized in 0:05:22
All done in 3:02:50

It is in the current download folder, as always:

https://7fw.de/download/wiki/fr/
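
A crippled build can be spotted from the log itself before anything is uploaded. In the logs quoted in this thread, the failed run contains a Traceback and IncompleteRead lines, while the successful run ends with "All done in ...". Below is a minimal sketch of such a check. It assumes those markers are reliable, which is inferred only from the two logs shown here, and `check_build_log` is a hypothetical helper, not part of mwscrape2slob or mw2slob.

```python
import re


def check_build_log(log_text):
    """Classify a build log using the markers seen in this thread.

    Assumption: a successful run prints a line starting with "All done in",
    and a failed run contains a Python traceback and/or IncompleteRead.
    """
    # Matches e.g. "IncompleteRead(1765919 bytes read)"
    incomplete = re.findall(r"IncompleteRead\((\d+) bytes read\)", log_text)
    # Substring search also catches a traceback fused onto another line.
    has_traceback = "Traceback (most recent call last):" in log_text
    finished = re.search(
        r"^All done in \d+:\d{2}:\d{2}", log_text, re.MULTILINE
    ) is not None
    return {
        "incomplete_reads": len(incomplete),
        "has_traceback": has_traceback,
        "finished": finished,
        "ok": finished and not has_traceback and not incomplete,
    }
```

A wrapper script could read the log file, call `check_build_log(text)`, and only move the new slob into the download folder when `"ok"` is true.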

Then I noticed that time has moved on and mwscrape2slob has been replaced by mw2slob, so I switched to that (I first updated mwscrape2slob and then got a "not found" error when trying to run mwscrape2slob).
I am currently running mw2slob for frwiki manually (it takes several hours, probably the whole night) and will see whether that works too.
If it works, I will try the script again, but with mw2slob. If that works too, I have to change all my scripts from mwscrape2slob to mw2slob (the command is different).

I should mention that I am on the old Ubuntu 18.04 (I will upgrade to 22.04 soon), so some of the packages may be too old.

By the way, I updated pip to 21.3.1 and mw2slob to 1.1 without errors.

SKIPPING (not included): 'filters/image'






Markus Braun

May 5, 2023, 1:25:43 AM
to aard...@googlegroups.com
I guess you will not be able to create a frwiki as there is no NS0 dump as of 20230501.
Lots of files are missing.


Markus




Frank Roehm

May 5, 2023, 1:32:22 AM
to aard...@googlegroups.com
I still use the good old mwscrape, not the dumps.

franc

May 5, 2023, 3:34:29 AM
to aarddict
Nope.
The frwiki build didn't work with the current mw2slob either :(
Here is the last output:

S Cuillère à caviar (3036)
S Cuillère à dessert (5163)
S Cuillère à glace (5721)

Finished adding content in 4:33:35
Finalizing...
Sorting... sorted in 0:00:35
Resolving aliases...
Sorting... sorted in 0:00:35
Resolved aliases in 0:00:35
Finalized in 0:01:42
Traceback (most recent call last):
  File "/home/franc/aard/env-slob/bin/mw2slob", line 11, in <module>
    load_entry_point('mw2slob==1.1', 'console_scripts', 'mw2slob')()
  File "/home/franc/aard/env-slob/lib/python3.6/site-packages/mw2slob/cli.py", line 394, in main
    args.func(args)
  File "/home/franc/aard/env-slob/lib/python3.6/site-packages/mw2slob/cli.py", line 128, in cli_scrape
    run(outname, info, articles, args)
  File "/home/franc/aard/env-slob/lib/python3.6/site-packages/mw2slob/cli.py", line 78, in run
    filters=filters,
  File "/home/franc/aard/env-slob/lib/python3.6/site-packages/mw2slob/core.py", line 197, in create_slob
    run(slb, articles, filters, info.interwikimap, info.namespaces, html_encoding)
  File "/home/franc/aard/env-slob/lib/python3.6/site-packages/mw2slob/core.py", line 140, in run
    for title, aliases, text, error in resulti:
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 347, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
http.client.IncompleteRead: IncompleteRead(615657 bytes read)

Again that IncompleteRead :( :(
Unless somebody here can tell me what could be wrong, I will stop building frwiki and frwiktionary until my system is on 22.04 and then give it another try. I guess the problem is related to the old system, perhaps indirectly as a Python issue (an old version or similar).
I don't know, sorry.
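
For reference, `http.client.IncompleteRead` is raised by Python's standard library when an HTTP response body ends before the expected number of bytes has arrived. A common workaround is to retry the read. The following is only a generic sketch of that pattern: `read_with_retry` and its `fetch` argument are hypothetical names, this is not how mwscrape2slob or mw2slob handle the error, and whether a retry would help in this particular case is an open question.

```python
import http.client
import time


def read_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() and retry if the HTTP body arrives truncated.

    fetch is any zero-argument callable that performs the HTTP read
    (hypothetical; supplied by the caller).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except http.client.IncompleteRead as error:
            last_error = error
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * attempt)
    # Every attempt was truncated: re-raise the last error.
    raise last_error
```

If the same document fails on every attempt, the error is re-raised, so a persistent problem on the server side would still surface in the log.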

I put the last wikis from March in the folder.
I had to regenerate the shasum and size files for the slob; I might have deleted them accidentally.

stat --format=%s frwiki_2023-03-02.slob > frwiki_2023-03-02.size.txt
shasum frwiki_2023-03-02.slob | sed -r 's/(.*) .*$/\1/' > frwiki_2023-03-02.sha.txt
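
The same two sidecar files could also be produced from a build script in Python. This is a minimal sketch, assuming the default `shasum` algorithm (SHA-1) and the `.size.txt` / `.sha.txt` naming used in the commands above; `write_sidecars` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def write_sidecars(slob_path):
    """Write <name>.size.txt and <name>.sha.txt next to a .slob file."""
    slob = Path(slob_path)
    digest = hashlib.sha1()  # shasum uses SHA-1 unless told otherwise
    with slob.open("rb") as handle:
        # Read in 1 MiB chunks so large slob files are not loaded into memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    size_file = slob.with_suffix(".size.txt")
    sha_file = slob.with_suffix(".sha.txt")
    size_file.write_text(f"{slob.stat().st_size}\n")
    sha_file.write_text(f"{digest.hexdigest()}\n")
    return size_file, sha_file
```

Calling `write_sidecars("frwiki_2023-03-02.slob")` would write `frwiki_2023-03-02.size.txt` and `frwiki_2023-03-02.sha.txt`. The digest file contains only the hex digest and a newline, so anything that expects the exact output of the sed pipeline should be checked against it.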


frank

AardF...@web.de

May 6, 2023, 7:31:09 AM
to aarddict
Frank,
you created an April version as well. It is hosted on RWTH ;)

As for your system upgrade, you could look into the Debian 11 netinstall.
During installation, deselect any GUI, but include the web server and SSH. For a GUI, install LXDE afterwards:
sudo apt install lxde-core
Then modify /etc/apt/sources.list by adding 'contrib non-free' to each line that contains main,
install all your tools, and voilà: you have a sleek, fast, up-to-date system with no overhead.
I love the performance of it.
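
The sources.list change can be scripted instead of edited by hand. A minimal sketch, assuming the classic one-line `deb ...` format of /etc/apt/sources.list; `add_components` is a hypothetical helper that only transforms text, so writing the result back to the file (as root, after making a backup) is left to the caller.

```python
def add_components(sources_text, extra=("contrib", "non-free")):
    """Append missing components to every deb/deb-src line that lists main."""
    result = []
    for line in sources_text.splitlines():
        fields = line.split()
        is_source_line = bool(fields) and fields[0] in ("deb", "deb-src")
        if is_source_line and "main" in fields:
            missing = [name for name in extra if name not in fields]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        # Comments, blank lines and lines without main are kept unchanged.
        result.append(line)
    return "\n".join(result) + "\n"
```

For example, `deb http://deb.debian.org/debian bullseye main` becomes `deb http://deb.debian.org/debian bullseye main contrib non-free`, while commented-out lines are left alone.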

franc

May 31, 2023, 4:21:50 AM
to aarddict
AardF...@web.de wrote on Saturday, May 6, 2023 at 13:31:09 UTC+2:
Frank,
you created an April version as well. It is hosted on RWTH ;)


Hello, please delete frwiktionary_2023-04-06.slob; it is crippled!
The last working one (that I made) is from March.
Thanks!
