Wiktionary update for Hindi and Malayalam, please

Rahul

Jan 12, 2021, 10:46:22 AM
to aarddict
The slob file is very outdated; an update would be great.

Rahul

Jan 15, 2021, 9:04:30 AM
to aarddict

Malayalam link: https://ml.m.wiktionary.org/wiki/പ്രധാന_താൾ
Hindi link: https://hi.m.wiktionary.org/wiki/मुखपृष्ठ
Thanks

franc

Jan 15, 2021, 12:16:42 PM
to aarddict
Rahul wrote on Friday, 15 January 2021 at 15:04:30 UTC+1:

Malayalam link: https://ml.m.wiktionary.org/wiki/പ്രധാന_താൾ
Hindi link: https://hi.m.wiktionary.org/wiki/मुखपृष्ठ


Uh-oh, my server can't find it. I guess it is because of the non-Latin script. See these errors:

(env-mwscrape-py3) [18.09.30][root@example:/home/franc/aard/env-mwscrape-py3#mwscrape ml.m.wiktionary.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A7%E0%B4%BE%E0%B4%A8_%E0%B4%A4%E0%B4%BE%E0%B5%BE --speed 5 --db malayalam --couch http://admin:pass...@127.0.0.1:5984
Connecting http://127.0.0.1:5984 as user admin
Starting session malayalam-1610730689-884
Traceback (most recent call last):
  File "/home/franc/aard/env-mwscrape-py3/bin/mwscrape", line 11, in <module>
    load_entry_point('mwscrape==1.0', 'console_scripts', 'mwscrape')()
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwscrape/scrape.py", line 367, in main
    site = mwclient.Site(host, path=args.site_path, ext=args.site_ext, scheme=scheme)
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 131, in __init__
    self.site_init()
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 153, in site_init
    retry_on_error=False)
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 235, in get
    return self.api(action, 'GET', *args, **kwargs)
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 286, in api
    info = self.raw_api(action, http_method, **kwargs)
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 434, in raw_api
    http_method=http_method)
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/mwclient/client.py", line 405, in raw_call
    stream.raise_for_status()
  File "/home/franc/aard/env-mwscrape-py3/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://ml.m.wiktionary.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A7%E0%B4%BE%E0%B4%A8_%E0%B4%A4%E0%B4%BE%E0%B5%BE/w/api.php?meta=siteinfo%7Cuserinfo%7Cuserinfo&siprop=general%7Cnamespaces&uiprop=groups%7Crights%7Cblockinfo%7Chasmsg&continue=&action=query&format=json

Sorry, same error for Hindi.
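
A note on the 404 above: the failing URL in the traceback is the command-line argument with /w/api.php appended, so the page path /wiki/... ended up inside the API address. That suggests mwscrape wants only the site's host name, not a link to the main page. A minimal sketch of how the host could be derived from the posted links (site_host and api_url are hypothetical helpers, not part of mwscrape, and dropping the mobile ".m." label is an assumption):

```python
from urllib.parse import urlsplit


def site_host(page_url):
    """Return only the host part of a wiki page URL (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in page_url:
        page_url = "https://" + page_url
    host = urlsplit(page_url).hostname or ""
    # Assumption: the mobile host (ml.m.wiktionary.org) and the desktop host
    # (ml.wiktionary.org) serve the same wiki, so the ".m." label is dropped.
    return host.replace(".m.", ".", 1)


def api_url(host):
    """Mimic what the traceback shows: the API path is appended to the host."""
    return "https://" + host + "/w/api.php"


# The argument from the failing command, page path included.
bad_arg = (
    "ml.m.wiktionary.org/wiki/"
    "%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A7%E0%B4%BE%E0%B4%A8_"
    "%E0%B4%A4%E0%B4%BE%E0%B5%BE"
)

host = site_host(bad_arg)
print(host)           # ml.wiktionary.org
print(api_url(host))  # https://ml.wiktionary.org/w/api.php
```

With the Hindi link the same helper would give hi.wiktionary.org.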

Igor Tkach

Jan 15, 2021, 4:12:56 PM
to aarddict
--
You received this message because you are subscribed to the Google Groups "aarddict" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aarddict+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aarddict/9192bc8b-cc64-4af0-b099-0c162b7bd938n%40googlegroups.com.

Frank Roehm

Jan 15, 2021, 4:26:03 PM
to aard...@googlegroups.com
Oh!
Will try again then...

franc

Jan 16, 2021, 4:16:47 AM
to aarddict
franc wrote on Friday, 15 January 2021 at 22:26:03 UTC+1:
Oh!
Will try again then...

OK, both are running :)
Wait now...

Rahul

Jan 16, 2021, 4:35:30 AM
to aarddict
Thanks so much ❤️

franc

Jan 16, 2021, 3:36:45 PM
to aarddict
Ok then, check this:

https://7fw.de/download/wiki/ml/2021-01-16_malayalamwiki.slob
and:

and test whether it works. Unfortunately they are very small: the Hindi one is only 42 MB and the Malayalam one only 16 MB. So either (as I think) these wikis are not yet well populated with entries, or (as I hope not) I did something wrong, although I got no error at all, which speaks against that.

So please give some feedback, if you don't mind :) Thanks.
frank

Rahul

Jan 16, 2021, 10:33:35 PM
to aarddict
Yes, these wikis don't have many entries. I did a quick search and found that in the Malayalam dictionary the hyperlink doesn't show up for one word. I searched for the word Covans: in the old dictionary the word covan is hyperlinked, and in the new dictionary it's not. Not a big deal, though. Also, neither the new nor the old Malayalam dictionary has an entry named Coven; the hyperlink just goes to The Collaborative International Dictionary of English. Check the screenshots below:
IMG_20210117_084759.jpg
IMG_20210117_084709.jpg

Frank Roehm

Jan 17, 2021, 4:07:41 AM
to aard...@googlegroups.com
I cannot test it.
I added them to my aard2 dictionaries, but I think I would need to switch my phone's language to Hindi or Malayalam to test these dictionaries' entries.
This is not an option ;)
I cannot read a single word.
I don't even know the letters.
Hindi is nearly the same.
So either somebody tells me how to fix these missing links, or it will have to be good enough :)
Sorry.
Frank

Rahul

Jan 17, 2021, 4:59:12 AM
to aarddict
It's OK, bro, it's just one word; everything else is working great. Thanks so much for the conversion 🙏.

Rahul

Jan 17, 2021, 5:02:03 AM
to aarddict
It would be great if you also updated the GitHub page; it may help others.
Thanks.

franc

Jan 17, 2021, 5:50:21 AM
to aarddict
Thanks for the hint, done:


But now I noticed a mistake I made. I mwscrape2slobbed it with:

mwscrape2slob http://admin:password@localhost:5984/hindiwiki -f common wiki
and:
mwscrape2slob http://admin:password@localhost:5984/malayalamwiki -f common wiki


because from the beginning I mistakenly thought it was the Wiki and not the Wiktionary! I didn't pay close attention and read over it too fast, shame on me :)
So with -f common wiki the slob may have ended up with different links.

I already deleted the DB: I tried to rename it, didn't know how, was too lazy to look it up, and then deleted it, as I thought I wouldn't need it anymore :(
So I will re-scrape (already running) and run mwscrape2slob again to see whether it is better with -f common wikt as the parameter, like so:

mwscrape2slob http://admin:password@localhost:5984/hindiwiktionary -f common wikt
and:
mwscrape2slob http://admin:password@localhost:5984/malayalamwiktionary -f common wikt

Please wait...
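
On the renaming problem mentioned above: as far as I know, CouchDB has no rename operation; the usual workaround is to replicate the database under the new name and delete the old one afterwards. A minimal sketch of the replication request, under some assumptions: replication_request is a hypothetical helper, the code only builds the request and never sends it, and authentication (the admin credentials from the commands above) is left out.

```python
import json
from urllib.request import Request


def replication_request(couch_url, old_name, new_name):
    """Build, but do not send, a CouchDB replication request (hypothetical helper)."""
    body = {
        "source": couch_url + "/" + old_name,
        "target": couch_url + "/" + new_name,
        # Ask CouchDB to create the target database if it does not exist yet.
        "create_target": True,
    }
    return Request(
        couch_url + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Database names taken from the commands in this thread.
req = replication_request("http://localhost:5984", "hindiwiki", "hindiwiktionary")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Only once the copy is complete would the old database be deleted. For this thread the rename was not strictly needed anyway, since mwscrape2slob is given the database URL and works with whatever name the database has.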

Igor Tkach

Jan 17, 2021, 11:17:54 AM
to aarddict
There was no need to re-scrape; just re-run mwscrape2slob with the correct filters for Wiktionary (-f common wikt). It does not affect the scrape database in any way.

Also, comparing with the older dictionary version might be of some interest, but ultimately the goal is to match the current online version, so it's better to compare with that (the article title is a link to the online version; just click it to quickly get the online version in a browser).
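
The point about filters can be sketched as a small helper that picks the filter from the database name and builds the mwscrape2slob command line. This is only an illustration: slob_command is a hypothetical function, and the rule "names ending in wiktionary get wikt" is an assumption based on the database names used in this thread.

```python
def slob_command(couch_url, db_name):
    """Build the mwscrape2slob argument list for one scrape database."""
    # Assumption: databases named "...wiktionary" hold Wiktionary scrapes and
    # need the wikt filter; everything else gets the wiki filter.
    site_filter = "wikt" if db_name.endswith("wiktionary") else "wiki"
    return ["mwscrape2slob", couch_url + "/" + db_name, "-f", "common", site_filter]


couch = "http://admin:password@localhost:5984"
for name in ("hindiwiktionary", "malayalamwiktionary"):
    # Only prints the command; the scrape database itself is never modified.
    print(" ".join(slob_command(couch, name)))
```

Because the conversion only reads from the scrape database, the same database can be converted again with a different filter whenever the first choice turns out to be wrong.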


Rahul

Jan 17, 2021, 12:07:45 PM
to aarddict
It's OK. Does it change anything in the articles? Or are you pulling another update?

franc

Jan 17, 2021, 1:07:34 PM
to aarddict


OK then, I did it again. I hope I made no mistakes this time, please check :)
I have already updated the Dictionaries page.
Here:

and:

@Igor Yes, I know it would not have been necessary to scrape again, but I had already deleted the DBs (see above), that was my bad :(
Anyway, these scrape fast.


MHBraun

Jan 17, 2021, 9:12:28 PM
to aarddict
franc,
if you intend to maintain the Wiktionaries, I mean update them a couple of times a year, I could mirror them on RWTH.
Let me know.
Markus

Rahul

Jan 17, 2021, 10:04:29 PM
to aarddict
Works perfect 👍

Frank Röhm

Jan 18, 2021, 5:02:15 AM
to aard...@googlegroups.com

> On 18.01.2021 at 03:12, 'MHBraun' via aarddict <aard...@googlegroups.com> wrote:
>
> franc,
> if you intend to maintain the Wiktionaries, I mean update them a couple of times a year, I could mirror them on RWTH.
> Let me know.
> Markus

Thanks, Markus!
But no need, I will only update them on request and when I have time :)
I keep my two "children", frwiki and frwiktionary, as regular scrapes (and irregular scrape2slobs).
franc