Russian WIKI


Алексей Репин

Sep 5, 2014, 3:48:14 AM
to aard...@googlegroups.com
Hello! I need a dump of the Russian Wikipedia with the most important images. Is there a realistic chance of downloading it here or anywhere else?

Алексей Репин

Sep 5, 2014, 3:52:49 AM
to aard...@googlegroups.com
P.S.: I couldn't download the dump using the magnet link (magnet:?xt=urn:btih:0c32cb16695dff411faa702434a26a513e60ca37&dn=ruwiki-20140215&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80) from http://aarddict.org/, maybe because my internet provider blocks some traffic. But as I understand it, this wiki dump also comes without any graphic content?

itkach

Sep 5, 2014, 7:47:19 AM
to aard...@googlegroups.com
On Friday, September 5, 2014 3:52:49 AM UTC-4, Алексей Репин wrote:
P.S.: I couldn't download the dump using the magnet link on http://aarddict.org/, maybe because my internet provider blocks some traffic. But as I understand it, this wiki dump also comes without any graphic content?

I can give you an HTTP link if it helps, but indeed there are no images in .aar dictionaries except for the embedded ones for math. Image support is in the works for Aard 2, but it's not ready yet.

sklart

Oct 20, 2015, 3:13:01 AM
to aarddict
Hello Igor!

On a Russian-language forum I was asked to build a ruwiki dictionary in the Aard 1.x format. It is needed for converting aar -> stardict and then using the result in programs such as GoldenDict (the Android version does not support slob yet, unlike the desktop version).
I have uploaded the files at this link. I built them from the same CouchDB database as the slob-format dictionary published on GitHub.

The command is as follows:
aardc mwcouch http://127.0.0.1:5984/ru-m-wikipedia-org -f common wiki image

It turned out that some articles are missing from the dictionary, for example "Москва" and "СССР".

If you have the opportunity (and the willingness to return to an obsolete format), could you fix aardc or explain why some of the articles disappear? Could the cause be the filters being applied (common, wiki, image)?

I apologize for writing in Russian (I know about the request to post in English to reach a wider audience), but this time I was too lazy to translate all of this, and the problem is rather narrow anyway...

Thank you.

sklart

Oct 20, 2015, 6:13:35 AM
to aarddict
It is also reported that a similar problem shows up in the English Wikipedia dump "English Wikipedia by MHBraun" for Aard 1.x. It lacks articles about countries, for example "Ukraine", "Russia", "USA", "Germany".

itkach

Oct 20, 2015, 10:30:33 AM
to aarddict


On Tuesday, October 20, 2015 at 6:13:35 AM UTC-4, sklart wrote:
There are no articles in English Wikipedia by MHBraun for Aard 1.x such as "Ukraine", "Russia", "USA", "Germany".

Content filters may cause this if they remove all content and result in an empty document. Are you running with the default content filters that come with the converter, or did you modify them or add your own?
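To make the "filter removes everything" case concrete, here is a minimal Python sketch. It is not the aardtools filter code: apply_filters, is_empty and the class names are hypothetical, and it assumes the article body is a well-formed XML fragment so that the standard library parser can read it.

```python
import xml.etree.ElementTree as ET


def _drop(parent, child):
    """Remove child from parent but keep the text that follows it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def apply_filters(html, drop_classes):
    """Drop every element whose class attribute names one of drop_classes."""
    root = ET.fromstring(html)
    drop_classes = set(drop_classes)
    for parent in list(root.iter()):
        for child in list(parent):
            if drop_classes & set(child.get("class", "").split()):
                _drop(parent, child)
    return root


def is_empty(root):
    """True when no visible text is left after filtering."""
    return not "".join(root.itertext()).strip()


# Hypothetical article where the filter only strips a side box.
ok = apply_filters(
    '<div><table class="infobox"><tr><td>facts</td></tr></table>'
    "<p>Moscow is a city.</p></div>",
    {"infobox"},
)
# Hypothetical article where the whole body sits inside a filtered element.
gone = apply_filters(
    '<div><div class="mw-content">Moscow is a city.</div></div>',
    {"mw-content"},
)
print(is_empty(ok), is_empty(gone))
```

If a filter rule happens to match a wrapper around the whole body, as in the second call, nothing is left to store and the article would be skipped.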

I tried to recreate this real quick by getting these specific articles with mwscrape and then compiling a small .aar dictionary with just these articles (using aardc with --key), but it doesn't happen for me: the .aar contains all the articles, cleaned up according to the content filters, as expected.

I don't really want to spend time on the old tools and old format. Whoever came up with aar -> stardict -> goldendict may as well get up to date and create slob -> stardict :) Although this brings another question - is Aard 2 on Android really so terrible?

itkach

Oct 20, 2015, 10:33:40 AM
to aarddict
Also, are there any errors in the log file? At the beginning of compilation aardc prints a message like

Writing log to ./aardc-en-m-wikipedia-org-1445350498/log

anything interesting there?

ccaid

Oct 20, 2015, 3:09:30 PM
to aarddict
On Tuesday, October 20, 2015 at 17:30:33 UTC+3, itkach wrote:
Whoever came up with aar -> stardict -> goldendict may as well get up to date and create slob -> stardict :)
First (minor): the path starting from aar had already been developed and used successfully, so the hope was that the existing toolchain could keep being used, maybe with a little tuning.
Second (major): slob stores math (and similar) expressions in textual form and draws them on the fly when the article is displayed. GoldenDict will not do this, and developing a converter that draws expressions offline is a very hard job.

Although this brings another question - is Aard 2 on Android really so terrible?
No, it is fairly good. But a fine dictionary app is of little use without the dictionaries one needs.

itkach

Oct 21, 2015, 10:25:49 AM
to aarddict
On Tuesday, October 20, 2015 at 3:09:30 PM UTC-4, ccaid wrote:
On Tuesday, October 20, 2015 at 17:30:33 UTC+3, itkach wrote:
Whoever came up with aar -> stardict -> goldendict may as well get up to date and create slob -> stardict :)

> goldendict will not perform this.
 
I'm not sure why. It renders articles with WebKit, same as aarddict, and all the JavaScript code and fonts that actually do the rendering are also in the slob.

> to develop converter with expressions offline drawing is very hard job.

Perhaps it's hard, but it's already done and is available to anyone who cares to take it. The hard part of actually rendering math is implemented by the TeX ecosystem. For those who want to take advantage of this, the main challenge is installing and configuring a fully functioning TeX environment. Luckily, this has been solved by packaging in Linux distributions. For example, on Ubuntu the whole installation boils down to apt-getting a few packages. Another element of the process is rendering the TeX output to an image: here's the code, and here's how it's used; there's no reason why this can't be reused.
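As a rough illustration of the "TeX output rendered to an image" step, here is a short Python sketch. It is not the aardtools code referred to above; it assumes the latex and dvipng command-line tools are installed, and TEMPLATE, build_commands and render are hypothetical names.

```python
import subprocess
from pathlib import Path

# Minimal LaTeX wrapper around a single inline formula (an assumption,
# not the template any existing converter uses).
TEMPLATE = r"""\documentclass{article}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
$@FORMULA@$
\end{document}
"""


def build_commands(workdir, name="eq", dpi=120):
    """Return the argv lists for both steps; nothing is executed here."""
    workdir = Path(workdir)
    tex = workdir / f"{name}.tex"
    dvi = workdir / f"{name}.dvi"
    png = workdir / f"{name}.png"
    return [
        # Step 1: TeX source -> DVI
        ["latex", "-interaction=nonstopmode",
         f"-output-directory={workdir}", str(tex)],
        # Step 2: DVI -> tightly cropped PNG
        ["dvipng", "-T", "tight", "-D", str(dpi), "-o", str(png), str(dvi)],
    ]


def render(formula, workdir, name="eq"):
    """Write the .tex file and run both tools; needs latex and dvipng on PATH."""
    workdir = Path(workdir)
    source = TEMPLATE.replace("@FORMULA@", formula)
    (workdir / f"{name}.tex").write_text(source, encoding="utf-8")
    for command in build_commands(workdir, name):
        subprocess.run(command, check=True, capture_output=True)
    return workdir / f"{name}.png"
```

Calling render(r"e^{i\pi} + 1 = 0", some_directory) should leave eq.png in that directory on a machine with a working TeX installation; a converter could then embed the image in place of the textual formula.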

ccaid

Oct 21, 2015, 3:39:47 PM
to aarddict
On Wednesday, October 21, 2015 at 17:25:49 UTC+3, itkach wrote:
On Tuesday, October 20, 2015 at 3:09:30 PM UTC-4, ccaid wrote:
> goldendict will not perform this.
 
I'm not sure why. It renders articles with Webkit, same as aarddict, all the javascript code and fonts that actually do the rendering are also in slob.
It's my supposition. I do not know how to check this before a new converter is written, but my impression is that math rendering does not appear automagically. If I remember correctly, in aardtools/aard2 some work and several iterations were needed before rendering started to work properly. The same thing happened in desktop GoldenDict.

sklart

Oct 22, 2015, 4:20:54 AM
to aarddict

On Tuesday, October 20, 2015 at 17:30:33 UTC+3, itkach wrote:
Content filters may cause this if they remove all content and result in an empty document. Are you running with the default content filters that come with the converter, or did you modify them or add your own?

Yes, I run aardc with the default content filters.

On Tuesday, October 20, 2015 at 17:33:40 UTC+3, itkach wrote:
Also, are there any errors in the log file? At the beginning of compilation aardc prints a message like

Writing log to ./aardc-en-m-wikipedia-org-1445350498/log

anything interesting there?

I cannot speak for the enwiki dictionary by MHBraun.
As for ruwiki: I had deleted the contents of the folder created by aardc, so I had to rebuild ruwiki.aar. The articles mentioned above are missing from the rebuilt dictionary as well.
I have uploaded the log and the other files from this folder to the cloud.



On Tuesday, October 20, 2015 at 17:30:33 UTC+3, itkach wrote:
Although this brings another question - is Aard 2 on Android really so terrible ?

I personally use Aard 2 and am fully satisfied :). Everything said above is at the request of GoldenDict users who want to have all their dictionaries in a single application.

itkach

Oct 22, 2015, 12:03:17 PM
to aarddict
On Thursday, October 22, 2015 at 4:20:54 AM UTC-4, sklart wrote:
Log and other files from this folder added to cloud.

I see there are a lot of articles in empty.txt, such as Москва. Can you get the latest versions of some of those articles with mwscrape and then compile a dictionary with just those articles, with and without filters? I did that and I can't recreate the problem.
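For anyone who wants to check their own build the same way, a small Python sketch along these lines might help. It assumes empty.txt lists one article title per line, which I have not verified for every aardc version; titles_in_listing is a hypothetical helper, not part of aardtools.

```python
def titles_in_listing(lines, titles):
    """Return those of `titles` that appear as a line in the listing."""
    listed = {line.strip() for line in lines if line.strip()}
    return [title for title in titles if title in listed]


# Stand-in for the lines of empty.txt (hypothetical content).
sample = ["Москва\n", "СССР\n", "\n"]
print(titles_in_listing(sample, ["Москва", "Germany"]))
```

With a real build, pass the open file instead of the sample list, e.g. titles_in_listing(open(path_to_empty_txt, encoding="utf-8"), wanted), where path_to_empty_txt points into the folder aardc created.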

mhbraun

Oct 24, 2015, 12:37:27 PM
to aarddict
I am not sure what is going on. It seems like the articles are there but cannot be found, or are listed differently by the reader.
If I go to Germany (disambiguation) and follow the Germany link there, I do not end up at the country but again at the list attached.

I used the standard filter to create the aar.
Aard Dictionary - 1 dictionary (9 volumes)_001.png
Aard Dictionary - 1 dictionary (9 volumes)_002.png

itkach

Nov 15, 2015, 11:15:28 AM
to aarddict

I think I know what the problem is; it should be fixed by https://github.com/aarddict/tools/commit/c3e7f662b4956c18a51391cfa28488a4c0507c55

Please update aardtools and recompile.

sklart

Dec 25, 2015, 1:11:10 PM
to aarddict
Hello Igor!
I ran into a small inconvenience when viewing articles in the Russian Wikipedia.
There is a person entry "Абель, Рудольф Иванович". In history, as in Wikipedia, there are two people with this name (one of them at some point adopted the other's identity as an alias).
The page listing the people shows this:

Now, whichever "Рудольф Абель" I pick, the program jumps to the same person,

and to get to the second one I have to swipe left or right.

What causes this behavior? Can it be fixed somehow?

I am using this version of the dictionary.

Thank you.

Igor Tkach

Dec 26, 2015, 1:41:22 PM
to aard...@googlegroups.com
2015-12-25 13:11 GMT-05:00 sklart <skla...@gmail.com>

What causes this behavior? Can it be fixed somehow?


This has to do with redirects and how links are defined, and also with the special treatment of the _ character, which in MediaWiki links is equivalent to a space. The disambiguation page has two links:

- Абель,_Рудольф_Иванович
- Рудольф_Абель

The article Рудольф_Абель has the following redirects:

Абель, Рудольф
Абель Рудольф Иванович
Вильям Генрихович Фишер
Вильям Фишер
Рудольф Абель
Рудольф Иванович Абель
Фишер, Вильям
Фишер Вильям Генрихович
Фишер, Вильям Генрихович
 
As you can see, the redirect "Абель Рудольф Иванович" matches the title of the other article, "Абель, Рудольф Иванович", except for the comma. If the link in Wikipedia contained spaces instead of _, there would be an exact match and the link would work correctly. As it is, a punctuation-insensitive match kicks in.
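Here is a toy Python model of that lookup, to show where the wrong article comes from. It is a simplified stand-in, not the reader's actual matching code: INDEX, loose_key and lookup are hypothetical, and the "loose" comparison simply ignores case, punctuation and whitespace.

```python
import unicodedata

# key -> article the key opens (titles and one redirect from this thread)
INDEX = {
    "Абель, Рудольф Иванович": "Абель, Рудольф Иванович",  # article title
    "Абель Рудольф Иванович": "Рудольф Абель",              # redirect
    "Рудольф Абель": "Рудольф Абель",                        # article title
}


def loose_key(title):
    """Casefold and drop punctuation (including _) and whitespace."""
    return "".join(
        ch for ch in title.casefold()
        if not unicodedata.category(ch).startswith(("P", "Z"))
    )


def lookup(link):
    """Exact match first; otherwise every key that matches loosely, in key order."""
    if link in INDEX:
        return [INDEX[link]]
    wanted = loose_key(link)
    return [INDEX[key] for key in sorted(INDEX) if loose_key(key) == wanted]


print(lookup("Абель,_Рудольф_Иванович"))   # link as written on the page
print(lookup("Абель, Рудольф Иванович"))   # same link with spaces
```

In this model the underscore form has no exact match, so both the redirect and the real title qualify and the redirect happens to sort first; the other candidate is still in the list, which is consistent with having to swipe to reach it. With spaces, the exact match wins and only the intended article comes back.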

This can be fixed by removing the redirect "Абель Рудольф Иванович" from the aliases field of the CouchDB document for the article "Рудольф Абель" (http://localhost:5984/_utils/document.html?ru-m-wikipedia-org/Рудольф%20Абель), or by adding the redirect "Абель,_Рудольф_Иванович" (matching the link exactly) to the aliases field of the document "Абель, Рудольф Иванович" (http://localhost:5984/_utils/document.html?ru-m-wikipedia-org/Абель,%20Рудольф%20Иванович), and then recompiling the dictionary.
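The two manual edits could be scripted roughly like this. It is only a sketch on plain Python dicts: it assumes the aliases field is a flat list of strings, drop_alias and add_alias are hypothetical helpers, and reading the document from CouchDB and saving it back is left out.

```python
def drop_alias(doc, alias):
    """Return a copy of the document without `alias` in its aliases list."""
    updated = dict(doc)
    updated["aliases"] = [a for a in doc.get("aliases", []) if a != alias]
    return updated


def add_alias(doc, alias):
    """Return a copy of the document with `alias` added once."""
    updated = dict(doc)
    aliases = list(doc.get("aliases", []))
    if alias not in aliases:
        aliases.append(alias)
    updated["aliases"] = aliases
    return updated


# Hypothetical, heavily shortened documents.
fisher = {"_id": "Рудольф Абель",
          "aliases": ["Абель, Рудольф", "Абель Рудольф Иванович"]}
abel = {"_id": "Абель, Рудольф Иванович", "aliases": []}

print(drop_alias(fisher, "Абель Рудольф Иванович")["aliases"])
print(add_alias(abel, "Абель,_Рудольф_Иванович")["aliases"])
```

Either change alone should be enough; both functions return new dicts, so the original documents stay untouched until you decide to save.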

It may be worth changing mwscrape2slob so that _ is replaced with a space in all internal links, since _ does not occur in the dictionary keys: the MediaWiki API returns article titles in human-readable form (with spaces), and accordingly that is how we want to see them.


 

Igor Tkach

Dec 26, 2015, 2:56:51 PM
to aard...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "aarddict" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aarddict+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Markus Braun

Dec 26, 2015, 5:15:21 PM
to aard...@googlegroups.com

Well guys, what you are doing there is all well and good.

However, I don't understand a word.

Isn't the forum language English?


itkach

Dec 26, 2015, 5:43:42 PM
to aarddict

On Saturday, December 26, 2015 at 5:15:21 PM UTC-5, mhbraun wrote:

Well guys, what you are doing there is all well and good.

However, I don't understand a word.

Isn't the forum language English?


I certainly encourage everyone to post in English as it allows more people to participate and benefit, but I can't enforce it. 
 
The issue discussed here is about a specific link in the Russian Wikipedia that opens the wrong article. The crux of the matter is that the title of one article differs from a redirect for another article only by a comma. This would have been sufficient to tell them apart if the link in the disambiguation article matched a key in the dictionary exactly. However, the link uses an underscore (_) in place of a space (underscores are commonly used in Wikipedia URLs instead of spaces and are treated as equivalent to a space, probably in a misguided attempt to avoid having to URL-encode spaces), so an inexact match kicks in when following the link and the wrong article is loaded. I made a change in mwscrape2slob to replace _ with a space in internal links so that they better match dictionary keys. I think this should resolve the issue (but please do test the dictionaries you compile before making them available).
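The idea behind that change can be sketched in a few lines of Python. This is an illustration only, not the actual mwscrape2slob commit; link_to_key is a hypothetical name, and dropping the #fragment is my own simplification.

```python
from urllib.parse import unquote


def link_to_key(href):
    """Turn an internal link target into the form used by dictionary keys."""
    # Split off the fragment first so an encoded '#' in a title survives.
    path, _, _fragment = href.partition("#")
    # Decode percent-escapes, then treat underscores as spaces.
    return unquote(path).replace("_", " ").strip()


print(link_to_key("Абель,_Рудольф_Иванович"))
print(link_to_key("Germany_(disambiguation)#History"))
print(link_to_key("New%20York_City"))
```

After this normalization the link text is character-for-character the article title, so the exact match applies and the comma is no longer ignored.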

 

 


Markus Braun

Dec 26, 2015, 6:14:23 PM
to aard...@googlegroups.com

Thanks for the explanation Igor.

 

 


sklart

Jan 14, 2016, 12:06:49 PM
to aarddict
On Saturday, December 26, 2015 at 22:56:51 UTC+3, itkach wrote:

Thanks, it works fine!
Sorry for the late reply, I was only now able to test.