MediaWiki crawl works but returns all wiki titles as "docname"

Serge F

Jun 19, 2020, 9:28:12 AM
to Datafari
Hello,

I'm running Datafari 4.4.1 Community Edition on an Ubuntu 18.04 server.
I've added a job to crawl my MediaWiki 1.23.

It works, but all results returned in the Datafari search have the same title: "docname".

Can anyone help me solve this issue?
Regards.

Cedric Ulmer

Jun 19, 2020, 9:32:37 AM
to Datafari
Hi Serge,

Thanks for using Datafari!
Which connector are you using? The web one?
Is security enabled, or is the MediaWiki public?

You say that the title is always the same, but is the other information correct (snippet, date, etc.)?

Regards,

Cedric

Serge F

Jun 22, 2020, 4:16:53 AM
to Datafari
Hi Cedric,

This wiki is read-only for the public (security is enabled only for writing).
We are using the Datafari "Wiki" connector to crawl it.
All the other information is correct; screenshot attached.

Thanks for your help!

Addendum: we are using MediaWiki 1.27.
datafari_mediawiki.png

Cedric Ulmer

Jun 23, 2020, 9:39:11 AM
to Datafari
Hello Serge,

We think it's related to one of the title fields not being filled in on the wiki side, so Datafari fills it in with a default value.
You can work around this issue by forcing the SubClassResult widget to display the title of your choice.
To change the title displayed in the Datafari search UI, you can modify this file: /opt/datafari/tomcat/webapps/Datafari/js/AjaxFranceLabs/widgets/SubClassResult.widget.js
Lines 132 to 138:
var title = "";
if (Array.isArray(doc.title)) {
  try {
    title = decodeURIComponent(doc.title[1]); // changed from doc.title[0]
  } catch (e) {
    title = doc.title[0];
  }

Let us know the outcome!

Regards,

Cedric

Serge F

Jun 24, 2020, 10:07:25 AM
to Datafari
Thanks, it works, but now non-wiki documents (docx, ppt, ...) no longer show their file title.
Is it possible to add a condition that applies only to wiki docs?

Serge F

Jun 24, 2020, 1:18:08 PM
to Datafari
Cedric,

You can mark this issue SOLVED ;)

Right after line 135:
  title = decodeURIComponent(doc.title[0]);

I've just added this new condition:
if (title.indexOf('docname') !== -1) { title = decodeURIComponent(doc.title[1]);} // hack for wiki docname issue
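For anyone who wants to check the title-selection logic outside the widget, here is a minimal standalone sketch of the combined fix above. It assumes, as the snippets in this thread suggest, that doc.title is either a string or an array in which, for wiki pages, the first value is the placeholder "docname" and the second holds the real title. The function names pickTitle and safeDecode are hypothetical and not part of Datafari. As a deliberate variant of the indexOf check, it compares the first title strictly against "docname", so a file whose real title merely contains that word is left alone.

```javascript
// Hypothetical standalone sketch of the "docname" title workaround (not Datafari code).
// Assumption: doc.title is a string or an array where, for wiki pages,
// title[0] is the placeholder "docname" and title[1] is the real title.

// decodeURIComponent throws a URIError on malformed %-sequences,
// so fall back to the raw value instead of failing.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    return value;
  }
}

function pickTitle(doc) {
  if (doc.title === undefined || doc.title === null) {
    return "";
  }
  if (!Array.isArray(doc.title)) {
    // Assumption: a single (non-array) title is used as-is after decoding.
    return safeDecode(String(doc.title));
  }
  if (doc.title.length === 0) {
    return "";
  }
  // Default: the first title value, as on line 135 of the widget.
  let title = safeDecode(String(doc.title[0]));
  // Swap in the second value only when the first is exactly the placeholder.
  // A strict comparison (rather than indexOf) avoids touching real file
  // titles that merely contain the word "docname".
  if (title === "docname" && doc.title.length > 1) {
    title = safeDecode(String(doc.title[1]));
  }
  return title;
}

// Example calls:
console.log(pickTitle({ title: ["docname", "Main%20Page"] })); // prints Main Page
console.log(pickTitle({ title: ["report.docx"] }));            // prints report.docx
```

Inside the widget, the same effect would presumably come from assigning title = pickTitle(doc); at that spot, but this has not been tested against Datafari 4.4.1.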


Thanks a lot for your help and for pointing me to the SubClassResult.widget.js file!

Cedric Ulmer

Jun 24, 2020, 3:57:35 PM
to Datafari
That's great news!

We hope Datafari will be a good fit for your needs. Don't hesitate to share your experience with the community.

Regards,

Cedric
