Moving on to pundit-wikidata sync bot


Amanpreet Singh

Jun 27, 2014, 1:15:46 AM
to annotation-tool-gsoc
Dear everyone,

First of all, sorry for the late update; I thought I should only post once I had reached a significant milestone.

I am mostly done with the client-side and server-side parts of the Push to Wikidata feature, so now I am moving on to creating a bot based on Pywikibot, which would fetch a Pundit user's notebook and push it by creating item pages on Wikidata.

I will ask if I need any help along the way.

Thanks

--
Amanpreet Singh,
IIT Roorkee

David Cuenca

Jun 27, 2014, 4:23:41 AM
to Amanpreet Singh, annotation-tool-gsoc
That's great, don't forget to update the progress reports:


--
You received this message because you are subscribed to the Google Groups "Annotation tool GSoC" group.
To unsubscribe from this group and stop receiving emails from it, send an email to annotation-tool-...@googlegroups.com.
Visit this group at http://groups.google.com/group/annotation-tool-gsoc.
For more options, visit https://groups.google.com/d/optout.



--
Etiamsi omnes, ego non

Amanpreet Singh

Jun 29, 2014, 7:13:31 AM
to David Cuenca, annotation-tool-gsoc
Currently I am learning how to create bots, and I am in a dilemma: as far as I can see, the edits a bot makes are made in the name of the bot's account, but what we specifically want is for them to be in the name of the user who annotated, and I can't find a way around this.

Also, I would like to know who approves bot requests; the process is not clearly described. I also think I must create a bot account before they (the Wikidata people) accept a bot request. Kindly shed some light on these issues.

Another problem is that there are not many examples of bots other than Legoktm's bots.

Thanks

David Cuenca

Jun 29, 2014, 7:30:27 AM
to Amanpreet Singh, annotation-tool-gsoc
Maybe what you need is not a bot, but a tool running on Labs; see for instance:
It uploads a file to Commons on behalf of the user who has granted permission to the tool.

There is also an example for adding statements in Wikidata, see for example Autolists:

If you prefer to do it with a bot, then you need to create an account for the bot, but of course the edits will then appear as made by the bot, and I am not sure that is desirable.





Luca Martinelli

Jun 29, 2014, 7:52:13 AM
to David Cuenca, annotation-tool-gsoc, Amanpreet Singh

Exactly. Moreover, the bot approval procedure on Wikidata is a bit long and complicated. It will probably be better to make it a tool, as is the case for WiDaR (sorry for not providing a link, I'm on mobile right now).

L.

Amanpreet Singh

Jun 29, 2014, 10:28:55 AM
to Luca Martinelli, David Cuenca, annotation-tool-gsoc
So I think I would have to create some kind of custom class using the Wikidata API. There must be some option in the API to attribute the edits to a particular person (I mean the user who annotated).

Am I right?
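For illustration, here is a minimal Python sketch of the request parameters for the Wikidata API action `wbcreateclaim`, which adds an item-valued claim. It assumes a CSRF edit token has already been fetched, and `build_create_claim_params` is a hypothetical helper. Note that whose name an edit appears under is not an API parameter: it is decided by the credentials the request is sent with, e.g. an OAuth grant from the annotating user, which is how an OAuth tool like WiDaR edits on a user's behalf.

```python
import json


def build_create_claim_params(entity_id: str, property_id: str,
                              target_qid: str, csrf_token: str) -> dict:
    """Build POST parameters for a wbcreateclaim call with an item value.

    Hypothetical helper: the dict would be POSTed to
    https://www.wikidata.org/w/api.php using the annotating user's credentials.
    """
    # Item values are passed as JSON carrying the numeric part of the Q-id.
    value = {"entity-type": "item", "numeric-id": int(target_qid.lstrip("Q"))}
    return {
        "action": "wbcreateclaim",
        "entity": entity_id,        # item that receives the claim
        "property": property_id,    # e.g. P86 (composer)
        "snaktype": "value",
        "value": json.dumps(value),
        "token": csrf_token,
        "format": "json",
    }


# Example: composer (P86) = Verdi (Q7317) on a placeholder item Q123.
params = build_create_claim_params("Q123", "P86", "Q7317", "EDIT_TOKEN")
print(params["value"])
```

Keeping the parameter building separate from the HTTP call makes it easy to send the same request either through an OAuth-signed session or through a bot account, depending on which route is chosen.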

Note:
1. Magnus's autolist gives an sqli error.
2. Luca, I think this is what you meant: Link to use of WiDaR. I will look into it.

David Cuenca

Jun 29, 2014, 11:04:03 AM
to Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
Have you tried Autolists? If not, try it once. For instance, use this query:

That query will return the items that are in the category "Compositions by Giuseppe Verdi" but don't have the claim "<Composer> Verdi"

Then log in with Widar
And then add the statement "composer=Verdi" to all of them (P86:Q7317).

Then go back to Wikidata and check your user's contributions. You will see that the edits you made with the automated tool are marked with a tag.

I don't know about the error, but if you don't manage to get it running, you can also ask Magnus.

Cheers,
Micru


Amanpreet Singh

Jun 29, 2014, 2:15:22 PM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
I checked Autolists and I think it is exactly what we need. I am unable to find the source code for autolist2.php, but there is source code for the old Autolists. WiDaR is also exactly what we can use for making Wikidata edits; I wish I had known about WiDaR earlier. Let's look into it further.

I will post an update as soon as I make some progress.

Thanks

Amanpreet Singh

Jun 29, 2014, 2:17:13 PM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
Also I mailed Magnus about it and he has updated his repo :)

Amanpreet Singh

Jun 30, 2014, 5:44:06 AM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
I am posting the workflow I have drafted for the entire plugin so far.

There would primarily be three apps involved:

1. The Pundit main client side plugin configured with Wikidata Selectors for both objects and predicates.
2. An OAuth API system, exactly as WiDaR is.
3. A middleware app that connects this API and the client-side plugin when the Push to Wikidata button is used.

The workflow is explained in the picture below:



For a closer look, go to this link: http://www.gliffy.com/go/publish/5918718

I would love to hear your suggestions on this.

Thanks

David Cuenca

Jun 30, 2014, 9:12:23 AM
to Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
One thing is not clear. Is "push to wikidata" a one-time action, or does it keep the connection? Will there be a check for duplicates?
If it is a one-time action, where would you keep track of which annotations have been transferred to Wikidata and which ones are still to be transferred?

Amanpreet Singh

Jun 30, 2014, 9:29:20 AM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
"Push to Wikidata" will be a one-time action in the current version of the plugin, and the API used for pushing will always check first whether the claim already exists. I know this will be somewhat inefficient, but that can be overcome later with some kind of storage that explicitly records which annotations have been pushed.


Thanks.
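The duplicate check described above could look roughly like the following Python sketch. It assumes the target item's statements have already been fetched (e.g. with the `wbgetclaims` API action) and inspects the `claims` object of that response; `claim_exists` is a hypothetical helper, not code from the plugin.

```python
def claim_exists(claims: dict, property_id: str, target_qid: str) -> bool:
    """Return True if `claims` already holds a property_id -> target_qid statement.

    `claims` is assumed to be the "claims" object of a wbgetclaims response:
    {"P86": [{"mainsnak": {...}, "type": "statement", ...}, ...], ...}
    """
    target = int(target_qid.lstrip("Q"))
    for statement in claims.get(property_id, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no datavalue
        value = snak.get("datavalue", {}).get("value")
        # Item values look like {"entity-type": "item", "numeric-id": 7317}.
        if isinstance(value, dict) and value.get("numeric-id") == target:
            return True
    return False


existing = {"P86": [{"mainsnak": {
    "snaktype": "value", "property": "P86",
    "datavalue": {"type": "wikibase-entityid",
                  "value": {"entity-type": "item", "numeric-id": 7317}}}}]}
print(claim_exists(existing, "P86", "Q7317"))
```

This costs one read request per item before each push, which is the inefficiency mentioned above; a record of already-pushed annotations would let the tool skip that read.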

David Cuenca

unread,
Jun 30, 2014, 9:34:06 AM6/30/14
to Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
Well, another option could be to build a table like Autolists does and let the user select which statements they want to push.

Amanpreet Singh

Jun 30, 2014, 9:34:06 AM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
That is easy to do in the end, but I just want to make a working prototype first. I hope this workflow works fine and gives us something to build upon :). After that, there will always be room for improvement, and we can work on the improvements one by one.

David Cuenca

Jun 30, 2014, 9:38:42 AM
to Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
Sure, just giving ideas :)

Amanpreet Singh

Jul 2, 2014, 5:52:08 AM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
So far I have implemented the part that retrieves annotations from the Pundit server and converts them into a proper format.
You can test what has been done so far here. There may be some problems as I haven't fully tested it, but at least we have some working code.



The image above shows where to find the Push to Wikidata button. What remains now is feeding the data to Wikidata.

Thanks
Amanpreet Singh,
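As a rough illustration of the "proper format" step, the Python sketch below assumes the annotation's body graph arrives as RDF/JSON (`{subject: {predicate: [{"type": ..., "value": ...}]}}`) and that Wikidata items and properties appear as `http://www.wikidata.org/entity/Q7317`-style and `http://www.wikidata.org/prop/direct/P86`-style URIs. Both the layout and the helper names are assumptions for illustration, not Pundit's documented format. Triples that are not made entirely of Wikidata items and properties (such as text fragments) are skipped.

```python
import re

# Assumed URI shapes for Wikidata items (entity/Q...) and properties (prop/direct/P...).
WD_URI = re.compile(r"^https?://www\.wikidata\.org/(?:entity|prop/direct)/([PQ]\d+)$")


def wikidata_id(uri: str):
    """Return the Q/P id at the end of a Wikidata URI, or None."""
    match = WD_URI.match(uri)
    return match.group(1) if match else None


def extract_wikidata_triples(body_graph: dict) -> list:
    """Collect (item, property, item) id tuples from an RDF/JSON graph."""
    triples = []
    for subject, predicates in body_graph.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                if obj.get("type") != "uri":
                    continue  # literals cannot become item-valued claims
                s = wikidata_id(subject)
                p = wikidata_id(predicate)
                o = wikidata_id(obj["value"])
                # Keep only item -> property -> item triples.
                if s and p and o and s[0] == "Q" and p[0] == "P" and o[0] == "Q":
                    triples.append((s, p, o))
    return triples
```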

Amanpreet Singh

Jul 3, 2014, 10:34:41 AM
to David Cuenca, Luca Martinelli, annotation-tool-gsoc
Dear all,
If you have tried it, I would like to know your views.

Thanks

Christian Morbidoni

Jul 3, 2014, 1:00:28 PM
to Amanpreet Singh, David Cuenca, Luca Martinelli, annotation-tool-gsoc
I tried it just now.
The pop-up was blocked by Chrome. Does it have to be a new browser window, or are there alternatives?
Then I got this error in the popup:
Warning: Unknown: write failed: No space left on device (28) in Unknown on line 0
Warning: Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) in Unknown on line 0

There is also a general thing: I perhaps want to create more annotations before pushing to wikidata, or I want to save them in pundit and then select what annotations go to wikidata. 
What do you think?
Wouldn't it be better to have a push-to-wikidata in the annotation side bar? Or maybe we could think of a top bar menu action to "push the entire current notebook to wikidata"?
Let's see what others think...

best,

Christian

Amanpreet Singh

Jul 3, 2014, 1:14:31 PM
to Christian Morbidoni, David Cuenca, Luca Martinelli, annotation-tool-gsoc
Dear Christian


On Thu, Jul 3, 2014 at 10:30 PM, Christian Morbidoni <christian...@gmail.com> wrote:
> I tried it just now. 
> The pop-up was blocked by Chrome. Does it have to be a new browser window, or are there alternatives?

I think the alternative is to simply open it in a new tab instead, and I am fine with that.
 
> Then I got this error in the popup:
> Warning: Unknown: write failed: No space left on device (28) in Unknown on line 0
> Warning: Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php5) in Unknown on line 0

I really don't know the cause of this error, but I will certainly look into it; I think it is some kind of error related to the Wikimedia servers.
 
> There is also a general thing: I perhaps want to create more annotations before pushing to wikidata, or I want to save them in pundit and then select what annotations go to wikidata. 

Actually, once you are logged into Wikimedia, it does show you the option to select which annotations to push.
 
> Wouldn't it be better to have a push-to-wikidata in the annotation side bar? Or maybe we could think of a top bar menu action to "push the entire current notebook to wikidata"?

I think this is a great idea; I will work on it.

Amanpreet Singh

Jul 8, 2014, 10:56:18 AM
to Christian Morbidoni, David Cuenca, Luca Martinelli, annotation-tool-gsoc
Hi all,
I have a question,
Consider a situation where the user annotates something for which no item exists on Wikidata, i.e. they didn't choose one provided by the Wikidata selector (like the one you get initially when highlighting text and clicking "Annotate this text fragment"). What should I do in that case? Should I create a new item on Wikidata, or should I notify the user that the item doesn't exist?

Thanks 

David Cuenca

Jul 8, 2014, 1:33:00 PM
to Amanpreet Singh, Christian Morbidoni, Luca Martinelli, annotation-tool-gsoc
Hi,
Well, I would let the user decide, "the following items do not exist in wikidata, do you want to create them?"
It can be very simple, or more elaborate, depending on your time.

Btw, a very important option is missing in the contextual menu "use selection as source". 
The whole point of the project is to feed sourced statements to Wikidata, so if there is no option to use a text or image selection as a source of the triple, then the users are not going to understand what to do with the tool.

Thanks
Micru

Simone Fonda

Jul 9, 2014, 3:26:28 AM
to David Cuenca, Amanpreet Singh, Christian Morbidoni, Luca Martinelli, annotation-tool-gsoc
On Tue, Jul 8, 2014 at 7:32 PM, David Cuenca <dac...@gmail.com> wrote:

> Well, I would let the user decide, "the following items do not exist in wikidata, do you want to create them?"
> It can be very simple, or more elaborate, depending on your time.
>
> Btw, a very important option is missing in the contextual menu "use selection as source".
> The whole point of the project is to feed sourced statements to Wikidata, so if there is no option
> to use a text or image selection as a source of the triple, then the users are not going to understand what
> to do with the tool.

I don't get why you would want to create an item on wikidata. It is
just a text fragment, a piece of an external content, not a real
"thing" or "entity", not something that you will get additional
information for.

I would go for what David says, adding that maybe you want WM users to
use text fragments *only* as sources. In this case I would disable
every other use of fragments (annotate this .., text to text, etc) and
add David's "use selection as source".

Simone

David Cuenca

Jul 9, 2014, 4:04:37 AM
to Simone Fonda, Amanpreet Singh, Christian Morbidoni, Luca Martinelli, annotation-tool-gsoc
On Wed, Jul 9, 2014 at 9:25 AM, Simone Fonda <fo...@netseven.it> wrote:
> I don't get why you would want to create an item on wikidata. It is
> just a text fragment, a piece of an external content, not a real
> "thing" or "entity", not something that you will get additional
> information for.

Oh, I was thinking that if for instance we try to push the triple <Luke Skywalker> father <Anakin Skywalker>, and then there is no item for "Anakin Skywalker", then it should be created.
Definitely not for text fragments. 

Micru

Christian Morbidoni

Jul 9, 2014, 6:07:09 AM
to David Cuenca, Simone Fonda, Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
Hi all,

I think we should take a step back and recap what kind of annotations we want users to create.
The original use case in my understanding was:
I want to mark a sentence in a page and then use it as a source, a "proof" of the meaning of one or more triples that I want to ultimately feed into WikiData.
E.g. I mark the sentence "Pundit is a very nice semantic web annotation tool. It is also open-source!" and create the triples:
wd:Pundit wd:belongs-to wd:WebAnnotationToolsCategory.
wd:Pundit dc:type wd:OpenSourceSoftware.

On "save", the annotation would be translated into the Pundit data model as an oa:Annotation with an associated bodyGraph. This will be a named graph containing the following triples:

wd:Pundit wd:belongs-to wd:WebAnnotationToolsCategory.
wd:Pundit dc:type wd:OpenSourceSoftware.

plus an itemsGraph associated with the following triples:

ex:MyAnnotation oa:hasTarget <http://example.org/Pundit page.html?xpointer(....) > .
<http://example.org/Pundit page.html?xpointer(....) > rdfs:label "Pundit is a very nice semantic web annotation tool. It is also open-source!" .

All this data can be read via Pundit Server REST API and transformed and stored at WikiData.

So I guess there is no need to create a new WikiData resource for the sentence (unless the WikiData folks have some use for that in mind). There could, however, be a need to add a new WikiData item, e.g. wd:Pundit might not exist yet. Probably the best way to address this is to use the WikiData APIs to create a new resource and then refresh the respective vocabulary in Pundit.
@Simone: this is similar to the problem of adding a new item in KORBO, do we have any code that Aman perhaps can look at?
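Creating a missing item such as wd:Pundit could be done with the Wikidata API action `wbeditentity` and `new=item`. The minimal Python sketch below only builds the request, assuming an English label is enough to start with and an edit token is already available; `new_item_params` is a hypothetical name. The new item's id would then be read from `entity.id` in the JSON response before refreshing the vocabulary in Pundit.

```python
import json


def new_item_params(label: str, language: str, csrf_token: str) -> dict:
    """POST parameters for creating a new item with a single label (hypothetical helper)."""
    data = {"labels": {language: {"language": language, "value": label}}}
    return {
        "action": "wbeditentity",
        "new": "item",               # ask Wikibase to create a fresh item
        "data": json.dumps(data),    # entity content as JSON
        "token": csrf_token,
        "format": "json",
    }


params = new_item_params("Pundit", "en", "EDIT_TOKEN")
print(params["data"])
```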


There is a problem though:
In the current implementation of Pundit, in order to obtain a correct itemsGraph (with the needed triples) you have to put into the Annotation (bodyGraph) an additional triple whose subject is <http://example.org/Pundit page.html?xpointer(....) > (that is, the URI of the text fragment the user selected in Pundit). For example:
<http://example.org/Pundit page.html?xpointer(....) > wd:is-a wd:AnnotationSource

Now: 
- we can choose to put the effort on the user side, by asking them to include such a triple with the triple composer.
- or we can choose to put the effort on Aman ( :-) ) by implementing a short-cut, e.g. a new action in the Pundit contextual menu, like "Add as Annotation Source", which could automatically add such a triple to the triple composer.

What do the other mentors think? Does it make sense?
Aman: please take some time to consider possibilities and write back what your plans would be.

best,

Christian




David Cuenca

Jul 9, 2014, 8:45:30 AM
to Christian Morbidoni, Simone Fonda, Amanpreet Singh, Luca Martinelli, annotation-tool-gsoc
On Wed, Jul 9, 2014 at 12:07 PM, Christian Morbidoni <christian...@gmail.com> wrote:
> So I guess there is no need to create a new WikiData resource for the sentence (Unless WikiData guys have some use for that in mind).

No, there is no need to create a resource for the sentence, but it would be nice to put some information in the "references" section of that triple in Wikidata.
For instance check the references for the birth date of 

It includes the link to the webpage, its title, language, etc. No need to add everything, just what can be read automatically from the website.
I also think it would be nice to add a back-link to the Pundit annotation resource, but I do not know what is the best way to do this.
 
> What do the other mentors think? Does it make sense?

Everything that simplifies usability makes sense :)
 
> Aman: please take some time to consider possibilities and write back what your plans would be.


Yes, also take time to add some triples to Wikidata (with sources) to get a feel for what is needed from the user's perspective. You can read about what is expected when sourcing statements here

In your case only "reference URL (P854)" and "date retrieved (P813)" might be relevant. A new property could be created to link to the resource in Pundit, if needed.

Cheers,
Micru

Amanpreet Singh

Jul 9, 2014, 9:50:31 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Hi all,

As far as I understand the problems, here's my take on each of them. Let's consider them one by one:

1. About ambiguous text fragments being annotated:
IMO, we can't expect users to always build correct triples like the one in Christian's example; we should expect them to make mistakes. So I took the following approach.
The following cases concern the item (subject) being annotated:
    i. Check the label: if it's a Wikidata item, great, there's nothing to do :) (this is the case in Christian's example).
    ii. If it's not a Wikidata item, search Wikidata for items related to the text fragment (maybe a proper subject) and show the suggestions to the user in a dropdown, giving them a choice. We will also provide an option to create an item here.
    iii. If the Wikidata search doesn't return any results either, it is probably an awkward text fragment; we provide an option to create a Wikidata item for it (it would probably be better to redirect the user to Wikidata to create the item instead of creating it from the tool).
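The three cases above can be sketched in Python as follows. This is an illustration under assumptions, not the implemented code: `search` stands for any function that calls the Wikidata API action `wbsearchentities` with the given parameters and returns the `search` list from its JSON response, and the returned dicts are a made-up shape for the UI.

```python
from typing import Callable, Optional


def search_params(text: str, language: str = "en") -> dict:
    """Parameters for the wbsearchentities API action (item search by label)."""
    return {"action": "wbsearchentities", "search": text,
            "language": language, "type": "item", "format": "json"}


def resolve_subject(subject_qid: Optional[str], label: str,
                    search: Callable[[dict], list]) -> dict:
    # Case i: the subject is already a Wikidata item, nothing to do.
    if subject_qid:
        return {"action": "use", "id": subject_qid}
    # Case ii: search Wikidata and offer the hits as dropdown suggestions,
    # still allowing the user to create a new item instead.
    hits = search(search_params(label))
    if hits:
        return {"action": "suggest", "choices": [h["id"] for h in hits],
                "allow_create": True}
    # Case iii: nothing found; send the user to Wikidata to create the item.
    return {"action": "create", "label": label}


# Usage with a stubbed search function in place of a real API call.
print(resolve_subject(None, "Giuseppe Verdi", lambda params: [{"id": "Q7317"}]))
```

Passing `search` in as a function keeps the decision logic testable without network access.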

What do you think of this approach? I have already implemented it in the meantime.

2. About source of annotations:
IMO, this isn't a big issue with the Pundit Open API's graph: the source of a triple, Reference URL (P854), can easily be retrieved from the graph.
Once we have the URL, it's not hard to retrieve its title (P357), and the date (P813) can also be retrieved from the graph.
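A minimal Python sketch of the reference part, assuming the page URL and retrieval date have already been read from the graph and the statement's GUID is known: it builds the `snaks` JSON for the Wikidata API action `wbsetreference` with reference URL (P854) and date retrieved (P813) at day precision (11). The helper names are hypothetical, and the zero-padded year follows the time value quoted later in this thread; the current Wikibase time format should be checked before relying on it.

```python
import json
from datetime import date

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def reference_snaks(source_url: str, retrieved: date) -> dict:
    """Reference snaks: reference URL (P854) and date retrieved (P813)."""
    time_value = {
        # Zero-padded 11-digit year, as in the value quoted in this thread.
        "time": f"+{retrieved.year:011d}-{retrieved.month:02d}-{retrieved.day:02d}T00:00:00Z",
        "timezone": 0, "before": 0, "after": 0,
        "precision": 11,  # day
        "calendarmodel": GREGORIAN,
    }
    return {
        "P854": [{"snaktype": "value", "property": "P854",
                  "datavalue": {"type": "string", "value": source_url}}],
        "P813": [{"snaktype": "value", "property": "P813",
                  "datavalue": {"type": "time", "value": time_value}}],
    }


def set_reference_params(statement_guid: str, snaks: dict, csrf_token: str) -> dict:
    """POST parameters for wbsetreference (hypothetical helper)."""
    return {"action": "wbsetreference", "statement": statement_guid,
            "snaks": json.dumps(snaks), "token": csrf_token, "format": "json"}
```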

> I also think it would be nice to add a back-link to the Pundit annotation resource, but I do not know what is the best way to do this

David, I didn't quite understand what you meant by Pundit resource: a link to the website, or to the graph?

Kindly tell me if you all want to change something, or if I missed anything.

Amanpreet Singh

Jul 9, 2014, 9:53:22 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Also, I wanted to tell you that I had to switch from WiDaR back to my own login system, as WiDaR doesn't support adding references to a claim, along with some other things we are going to need soon. I am grateful to Magnus for providing the source code for WiDaR.
Let's see how this works out.

Amanpreet Singh

Jul 13, 2014, 4:03:31 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Hello all,
David and Luca,
I am having a problem while adding property P813 for https://www.wikidata.org/wiki/Q2336535:P227: I am getting the error "Malformed input: +00000002014-07-10T11:44:38Z". Do you have any idea how to fix this?

David Cuenca

Jul 13, 2014, 4:17:49 AM
to Amanpreet Singh, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Are you forming the value using all parameters of the time datatype? (timezone, precision, etc) 

Amanpreet Singh

Jul 13, 2014, 4:24:37 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Yes, I am. The JSON is something like this:
'{
  "type": "time",
  "value": {"time": "+00000002014-07-10T11:44:38Z", "timezone": 0, "before": 0, "after": 0,
            "precision": 14, "calendarmodel": "http://www.wikidata.org/entity/Q1985727"}
}';

Amanpreet Singh

Jul 14, 2014, 4:26:30 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Hi all,
I have good news: I am finally able to add annotations with references to Wikidata :). For example, you can check http://www.wikidata.org/wiki/Q2336535#P277; this property was added through the tool. You can also check the revision history to see that it shows Apsdehal made the entry, with a bot tag against it.

I will soon set all this up for testing. :D

Amanpreet Singh

Jul 14, 2014, 4:38:00 AM
to David Cuenca, Christian Morbidoni, Simone Fonda, Luca Martinelli, annotation-tool-gsoc
Also, David: the error was caused by the precision value. Wikidata currently doesn't allow a precision above 11, i.e. day precision. I hope this will be fixed soon.
I filed a bug for this: https://bugzilla.wikimedia.org/show_bug.cgi?id=67975
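For reference, a Python sketch of a time value that avoids the precision problem: it uses precision 11 (day) and zeroes the time of day, since a day-precision date carries no hour, minute or second. The helper name is hypothetical, and the zero-padded year simply mirrors the value quoted above.

```python
from datetime import datetime

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"
PRECISION_DAY = 11  # Wikibase precision codes: 9 = year, 10 = month, 11 = day


def day_precision_time(ts: datetime) -> dict:
    """Wikibase 'time' datavalue for the calendar day of `ts`."""
    return {
        "type": "time",
        "value": {
            # Hours, minutes and seconds are dropped at day precision.
            "time": f"+{ts.year:011d}-{ts.month:02d}-{ts.day:02d}T00:00:00Z",
            "timezone": 0, "before": 0, "after": 0,
            "precision": PRECISION_DAY,
            "calendarmodel": GREGORIAN,
        },
    }


value = day_precision_time(datetime(2014, 7, 10, 11, 44, 38))
print(value["value"]["time"])  # +00000002014-07-10T00:00:00Z
```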