Don't use dumpgenerator.py with API
Federico Leva (Nemo)  
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
Date: Fri, 09 Nov 2012 11:27:03 +0100
Local: Fri, Nov 9 2012 5:27 am
Subject: [WARNING] Don't use dumpgenerator.py with API
It's completely broken:
https://code.google.com/p/wikiteam/issues/detail?id=56
It will download only a fraction of the wiki, 500 pages at most per
namespace.

Let me reiterate that
https://code.google.com/p/wikiteam/issues/detail?id=44 is a very urgent
bug and we've seen no work on it in many months. We need an actual
programmer with some knowledge of python to fix it and make the script
work properly; I know there are several on this list (and elsewhere),
please please help. The last time I, as a non-coder, tried to fix a bug,
I made things worse
(https://code.google.com/p/wikiteam/issues/detail?id=26).

Only after API support is implemented/fixed will I be able to re-archive the 4-5
thousand wikis we've recently archived on archive.org
(https://archive.org/details/wikiteam) and possibly many more. Many of
those dumps contain errors and/or are just partial because of the
script's unreliability, and wikis die on a daily basis. (So, quoting
emijrp, there IS a deadline.)

Nemo

P.s.: Cc'ing some lists out of desperation; sorry for cross-posting.


 
Hydriz Wikipedia  
From: Hydriz Wikipedia <ad...@alphacorp.tk>
Date: Fri, 9 Nov 2012 23:21:41 +0800
Local: Fri, Nov 9 2012 10:21 am
Subject: Re: [wikiteam-discuss:599] [WARNING] Don't use dumpgenerator.py with API

Hi all,

I am beginning work on a port to PHP due to some issues regarding unit
testing for another project of mine (if you follow me on GitHub, you will
know). I hope to help out with fixing the script, but it is a good idea to
get someone who knows python (pywikipedia-l people) and the MediaWiki API
(mediawiki-api people) to help.

On Fri, Nov 9, 2012 at 6:27 PM, Federico Leva (Nemo) <nemow...@gmail.com>wrote:

--
Regards,
Hydriz

We've created the greatest collection of shared knowledge in history. Help
protect Wikipedia. Donate now: http://donate.wikimedia.org


 
Federico Leva (Nemo)  
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
Date: Fri, 09 Nov 2012 17:48:56 +0100
Local: Fri, Nov 9 2012 11:48 am
Subject: Re: [Mediawiki-api] [WARNING] Don't use dumpgenerator.py with API
Hydriz Wikipedia, 09/11/2012 16:59:

> You mentioned "a while back" for "apcontinue", how recent was it? This
> dump generator is attempting to archive all sorts of versions of
> MediaWiki, or so unless we write a backward compatibility handler in the
> script itself.

+1
https://www.mediawiki.org/wiki/API:Allpages ,
https://www.mediawiki.org/wiki/API:Lists and
https://www.mediawiki.org/wiki/API:Query#Continuing_queries don't really
shed any light.

> ...and I agree, the code is in a total mess. We need to get someone to
> rewrite the whole thing, soon.

Well, that's in an ideal world. In this one, the best would probably be
suggestions for simple libraries to be used to solve such small
problems? (Which can become very big if one doesn't follow API evolution
very closely or know its history from the beginning of time.)

Nemo


 
Federico Leva (Nemo)  
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
Date: Fri, 09 Nov 2012 17:52:36 +0100
Local: Fri, Nov 9 2012 11:52 am
Subject: Re: [Mediawiki-api] [WARNING] Don't use dumpgenerator.py with API
Brad Jorsch, 09/11/2012 17:30:

> On Fri, Nov 9, 2012 at 7:59 AM, Hydriz Wikipedia <ad...@alphacorp.tk> wrote:

>> You mentioned "a while back" for "apcontinue", how recent was it? This dump
>> generator is attempting to archive all sorts of versions of MediaWiki, or so
>> unless we write a backward compatibility handler in the script itself.

> July 2012: http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2012-July...

> Any wiki running version 1.19, or a 1.20 snapshot from before
> mid-July, would be returning the old parameter. If you do it right,
> though, there's little you have to do. Just use whichever keys are
> given you inside the <query-continue> node. Even with your regular
> expression mess, just capture which key is given as well as the value
> and use it as the key for your params dict.
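
A minimal sketch of the suggestion quoted above, assuming the script keeps parsing the XML response with a regular expression: capture the attribute name together with its value, and send both back as the next request's parameter. The pattern and the function name `continuation_from_xml` are illustrative assumptions, not the code in dumpgenerator.py.

```python
import re
from html import unescape

# Matches e.g. <allpages apfrom="Foo" /> or <allpages apcontinue="Foo" />
# inside the <query-continue> node; the attribute name is captured, not assumed.
CONTINUE_RE = re.compile(r'<allpages\s+(\w+)="([^"]*)"\s*/>')


def continuation_from_xml(xml):
    """Return {key: value} for the next request, or None if there is no continuation."""
    match = CONTINUE_RE.search(xml)
    if match is None:
        return None
    key, value = match.group(1), unescape(match.group(2))
    return {key: value}
```

The returned dict can be merged into the request parameters as-is, so the same code would work whether the wiki answers with the old key or the new one.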

Thank you again for your useful suggestions!
However, as already noted,
https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't give
any info about supported releases.

Nemo

P.s.: Small unreliable "temporary" things in MediaWiki, like the
"powered by MediaWiki" sentence we grep for, are usually the most
permanent ones, although I don't like it.


 
Federico Leva (Nemo)  
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
Date: Fri, 09 Nov 2012 19:33:17 +0100
Local: Fri, Nov 9 2012 1:33 pm
Subject: Re: [Mediawiki-api] [WARNING] Don't use dumpgenerator.py with API
Brad Jorsch, 09/11/2012 18:01:

> On Fri, Nov 9, 2012 at 8:48 AM, Federico Leva (Nemo) <nemow...@gmail.com> wrote:
>> Well, that in an ideal world. In this one, the best would probably be
>> suggestions for simple libraries to be used to solve such small problems?

> Since you're using Python, pywikipedia is usually the go-to library.
> https://www.mediawiki.org/wiki/Manual:Pywikipediabot

Thank you, looks like they can indeed help us.

> On Fri, Nov 9, 2012 at 8:52 AM, Federico Leva (Nemo) <nemow...@gmail.com> wrote:
>> However, as already noted,
>> https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't give any
>> info about supported releases.

> Perhaps it could be made more clear in the doc (I think I'll go fix
> that now), but clients shouldn't be depending on the particular keys
> given inside the query-continue node beyond identifying which one
> belongs to the generator.
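
To illustrate what "not depending on the particular keys" could look like: a sketch of a listing loop that copies whatever the `query-continue` node contains into the next request. It assumes the JSON output format and a caller-supplied `fetch` function (hypothetical) that performs the HTTP request and returns the decoded response. It only covers a plain `list=allpages` query, not generators.

```python
def iter_allpages(fetch, namespace=0, limit=500):
    """Yield every page title in a namespace, following continuation blindly.

    `fetch` is an assumed callable: it takes a dict of API parameters and
    returns the decoded JSON response as a dict.
    """
    base = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": limit,
        "format": "json",
    }
    params = dict(base)
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        # Use whichever keys the server hands back (apfrom, apcontinue, ...).
        cont = data.get("query-continue", {}).get("allpages")
        if not cont:
            break
        params = dict(base)
        params.update(cont)
```

Because the loop never names the continuation key itself, a rename on the server side would not silently stop it after the first batch.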

Thank you very much for expanding the page.
Has the query-continue node been the same since the beginning of the API?
It may be obvious to you, but I don't think it's written anywhere; please
stick a {{MW 1.12}} or whatever there if so.

Nemo


 
emijrp  
From: emijrp <emi...@gmail.com>
Date: Fri, 9 Nov 2012 19:57:21 +0100
Local: Fri, Nov 9 2012 1:57 pm
Subject: Re: [wikiteam-discuss:604] Re: [Mediawiki-api] [WARNING] Don't use dumpgenerator.py with API

I have not read this whole thread, but I just fixed issue 56. Those suckers
changed "apfrom" to "apcontinue", so I added both possibilities.

Line 162
https://code.google.com/p/wikiteam/source/diff?spec=svn806&r=806&form...

2012/11/9 Federico Leva (Nemo) <nemow...@gmail.com>

--
Emilio J. Rodríguez-Posada
http://LibreFind.org - The wiki search engine

 
Federico Leva (Nemo)  
From: "Federico Leva (Nemo)" <nemow...@gmail.com>
Date: Fri, 09 Nov 2012 20:32:10 +0100
Local: Fri, Nov 9 2012 2:32 pm
Subject: Re: [wikiteam-discuss:605] Re: [Mediawiki-api] [WARNING] Don't use dumpgenerator.py with API
emijrp, 09/11/2012 19:57:

> I have not read this whole thread, but I just fixed issue 56. Those
> suckers changed "apfrom" to "apcontinue", so I added both possibilities.

Thanks!

Nemo


 
Scott Boyd  
From: Scott Boyd <scottd...@gmail.com>
Date: Fri, 9 Nov 2012 22:45:58 -0600
Local: Fri, Nov 9 2012 11:45 pm
Subject: Re: [wikiteam-discuss:599] [WARNING] Don't use dumpgenerator.py with API

At this link: https://code.google.com/p/wikiteam/issues/detail?id=56 , at
the bottom, there is an entry by project member nemowiki that states:

    Comment 7 <https://code.google.com/p/wikiteam/issues/detail?id=56#c7> by
    project member nemowiki <https://code.google.com/u/101255742639286016490/>,
    Today (9 hours ago)

    Fixed by emijrp in r806
    <https://code.google.com/p/wikiteam/source/detail?r=806>. :-)

    *Status:* Fixed

So does that mean the problem described as "It's completely broken" is now fixed?
I'm running a huge download of 64K+ page titles, and am now using the
"r806" version of dumpgenerator.py. (The first 35K+ page titles were
downloaded with an older version.) Both versions sure seem to be
downloading MORE than 500 pages per namespace, but I'm not sure, since I
don't know how you can tell if you are getting them all...
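
One rough way to check, given the symptom reported at the start of the thread (at most 500 pages per namespace), would be to count the titles collected for each namespace and flag any namespace that stopped at exactly the limit. This is only a heuristic sketch: the input mapping `titles_by_namespace` is a hypothetical structure, not something dumpgenerator.py is known to produce in this form, and a namespace can legitimately hold exactly 500 pages.

```python
def suspicious_namespaces(titles_by_namespace, limit=500):
    """Return the namespace ids whose title count equals the batch limit.

    A count of exactly `limit` suggests the listing may have stopped
    after the first batch instead of following the continuation.
    """
    return sorted(
        ns for ns, titles in titles_by_namespace.items() if len(titles) == limit
    )
```

An empty result does not prove the dump is complete; it only rules out this particular truncation pattern.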

So is it fixed or not?

On Fri, Nov 9, 2012 at 4:27 AM, Federico Leva (Nemo) <nemow...@gmail.com>wrote:


 
Hydriz Wikipedia  
From: Hydriz Wikipedia <ad...@alphacorp.tk>
Date: Sat, 10 Nov 2012 12:50:10 +0800
Local: Fri, Nov 9 2012 11:50 pm
Subject: Re: [wikiteam-discuss:608] [WARNING] Don't use dumpgenerator.py with API

Scott,

Nemo is referring to the dumpgenerator.py being broken on MediaWiki
versions above 1.20, and it should not actually affect older MediaWiki
versions.

You can safely continue with your grab. :)

--
Regards,
Hydriz

We've created the greatest collection of shared knowledge in history. Help
protect Wikipedia. Donate now: http://donate.wikimedia.org


 
Scott Boyd  
From: Scott Boyd <scottd...@gmail.com>
Date: Fri, 9 Nov 2012 23:03:54 -0600
Local: Sat, Nov 10 2012 12:03 am
Subject: Re: [wikiteam-discuss:609] [WARNING] Don't use dumpgenerator.py with API

OK - thanks for the quick reply.

Scott

On Fri, Nov 9, 2012 at 10:50 PM, Hydriz Wikipedia <ad...@alphacorp.tk>wrote:

--
Scott D. Boyd
GPS Technician - TX, LA, AR
*Professional Transportation, Inc.*
Cell: 682-465-5039

 