AttributeError for 2% of German pages?

Sven

Dec 14, 2011, 3:14:46 AM
to mw...@googlegroups.com
Hi.

I am running mwlib on a 150,000-page sample of the German Wikipedia.
Nice results, thanks a million!

Is an error rate of 2% expected (almost all errors are of the AttributeError type reported below),
or am I messing things up?

...
mw-render -c de/wikiconf.txt -o xhtml/193 -w xhtml "Apple"
...

 0% parsing Apple
0% error Apple
Traceback (most recent call last):
  File "/usr/local/bin/mw-render", line 9, in <module>
    load_entry_point('mwlib==0.12.17', 'console_scripts', 'mw-render')()
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/apps/render.py", line 218, in main
    return Main()()
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/apps/render.py", line 181, in __call__
    writer(env, output=tmpout, status_callback=self.status, **writer_options)
  File "/usr/local/lib/python2.7/site-packages/mwlib.xhtml-0.1.0-py2.7.egg/mwlib/xhtmlwriter.py", line 708, in xhtmlwriter
    book = writerbase.build_book(env, status_callback=buildbook_status)
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/writerbase.py", line 43, in build_book
    a = wiki.getParsedArticle(title=item.title, revision=item.revision)
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/nuwiki.py", line 412, in getParsedArticle
    return uparser.parseString(title=title, raw=raw, wikidb=self, lang=self.siteinfo["general"]["lang"])
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/refine/uparser.py", line 34, in parseString
    input = te.expandTemplates(True)
  File "evaluate.py", line 295, in mwlib.templ.evaluate.Expander.expandTemplates (mwlib/templ/evaluate.c:5930)
  File "evaluate.py", line 282, in mwlib.templ.evaluate.Expander._expand (mwlib/templ/evaluate.c:5567)
  File "evaluate.py", line 28, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1103)
  File "evaluate.py", line 30, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1134)
  File "nodes.py", line 210, in mwlib.templ.nodes.Template.flatten (mwlib/templ/nodes.c:4619)
  File "nodes.py", line 291, in mwlib.templ.nodes.Template._flatten (mwlib/templ/nodes.c:5976)
  File "evaluate.py", line 28, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1103)
  File "evaluate.py", line 30, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1134)
  File "nodes.py", line 41, in mwlib.templ.nodes.IfNode.flatten (mwlib/templ/nodes.c:1644)
  File "evaluate.py", line 28, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1103)
  File "evaluate.py", line 30, in mwlib.templ.evaluate.flatten (mwlib/templ/evaluate.c:1134)
  File "nodes.py", line 210, in mwlib.templ.nodes.Template.flatten (mwlib/templ/nodes.c:4619)
  File "nodes.py", line 277, in mwlib.templ.nodes.Template._flatten (mwlib/templ/nodes.c:5777)
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/templ/magics.py", line 563, in __call__
    res = m(args) or ''  # FIXME: catch TypeErros
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/templ/magics.py", line 444, in IFEXIST
    exists = bool(self.wikidb.normalize_and_get_image_path(name.split(":")[1]))
  File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/nuwiki.py", line 330, in __getattr__
    raise AttributeError()
AttributeError

Sven

Ralf Schmitt

Dec 14, 2011, 5:52:18 AM
to mw...@googlegroups.com
Sven <hart...@gmail.com> writes:

> Hi.
>
> I am running mw-lib on a 150,000 page sample of the German Wikipedia.
> Nice results, thanks a million!
>
> Is an error rate (almost all of the AttributeError type reported below) of
> 2% expected
> or am I messing up things?

I'm pretty sure it's a bug, not a feature.


>
> ...
> mw-render -c de/wikiconf.txt -o xhtml/193 -w xhtml "Apple"
> ...
>

> File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/templ/magics.py", line 444, in IFEXIST
>     exists = bool(self.wikidb.normalize_and_get_image_path(name.split(":")[1]))

it looks like your wikidb doesn't have a normalize_and_get_image_path
attribute. Are you using cdbwiki?
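
To make the diagnosis concrete, here is a minimal, self-contained sketch of the failure mode. The classes `CdbLikeBackend` and `WikiWrapper` and the file title are hypothetical stand-ins, not mwlib's actual code; only the method name `normalize_and_get_image_path`, the `name.split(":")[1]` expression, and the bare `raise AttributeError()` are taken from the traceback.

```python
class CdbLikeBackend(object):
    """Hypothetical backend: serves article text, has no image methods."""

    def get_text(self, title):
        return "text of %s" % title


class WikiWrapper(object):
    """Hypothetical wrapper that forwards unknown attributes to its backend
    and raises a bare AttributeError when the backend lacks them."""

    def __init__(self, backend):
        self.backend = backend

    def __getattr__(self, name):
        # only called when normal attribute lookup on the wrapper fails
        try:
            return getattr(self.backend, name)
        except AttributeError:
            raise AttributeError()


def ifexist_image(wikidb, name):
    # mirrors the expression on the failing line of the traceback
    return bool(wikidb.normalize_and_get_image_path(name.split(":")[1]))


wiki = WikiWrapper(CdbLikeBackend())
try:
    ifexist_image(wiki, "Datei:Apple.jpg")  # made-up file title
    outcome = "ok"
except AttributeError:
    outcome = "AttributeError"
print(outcome)
```

Text lookups still work through such a wrapper; only pages whose templates check for an image file hit the missing method, which would be consistent with an error affecting a small share of pages.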

Sven Hartrumpf

Dec 14, 2011, 8:20:16 AM
to mw...@googlegroups.com
Hi Ralf.

Wed, 14 Dec 2011 11:52:18 +0100, ralf wrote:
>> Nice results, thanks a million!
>>
>> Is an error rate (almost all of the AttributeError type reported below) of
>> 2% expected
>> or am I messing up things?
>
> I'm pretty sure it's a bug, not a feature.

Good news ...

>> ...
>> mw-render -c de/wikiconf.txt -o xhtml/193 -w xhtml "Apple"
>> ...
>>
>> File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/templ/magics.py", line 444, in IFEXIST
>>     exists = bool(self.wikidb.normalize_and_get_image_path(name.split(":")[1]))
>
> it looks like your wikidb doesn't have a normalize_and_get_image_path
> attribute. Are you using cdbwiki?

Yes.

Sven

Sven Hartrumpf

Dec 15, 2011, 4:43:21 AM
to mw...@googlegroups.com
Wed, 14 Dec 2011 14:20:16 +0100 (CET), hartrumpf wrote:
>>> mw-render -c de/wikiconf.txt -o xhtml/193 -w xhtml "Apple"
>>> ...
>>>
>>> File "/usr/local/lib/python2.7/site-packages/mwlib-0.12.17-py2.7-linux-x86_64.egg/mwlib/templ/magics.py", line 444, in IFEXIST
>>>     exists = bool(self.wikidb.normalize_and_get_image_path(name.split(":")[1]))
>>
>> it looks like your wikidb doesn't have a normalize_and_get_image_path
>> attribute. Are you using cdbwiki?
>
> Yes.

Sorry for my ignorance:
What kind of solution do your question and my answer imply? :-)

Ralf Schmitt

Dec 15, 2011, 5:14:47 AM
to mw...@googlegroups.com
Sven Hartrumpf <hart...@gmail.com> writes:

>>> it looks like your wikidb doesn't have a normalize_and_get_image_path
>>> attribute. Are you using cdbwiki?
>>
>> Yes.
>
> Sorry for my ignorance:
> What kind of solution does your question and my answer imply? :-)

Can you provide a patch? We (pediapress) don't use it anymore. I nearly
removed cdbwiki some time ago.

--
Cheers
Ralf

Sven Hartrumpf

Dec 15, 2011, 12:03:09 PM
to mw...@googlegroups.com
Thu, 15 Dec 2011 11:14:47 +0100, ralf wrote:

> We (pediapress) don't use it anymore. I nearly
> removed cdbwiki some time ago.

Oh, I see.
What other backend is recommended, especially for
batch processing of large Wikipedia dumps?

Sven

Sven

Dec 19, 2011, 6:57:05 AM
to mw...@googlegroups.com
> Can you provide a patch?

Sorry, I don't speak much Python :-(

If I don't need any images, can anybody think of a work-around
to get any text from pages triggering this error?
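
One possible direction for such a workaround, sketched under assumptions: since the failing line only takes `bool(...)` of the result, a wrapper that answers `normalize_and_get_image_path` with `None` would make every image look missing while leaving text access alone. `TextOnlyWiki` and `NoImagesWiki` are hypothetical names for illustration. Where such a wrapper would be hooked into mwlib is not shown in this thread, and the sketch is untested against mwlib itself.

```python
class TextOnlyWiki(object):
    """Hypothetical stand-in for a text-only backend such as cdbwiki."""

    def get_text(self, title):
        return "text of %s" % title


class NoImagesWiki(object):
    """Hypothetical wrapper: reports every image as missing and
    forwards all other attribute lookups to the wrapped object."""

    def __init__(self, wikidb):
        self._wikidb = wikidb

    def normalize_and_get_image_path(self, name):
        # None is falsy, so bool(...) on the traceback's failing line gives False
        return None

    def __getattr__(self, attr):
        # only called for attributes not found on the wrapper itself
        if attr == "_wikidb":
            raise AttributeError(attr)
        return getattr(self._wikidb, attr)


wiki = NoImagesWiki(TextOnlyWiki())
name = "Datei:Apple.jpg"  # made-up example title
exists = bool(wiki.normalize_and_get_image_path(name.split(":")[1]))
print(exists)
print(wiki.get_text("Apple"))
```

The trade-off is that templates using an existence check on files would always take their "does not exist" branch, which seems acceptable when no images are needed.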
