[Plone-Users] Archetypes field getIndexable unicode error

Matt Halstead

Jan 22, 2008, 7:01:30 AM
to plone...@lists.sourceforge.net
Hi,

I have a content type derived from RichDocument, which in turn derives
its body 'text' field from ATDocument. By default this field is marked
as searchable. If this field is set to a unicode object containing
characters outside the ASCII range (code points of 128 and above), then
reindexing or indexing the object now fails. More precisely:

... INFO Archetypes Error while trying to convert file contents to
'text/plain' in <Field text(text:rw)>.getIndexable() of <...>: 'ascii'
codec can't encode character u'\u2019' in position 944: ordinal not in
range(128)

This is raised in getIndexable in Archetypes/Field.py, at line 1192,
by the str(f) argument in this call:

datastream = transforms.convertTo(
"text/plain",
str(f),
mimetype = orig_mt,
filename = self.getFilename(instance, 0),
)
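In Python 2, calling str() on a unicode object implicitly encodes it with the default 'ascii' codec, which is what produces the error above. A minimal Python 3 sketch of the same failure follows; the helper py2_style_str is hypothetical and simply calls .encode("ascii") explicitly, because Python 3's str() no longer performs that implicit encoding.

```python
def py2_style_str(value):
    """Hypothetical stand-in for Python 2's str(unicode_obj).

    Python 2 encoded unicode with the default 'ascii' codec here;
    Python 3's str() does not, so encode() is called explicitly.
    """
    return value.encode("ascii")


text = "This is unicode: \u2018 \u2013 \u2019"

try:
    py2_style_str(text)
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII cannot represent
    print(exc.encoding, hex(ord(text[exc.start])), exc.start)  # ascii 0x2018 17
```

Plain ASCII input encodes fine, which is why the problem only shows up once pasted punctuation such as curly quotes reaches the field.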

The problem is becoming quite common because someone is copying
and pasting text from Microsoft Word documents into the Kupu editor
for this field. The offending characters are mostly from the punctuation
area of the Unicode table, e.g. \u2019, \u2013, \u2026.
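To see exactly which pasted characters are the culprits, a small diagnostic can list every non-ASCII character with its Unicode name. This is a standalone Python 3 sketch; the function name non_ascii_chars is hypothetical and not part of Plone or Archetypes.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    found = []
    for index, char in enumerate(text):
        if ord(char) >= 128:
            # unicodedata.name raises ValueError for unnamed code points
            # unless a default is supplied.
            name = unicodedata.name(char, "<unnamed>")
            found.append((index, "U+%04X" % ord(char), name))
    return found


pasted = "It\u2019s done \u2013 mostly\u2026"
for entry in non_ascii_chars(pasted):
    print(entry)
```

For the sample string this reports the right single quotation mark, the en dash and the horizontal ellipsis, the typical "smart" punctuation Word inserts.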

The error can be reproduced in a doctest with the following:

>>> self.loginAsPortalOwner()
>>> _ = self.portal.invokeFactory('Folder','folder1')
>>> folder1 = self.portal.folder1
>>> _ = folder1.invokeFactory('Document', 'article1',
...                          title="Article 1")
>>> article1 = folder1.article1
>>> article1.setText(u"This is unicode: \u2018 \u2013 \u2019")
>>> article1.reindexObject()

NOTE: you need to put a pdb breakpoint in the exception handler in Field.py;
otherwise the handler swallows the error and the Plone test case won't show
you the logged message. That is, put a breakpoint here (line 1198 of
Field.py):

except Exception, e:
    import pdb
    pdb.set_trace()
    log("Error while trying to convert file contents to 'text/plain' "
        "in %r.getIndexable() of %r: %s" % (self, instance, e))
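As an alternative to a pdb breakpoint, the full traceback can be written to the log instead of just the exception message. This is a Python 3 sketch, not Archetypes code: the wrapper convert_for_index and the logger name "Archetypes" are assumptions made for illustration.

```python
import logging

# Logger name is an assumption for illustration.
logger = logging.getLogger("Archetypes")


def convert_for_index(convert, value):
    """Hypothetical wrapper: run `convert` and log the traceback on failure."""
    try:
        return convert(value)
    except Exception:
        # logger.exception logs at ERROR level and attaches the current
        # traceback, so the failing line is visible without dropping into pdb.
        logger.exception("Error while trying to convert %r to 'text/plain'", value)
        return None
```

Like the original handler, this still swallows the error and returns nothing indexable. It only makes the cause easier to spot in the event log.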


So, my actual question is: Is this a bug or a feature?

I understand that ZCatalog's ZCTextIndex is Unicode-aware, so I'm not
sure why this str(f) conversion is there.
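One possible workaround is to encode unicode explicitly instead of relying on str(). This Python 3 sketch assumes that the transform accepts a byte string and that UTF-8 is the site charset; neither is confirmed in this thread, and the helper name to_transform_input is hypothetical.

```python
def to_transform_input(value, charset="utf-8"):
    """Hypothetical replacement for str(f): encode text explicitly.

    Text is encoded with `charset` (assumed to be the site charset);
    byte strings are passed through unchanged.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(charset)


data = to_transform_input("This is unicode: \u2018 \u2013 \u2019")
print(data)
```

UTF-8 can represent every code point, so this encoding step cannot fail the way the ASCII codec did. The trade-off is that whatever consumes the bytes has to know which charset was used.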

Matt


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Plone-Users mailing list
Plone...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plone-users

Matt Halstead

Jan 22, 2008, 8:21:07 AM
to plone...@lists.sourceforge.net
A follow-up to this.

I am using Plone 3.0.4 and Kupu 1.4.7.

It seems that the symptom I am seeing only appears once the "Link
using UIDs" setting in the Kupu configuration is enabled. If I disable
it, the error no longer occurs.

Matt
