Unclear behaviour of formencode.Schema with UnicodeString items

Maxim Avanov

Feb 8, 2011, 1:25:21 PM
to pylons-discuss
Here's an example.

# =====================
from formencode import Schema, Invalid
from formencode.validators import UnicodeString, Int
from webob import Request

class StrictSchema(Schema):
    allow_extra_fields = False

class IntegerTestSchema(StrictSchema):
    testfield = Int(not_empty=True)

class StringTestSchema(StrictSchema):
    testfield = UnicodeString(not_empty=True)

# Testing.
# =====================
req = Request.blank('/?testfield=111')
print IntegerTestSchema.to_python(req.params)

# This raises an exception
req = Request.blank('/?testfield=111&testfield=222')
try:
    IntegerTestSchema.to_python(req.params)
except Invalid as e:
    print "Caught Exception: {0}".format(e)

req = Request.blank('/?testfield=aaa')
print StringTestSchema.to_python(req.params)

# This passes validation without any error (!)
# The output is {'testfield': u"[u'aaa', u'bbb']"}
req = Request.blank('/?testfield=aaa&testfield=bbb')
print StringTestSchema.to_python(req.params)

# ========================

Please note that we do not use formencode.ForEach() or formencode.Set()
here. I think this behaviour is very unclear.
Imagine a UsernameValidator (or any other not-so-strict string
validator). Instead of reporting an input error, we expose
implementation details to our users: they see "{'username': u"[u'John',
u'Mike']"}" and think, "OK, this is a Python list inside a dict".

According to the WebOb documentation
(http://pythonpaste.org/webob/#multidict), we should probably use
request.GET.getone() instead of request.GET.getall().
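
As a rough illustration of the getone() idea, here is a sketch that uses only the Python 3 standard library, so it runs without WebOb or formencode. The helper name get_one is made up for this example; it is only meant to mimic the strictness asked for above, where a key that is missing or repeated is an error rather than something to coerce.

```python
from urllib.parse import parse_qs


def get_one(query_string, key):
    """Return the only value submitted for `key`, or raise KeyError.

    Hypothetical helper: a stand-in for the strict lookup discussed
    above, not WebOb's actual implementation.
    """
    values = parse_qs(query_string).get(key, [])
    if len(values) != 1:
        raise KeyError(
            "%s: expected exactly one value, found %d" % (key, len(values))
        )
    return values[0]


print(get_one("testfield=aaa", "testfield"))

try:
    get_one("testfield=aaa&testfield=bbb", "testfield")
except KeyError as exc:
    print("Rejected:", exc)
```

With a lookup like this, the repeated parameter never reaches the string validator, so there is nothing left to stringify.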

Ian Wilson

Feb 8, 2011, 10:38:13 PM
to pylons-...@googlegroups.com
Hi,

I think this behavior happens because mixed is used here https://bitbucket.org/ianb/formencode/src/d95237b33f3c/formencode/api.py#cl-403.  If you don't want that to happen ever then I think you need to cast params to a regular dictionary, with something like dict(request.params.items()).  This will silently ignore one of the names though which might be worse. 
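
To make the trade-off concrete, here is a small stdlib-only sketch (Python 3). The function mixed below is a hand-written stand-in that imitates what a multi-value dictionary's mixed view is described as doing (single values stay scalars, repeated keys become lists); it is an assumption for illustration, not the WebOb code itself.

```python
from urllib.parse import parse_qsl


def mixed(pairs):
    """Collapse (key, value) pairs: a single value stays a scalar,
    repeated keys are gathered into a list. Illustrative stand-in only."""
    result = {}
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            result[key] = [result[key], value]
    return result


pairs = parse_qsl("username=John&username=Mike")

# The value's type depends on how many times the key was sent.
print(mixed(pairs))

# A plain dict built from the pairs silently keeps only the last value.
print(dict(pairs))
```

The first form is what hands a list to a validator that expected a string; the second hides the duplicate entirely.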

If you want to be EXTRA strict then you could try ConfirmType and UnicodeString combined in an All validator to catch this error.  Or something to that effect.

Also note that someone could actually send ?username=[u'John', u'Mike'] in the query which would exhibit similar behavior.  So as far as I can tell if that is a problem you'd need to validate it either way.

I agree that this might be misleading, but it's a difficult problem to solve. If we don't use mixed, then how do we get the multiple values when we want them? I think formencode might just need better internal integration with multi-value dictionaries so that different types don't show up depending on the input. It tries to be input-agnostic, though.

-Ian





Maxim Avanov

Feb 10, 2011, 5:52:18 PM
to pylons-discuss
Hi, Ian. Thanks for the reply.

>If you want to be EXTRA strict then you could try ConfirmType and UnicodeString combined in an All validator to catch this error. Or something to that effect.

This solution will work, but I would rather not use it, for several
reasons. First of all, we already have a huge code base that
uses UnicodeString intensively. Of course, we could define our own
UnicodeString validator like the one below,

UnicodeString = formencode.All(
    formencode.validators.ConfirmType(subclass=unicode),
    formencode.validators.UnicodeString())

but we are trying to keep our design clean. Moreover, "All" and
"ConfirmType" add extra validation steps (and hence more function
calls). I think it has to be solved in a more generic and concise way
(i.e. at the level of formencode's internal API).

> Also note that someone could actually send ?username=[u'John', u'Mike'] in
> the query which would exhibit similar behavior.

Yes, and this is where the inconsistency in validator behaviour
shows up. If we had unified behaviour for all single-value
validators (Int, UnicodeString, Bool, etc.), we could get the
"[u'John', u'Mike']" result only from the "/?username=[u'John', u'Mike']"
request. But now we can get the same result from two different
requests: "/?username=[u'John', u'Mike']" and
"/?username=John&username=Mike". This shouldn't be allowed, and it is not
allowed by any single-value validator except UnicodeString.
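
The collision between the two requests can be reproduced with a stdlib-only sketch (Python 3, so the list repr has no u prefixes). The function coerce_to_text is a hypothetical model of "stringify whatever arrived", written for this example; it is not formencode's code.

```python
from urllib.parse import parse_qs


def coerce_to_text(query_string, key):
    """Naive coercion: one value stays as it is, several values arrive
    as a list, and str() is applied to whichever shape turned up."""
    values = parse_qs(query_string)[key]
    value = values[0] if len(values) == 1 else values
    return str(value)


repeated = coerce_to_text("username=John&username=Mike", "username")
literal = coerce_to_text("username=['John', 'Mike']", "username")

# Two different requests end up as the same "validated" string.
print(repeated == literal)
```

Once the list has been stringified, nothing downstream can tell which of the two requests was actually sent.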

> If we don't use mixed then how do we get the multiple values when we want
> them?

We could state our expectations explicitly with the ForEach() and Set()
validators.
formencode's FancyValidator could internally check the currently running
validator by calling something like

isinstance(current_validator, ForEach)

and then act accordingly (i.e. call the single-value
validator for each item found under the same key).
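
A minimal sketch of that dispatch idea, in plain Python 3 and without formencode. The classes Single and Each and the function validate_field are hypothetical stand-ins invented for this example (Each plays the role of ForEach); the point is only that the declared validator, not the shape of the input, decides whether several values are acceptable.

```python
class Single:
    """Hypothetical single-value validator: accepts exactly one string."""

    def to_python(self, value):
        if not isinstance(value, str):
            raise ValueError("expected a string, got %r" % (value,))
        return value


class Each:
    """Hypothetical stand-in for ForEach: wraps a single-value validator."""

    def __init__(self, inner):
        self.inner = inner


def validate_field(validator, values):
    """`values` is the list of every value submitted under one key."""
    if isinstance(validator, Each):
        # Multiple values were declared as expected: validate each item.
        return [validator.inner.to_python(v) for v in values]
    if len(values) != 1:
        # A single-value validator never sees a list; this is an input error.
        raise ValueError("expected one value, got %d" % len(values))
    return validator.to_python(values[0])


print(validate_field(Each(Single()), ["John", "Mike"]))
print(validate_field(Single(), ["John"]))
```

Under this scheme a repeated key fails for every single-value validator in the same way, which is the unified behaviour argued for above.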

P.S. I hope Ian Bicking will see this topic and give his opinion on
it, as I might be missing something important here.



Ian Wilson

Feb 10, 2011, 9:10:19 PM
to pylons-...@googlegroups.com
After thinking about this more, it would probably be better if UnicodeString just had a parameter that turned off coercion of everything except None and instances of basestring. Or maybe another validator could be created that does that. It makes more sense for the validator assigned to that key to respond to both lists and non-lists.

What outcome do you actually want? Should only the first or last item in the list be validated? Or should there be an error if there is more than one item?

Here is a validator I hacked together from the String and UnicodeString validators that only accepts None and basestrings.  None and non-unicode strings are both converted to unicode.

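from formencode import FancyValidator, Invalid

class StrictUnicodeString(FancyValidator):
    """Accept only None and basestring instances; reject lists and other types."""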

    min = None
    max = None
    not_empty = None
    convert_none = True
    encoding = 'utf-8'

    messages = {
        'notString': "Please enter a string",
        'tooLong': "Enter a value less than %(max)i characters long",
        'tooShort': "Enter a value %(min)i characters long or more",
        'badEncoding' : "Invalid data or incorrect encoding",
    }

    def __initargs__(self, new_attrs):
        if self.not_empty is None and self.min:
            self.not_empty = True

    def __init__(self, input_encoding=None, output_encoding=None, 
            convert_none=True, **kw):
        FancyValidator.__init__(self, **kw)
        self.input_encoding = input_encoding or self.encoding
        self.output_encoding = output_encoding or self.encoding
        self.convert_none = convert_none

    def _to_python(self, value, state):
        """ Converts to unicode. """
        if self.convert_none and value is None:
            value = u''
        if not isinstance(value, basestring):
            raise Invalid(self.message('notString', state), value, state)
        if not isinstance(value, unicode):
            try:
                value = unicode(value, self.input_encoding)
            except UnicodeDecodeError:
                raise Invalid(self.message('badEncoding', state), value, state)
        return value

    def _from_python(self, value, state):
        """ Converts to a bytestring. """
        if not isinstance(value, unicode):
            if hasattr(value, '__unicode__'):
                value = unicode(value)
            else:
                value = str(value)
        if isinstance(value, unicode):
            value = value.encode(self.output_encoding)
        return value

    def validate_other(self, value, state):
        if (self.max is not None and value is not None
            and len(value) > self.max):
            raise Invalid(self.message('tooLong', state,
                                       max=self.max),
                          value, state)
        if (self.min is not None
            and (not value or len(value) < self.min)):
            raise Invalid(self.message('tooShort', state,
                                       min=self.min),
                          value, state)

    def empty_value(self, value):
        return u''

-Ian

Maxim Avanov

Feb 11, 2011, 12:53:01 PM
to pylons-discuss
Thanks for the snippet above, Ian.

> What outcome do you actually want? Only the first or last thing from the
> list to get validated? Or an error if there is more than one thing ?

I think the validator must raise an exception in such a case.
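
For reference, the requested outcome (an error whenever the value is not a single string) can be sketched in a few lines of plain Python 3. The function strict_text is a hypothetical name used only here; it follows the same rules as the snippet above (None and strings accepted, bytes decoded, everything else rejected) but is not a formencode validator.

```python
def strict_text(value, encoding="utf-8"):
    """Return `value` as text; accept only None, str, or bytes."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            raise ValueError("Invalid data or incorrect encoding")
    if not isinstance(value, str):
        # Lists (repeated parameters) and other types are input errors.
        raise ValueError("Please enter a string")
    return value


print(strict_text("John"))

try:
    strict_text(["John", "Mike"])
except ValueError as exc:
    print("Rejected:", exc)
```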

I'm going to create a ticket in the formencode tracker. Do you have
any additional suggestions that should be mentioned in it?

Ian Wilson

Feb 12, 2011, 6:46:49 PM
to pylons-...@googlegroups.com
Not really, I think you have a point. If it is inconvenient to extend the existing string validators, then adding a new, more web-form-friendly string validator sounds like a good idea.