Json and the spacing issue


Alan Etkin

Jan 31, 2013, 6:36:39 AM
to web2py-d...@googlegroups.com
https://groups.google.com/d/msg/web2py/-K2F1AwgbYE/-86tA7jnYKEJ

What about this:

>>> from serializers import json
>>> dumped = json(<data>, clean_js=True)

serializers.json is a wrapper for the JSON encoder. We could then look through gluon for any direct encoder calls and fix that code. Does that make sense?
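
A minimal sketch of what such a wrapper might look like, written for Python 3. This is not the code in gluon: the `clean_js` flag is the option proposed above, `JS_UNSAFE` is a hypothetical name, and `ensure_ascii=False` is an assumption made here so that the raw characters can actually reach the output.

```python
import json as json_parser

# Valid inside JSON strings, but treated as line terminators by JavaScript parsers.
JS_UNSAFE = (("\u2028", "\\u2028"), ("\u2029", "\\u2029"))


def json(value, clean_js=True, **kwargs):
    """Hypothetical wrapper: serialize, then optionally escape U+2028/U+2029."""
    # Assumption: without ensure_ascii=False the encoder already escapes
    # every non-ASCII character, so there would be nothing left to clean.
    kwargs.setdefault("ensure_ascii", False)
    dumped = json_parser.dumps(value, **kwargs)
    if clean_js:
        for char, escaped in JS_UNSAFE:
            dumped = dumped.replace(char, escaped)
    return dumped
```

The escaped output is still valid JSON and decodes back to the original value, so the replacement only changes how the two characters are spelled on the wire.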

Massimo Di Pierro

Jan 31, 2013, 9:18:34 AM
to web2py-d...@googlegroups.com
Why make clean_js an option? Shouldn't it always be true?




Alan Etkin

Jan 31, 2013, 9:53:39 AM
to web2py-d...@googlegroups.com
> Why make clean_js an option? Shouldn't it always be true?

Good point. The replacement should always be made, to keep the incompatible characters out of the output. It's just that I cannot figure out how a user who wants custom input stored "as is" could bypass the replacement.

Alan Etkin

Feb 1, 2013, 7:32:18 AM
to
> Good point. The replacement should be always made for preventing the incompatible chars.

They simply call replace on the serializer output. For web2py, the .replace calls should be added to gluon.serializers.json. Note: some sections of gluon use dumps directly (sometimes because serializers is not available):

Dumps calls

./dal.py:1809:                    obj = simplejson.dumps(items)
./dal.py:6843:                return simplejson.dumps(item)
./dal.py:10057:            return simplejson.dumps(items)

./contrib/spreadsheet.py:374:            return simplejson.dumps(result)
./contrib/spreadsheet.py:843:            """ % dict(data=simplejson.dumps(self.client),

./contrib/simplejsonrpc.py:115:        request = json.dumps(data)

./scheduler.py:229:            result = dumps(_function(*args, **vars))
./scheduler.py:916:        targs = 'args' in kwargs and kwargs.pop('args') or dumps(pargs)
./scheduler.py:917:        tvars = 'vars' in kwargs and kwargs.pop('vars') or dumps(pvars)

./validators.py:341:        return simplejson.dumps(value)
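
A listing like the one above could be regenerated with a short script instead of grep. This is only a sketch for Python 3; `find_dumps_calls` and `DUMPS_CALL` are hypothetical names, and the pattern simply looks for any `dumps(` call, so each hit still has to be reviewed by hand.

```python
import os
import re

# Matches direct encoder calls such as simplejson.dumps(...), json.dumps(...) or dumps(...).
DUMPS_CALL = re.compile(r"\bdumps\(")


def find_dumps_calls(root):
    """Yield (path, line_number, line) for each .py line under root that calls dumps()."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue  # ignores anything that is not a .py file, e.g. editor backups
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if DUMPS_CALL.search(line):
                        yield path, number, line.rstrip()


if __name__ == "__main__":
    for path, number, line in find_dumps_calls("."):
        print("%s:%d:%s" % (path, number, line))
```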

Massimo Di Pierro

Feb 1, 2013, 10:37:21 AM
to web2py-d...@googlegroups.com
The issue here is that dal.py, contrib/* and scheduler.py are supposed to work standalone. 

Perhaps we should fix the replace in our version of simplejson?




Jonathan Lundell

Feb 1, 2013, 10:47:52 AM
to web2py-d...@googlegroups.com
On 1 Feb 2013, at 7:37 AM, Massimo Di Pierro <massimo....@gmail.com> wrote:
> The issue here is that dal.py, contrib/* and scheduler.py are supposed to work standalone.
>
> Perhaps we should fix the replace in our version of simplejson?

Except that gluon.serializers prefers the json module if available, so a fix in our bundled simplejson would not always apply.

Alan Etkin

Feb 1, 2013, 11:21:39 AM
to web2py-d...@googlegroups.com
> Perhaps we should fix the replace in our version of simplejson?

For me, it's just about calling .replace after serialization inside serializers.json.
It would be safer to do a regex replacement though, because I believe that pre-escaped chars would be over-escaped.

Also, scheduler doesn't need to prevent the problem, since it just handles JSON data, not JavaScript.

Not sure about the rest of the contrib modules (jsonrpc and spreadsheet). If they parse JavaScript strings, maybe it is best to have this fixed by their authors, if possible.

Massimo Di Pierro

Feb 1, 2013, 11:53:17 AM
to web2py-d...@googlegroups.com
I agree.

Alan Etkin

Feb 1, 2013, 12:07:30 PM
to web2py-d...@googlegroups.com
I found a bug in my last post :/

>>> u2
u'\\u2028'
>>> u3 = u2.replace(u'\u2028', u'\\u2028')
>>> u3
u'\\u2028'

So I cannot reproduce the supposed over-escaping problem. We need a Python guru here.

Massimo Di Pierro

Feb 1, 2013, 12:20:40 PM
to web2py-d...@googlegroups.com
I do not think there can be an over-escaping problem. At the Python level you are replacing a non-ASCII character with an ASCII sequence. A second call would not find the non-ASCII character.
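
The point can be checked with a few lines of Python 3 (where every str is already unicode). `escape_js_separators` is a hypothetical helper name used only for this sketch.

```python
def escape_js_separators(text):
    """Replace the raw U+2028/U+2029 characters with their six-character escapes."""
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


raw = "first\u2028second\u2029third"
once = escape_js_separators(raw)
twice = escape_js_separators(once)

# The second pass finds no raw separator left, so the text is unchanged.
assert once == twice

# Input that already holds the escaped form is left alone too.
assert escape_js_separators("\\u2028") == "\\u2028"
```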


Jonathan Lundell

Feb 1, 2013, 12:21:01 PM
to web2py-d...@googlegroups.com, Alan Etkin
What's the problem exactly? Your replace call is replacing a string with a single unicode character (u'\u2028') with a string with six unicode characters (u'\\u2028'), none of which are \u2028 (they are '\', 'u', '2', '0', '2', '8').

You could also write: replace(u'\u2028', ur'\u2028')

>>> len(u'\\u2028')
6
>>> len(u'\u2028')
1
>>> len(ur'\u2028')
1


Jonathan Lundell

Feb 1, 2013, 12:32:13 PM
to web2py-d...@googlegroups.com
Oops, that should have been: replace(u'\u2028', r'\u2028')

>>> len(r'\u2028')
6
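
For readers on Python 3, the same check looks like this. The u'' and ur'' literals in this thread are Python 2 syntax; in Python 2 a ur'' literal still interprets \uXXXX escapes, which is why ur'\u2028' has length 1, and Python 3 has no ur prefix at all. The variable names are only for illustration.

```python
separator = "\u2028"     # one character: LINE SEPARATOR
escaped = "\\u2028"      # six characters: backslash, u, 2, 0, 2, 8
raw_literal = r"\u2028"  # raw string: the same six characters

assert len(separator) == 1
assert len(escaped) == 6
assert raw_literal == escaped

# Replacing the single character with the six-character escape:
assert "a\u2028b".replace(separator, raw_literal) == "a\\u2028b"
```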

Alan Etkin

Feb 1, 2013, 12:38:09 PM
to web2py-d...@googlegroups.com
Surely I need to try the Python chapter on strings and unicode again...


> What's the problem exactly? Your replace call is replacing a string with a single unicode character (u'\u2028') with a string with six
> unicode characters (u'\\u2028'), none of which are \u2028 (they are '\', 'u', '2', '0', '2', '8').

Thanks a lot. So I guess for this particular issue, it's enough to just use .replace(<unicode char>, <other>) for U+2028 and U+2029.

Jonathan Lundell

Feb 1, 2013, 12:47:01 PM
to web2py-d...@googlegroups.com
On 1 Feb 2013, at 9:38 AM, Alan Etkin <spam...@gmail.com> wrote:
> Surely I need to try the Python chapter on strings and unicode again...

Each time you read it, it makes more sense.


>> What's the problem exactly? Your replace call is replacing a string with a single unicode character (u'\u2028') with a string with six
>> unicode characters (u'\\u2028'), none of which are \u2028 (they are '\', 'u', '2', '0', '2', '8').
>
> Thanks a lot. So I guess for this particular issue, it's enough to just use .replace(<unicode char>, <other>) for U+2028 and U+2029.

I think so, yes.

It's possible that a (compiled) regex would be faster on long strings (which JSON strings are likely to be), since it'd mean one instead of two passes over the string. But a) you'd have to measure it first, and b) I doubt the difference would be meaningful in the context of a request.

Massimo Di Pierro

Feb 1, 2013, 3:08:37 PM
to web2py-d...@googlegroups.com
I cannot quote the source but I read that string replacement is faster than regex.

Massimo

Jonathan Lundell

Feb 1, 2013, 3:42:13 PM
to web2py-d...@googlegroups.com
On 1 Feb 2013, at 12:08 PM, Massimo Di Pierro <massimo....@gmail.com> wrote:
> I cannot quote the source but I read that string replacement is faster than regex.

I'm sure it is, but I wonder if it's twice as fast...

I don't think it matters in this case.

Alan Etkin

Feb 1, 2013, 4:32:51 PM
to web2py-d...@googlegroups.com
The problem is with apps that make intensive use of JSON data.

My benchmarks:

#### Text replace comparison (regex/no regex) ####
Text original length is 238714
Testing replacement speed with regex...
Result length: 256054 Elapsed: 0:00:00.035100
Testing replacement speed without regex...
Result length: 256054 Elapsed: 0:00:00.008104
Exact string match: True
data types 1 <type 'unicode'> 2 <type 'unicode'>

Attached is the script used.

replace.py
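
The attached replace.py is not reproduced in this thread. What follows is an independent sketch of how such a comparison might be set up in Python 3, not the script that produced the numbers above: the input text is synthetic, and all names (`TEXT`, `PATTERNS`, `with_replace`, `with_regex`, `benchmark`) are hypothetical. Timings will vary by machine and Python version.

```python
import re
import timeit

# Synthetic input (an assumption): 200000 characters with both separators repeated.
TEXT = "some text\u2028more text\u2029" * 10000

# One compiled pattern per character. The replacement needs a doubled
# backslash because re.sub processes escapes in the replacement string.
PATTERNS = [
    (re.compile("\u2028"), r"\\u2028"),
    (re.compile("\u2029"), r"\\u2029"),
]


def with_replace(text):
    """Two chained str.replace passes."""
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


def with_regex(text):
    """Two passes with precompiled patterns and plain replacement strings."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def benchmark(repeat=20):
    """Print the average time per call for both approaches."""
    for func in (with_replace, with_regex):
        elapsed = timeit.timeit(lambda: func(TEXT), number=repeat)
        print("%s: %.6f s per call" % (func.__name__, elapsed / repeat))


if __name__ == "__main__":
    assert with_replace(TEXT) == with_regex(TEXT)
    benchmark()
```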

Massimo Di Pierro

Feb 1, 2013, 9:46:52 PM
to web2py-d...@googlegroups.com

wow Alan! This settles it.

Alan Etkin

Feb 2, 2013, 5:36:54 AM
to web2py-d...@googlegroups.com

> wow Alan! This settles it.

Just a few notes on this measurement:

- re.sub is called without .compile and is passed a function to do the replacement. Perhaps those options make the process slower.
- When compiling a pattern for each character and then calling .sub (with no replacement function), the difference is smaller:


Testing replacement speed with regex...
Result length: 256054 Elapsed: 0:00:00.019647

Testing replacement speed without regex...
Result length: 256054 Elapsed: 0:00:00.007614

I could not find a way of replacing the two characters with one pattern. I think the measurements would be very similar in that case.
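
One possible way to cover both characters with a single pattern is a character class combined with a replacement function that looks up the escape for whichever character matched. This is a Python 3 sketch; `ESCAPES`, `BOTH` and `escape_in_one_pass` are hypothetical names, and no claim is made here about how its speed compares with the figures above.

```python
import re

# Maps each raw separator to its six-character escaped form.
ESCAPES = {"\u2028": "\\u2028", "\u2029": "\\u2029"}

# A character class matches either separator in a single scan of the text.
BOTH = re.compile("[\u2028\u2029]")


def escape_in_one_pass(text):
    """Escape U+2028 and U+2029 with one compiled pattern."""
    return BOTH.sub(lambda match: ESCAPES[match.group(0)], text)
```

Since the replacement goes through a function call for each match, it would need to be measured before assuming it beats two plain .replace calls.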