Flatland performance for large documents

Adam Lowry

Feb 26, 2011, 2:32:09 PM
to discorporate-tools
My colleagues and I have been using Flatland for API validation, and
while it's the best thing I've seen for hierarchical validation of
JSON-based APIs, it's been a little slow on large documents with
moderately deep nesting. Jason has said he hasn't spent any time
optimizing Flatland, so here's a starting point for seeing where we
might find improvements.

Here's a sample that looks kind of like what I've been doing:

import flatland as fl

EmailForm = fl.Dict.named('emailform').of(
    fl.List.named('addresses').of(
        fl.String.named('email').validated_by(fl.validation.IsEmail())
    ).validated_by(fl.validation.HasAtLeast(minimum=1)),
    fl.Dict.named('message').of(
        fl.String.named('subject'),
        fl.String.named('body'),
    ),
)

BulkEmailForm = fl.List.named('bulkemailform').of(EmailForm)

emails = [("john%0...@example.com" % i) for i in range(5000)]

input = [{
    "addresses": [email],
    "message": {
        "subject": "Hello from flatland",
        "body": "Please to be fast okay",
    },
} for email in emails]


import cProfile
cProfile.run('f = BulkEmailForm(input);f.validate()',
             sort='cumulative')

Here's the top of the profile results:
1255845 function calls (1150845 primitive calls) in 1.385 CPU seconds

Ordered by: cumulative time

ncalls       tottime  percall  cumtime  percall  filename:lineno(function)
1              0.002    0.002    1.385    1.385  <string>:1(<module>)
10001/1        0.018    0.000    0.841    0.841  containers.py:175(__init__)
60001/1        0.133    0.000    0.841    0.841  base.py:129(__init__)
5001/1         0.038    0.000    0.841    0.841  containers.py:227(set)
10000/5000     0.097    0.000    0.573    0.000  containers.py:973(set)
1              0.139    0.139    0.542    0.542  base.py:767(validate)
25000/15000    0.156    0.000    0.335    0.000  containers.py:766(_reset)
30001          0.044    0.000    0.290    0.000  base.py:889(validate_element)
15000/10000    0.032    0.000    0.248    0.000  containers.py:738(__init__)
10000          0.008    0.000    0.205    0.000  base.py:30(__call__)
30000          0.020    0.000    0.183    0.000  base.py:834(_validate)


There's nothing unexpected or obviously off. About 60% of the time
(0.841 s of 1.385 s) is in element initialization, and nearly all of
the rest (0.542 s) is in validation. Any ideas on where we might look
for performance improvements?
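
One way to narrow this down might be to profile construction and
validation separately and sort by tottime rather than cumulative time,
so that functions with the most self-time rise to the top. The sketch
below uses only the standard library (Python 3). profile_phase, build
and validate are hypothetical names for illustration, and build and
validate are plain-Python stand-ins, not Flatland calls. With Flatland,
the two phases would be BulkEmailForm(input) and f.validate() from the
sample above.

```python
import cProfile
import io
import pstats


def profile_phase(func, *args, sort="tottime", limit=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by self-time so per-function hotspots are listed first.
    stats.sort_stats(sort).print_stats(limit)
    return result, buffer.getvalue()


# Stand-in for the construction phase (assumption: not Flatland code).
def build(data):
    return [dict(item) for item in data]


# Stand-in for the validation phase (assumption: not Flatland code).
def validate(form):
    return all("addresses" in item for item in form)


data = [{"addresses": ["user%d" % i]} for i in range(1000)]
form, build_report = profile_phase(build, data)
ok, validate_report = profile_phase(validate, form)
print(build_report)
print(validate_report)
```

Keeping the two reports apart makes it easier to see whether a change
helps initialization, validation, or both.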