My colleagues and I have been using Flatland for API validation. It's
the best thing I've seen for hierarchical validation of JSON-based
APIs, but it's been a little slow on large documents with somewhat
deep nesting. Jason has said he's spent no time optimizing Flatland,
so here's a starting point for seeing where we might be able to find
improvements.

Here's a sample that looks kind of like what I've been doing:

import cProfile

import flatland as fl

# One email: a non-empty list of addresses plus a subject/body message.
EmailForm = fl.Dict.named('emailform').of(
    fl.List.named('addresses').of(
        fl.String.named('email').validated_by(fl.validation.IsEmail())
    ).validated_by(fl.validation.HasAtLeast(minimum=1)),
    fl.Dict.named('message').of(
        fl.String.named('subject'),
        fl.String.named('body'),
    ),
)

# The top-level document is a list of emails.
BulkEmailForm = fl.List.named('bulkemailform').of(EmailForm)

# 5000 single-recipient messages.
emails = [("john%0...@example.com" % i) for i in range(5000)]
input = [{
    "addresses": [email],
    "message": {
        "subject": "Hello from flatland",
        "body": "Please to be fast okay",
    },
} for email in emails]

# Profile construction and validation together, sorted by cumulative time.
cProfile.run('f = BulkEmailForm(input);f.validate()',
             sort='cumulative')

Here's the top of the profile results:

1255845 function calls (1150845 primitive calls) in 1.385 CPU seconds

Ordered by: cumulative time

     ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
          1    0.002    0.002    1.385    1.385  <string>:1(<module>)
    10001/1    0.018    0.000    0.841    0.841  containers.py:175(__init__)
    60001/1    0.133    0.000    0.841    0.841  base.py:129(__init__)
     5001/1    0.038    0.000    0.841    0.841  containers.py:227(set)
 10000/5000    0.097    0.000    0.573    0.000  containers.py:973(set)
          1    0.139    0.139    0.542    0.542  base.py:767(validate)
25000/15000    0.156    0.000    0.335    0.000  containers.py:766(_reset)
      30001    0.044    0.000    0.290    0.000  base.py:889(validate_element)
15000/10000    0.032    0.000    0.248    0.000  containers.py:738(__init__)
      10000    0.008    0.000    0.205    0.000  base.py:30(__call__)
      30000    0.020    0.000    0.183    0.000  base.py:834(_validate)

There's nothing unexpected or obviously off. A little more than half
of the time (0.841 of 1.385 seconds, cumulative) is in element
initialization, and most of the rest (0.542 seconds) is in validation.
Any ideas on where we might look for performance improvements?
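
One way to narrow this down might be to profile construction and validation separately and sort each by tottime (time spent inside the function itself) rather than cumulative time. Below is a minimal sketch of that idea using only the standard library. The helper `profile_phase` and the `build` and `validate` functions are hypothetical names made up for illustration; `build` and `validate` are plain-Python stand-ins so the snippet runs without Flatland, and in practice they would be replaced by something like `BulkEmailForm(input)` and `f.validate()`.

```python
import cProfile
import io
import pstats


def profile_phase(func, *args, limit=10):
    """Run func(*args) under cProfile.

    Returns (result, report), where report is the text of the top
    `limit` entries sorted by tottime (self time, excluding callees).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('tottime').print_stats(limit)
    return result, stream.getvalue()


def build(documents):
    # Hypothetical stand-in for constructing the form from raw input.
    return [dict(doc) for doc in documents]


def validate(elements):
    # Hypothetical stand-in for validating the constructed form.
    return all('addresses' in element for element in elements)


documents = [{"addresses": ["user%d" % i]} for i in range(1000)]

# Each phase gets its own profiler, so the two reports do not mix.
form, build_report = profile_phase(build, documents)
ok, validate_report = profile_phase(validate, form)

print(build_report)
print(validate_report)
```

Sorting by tottime tends to surface the functions doing the actual work, whereas cumulative time is dominated by the outermost recursive calls, as in the results above.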