Sorry all. This is likely to be a biggish post (and may well turn out to be unreadable, if the tables dont reproduce well ;-)
I would certainly appreciate someone commenting on the logic of my approach.
OK - heres what I have done so far, using the Reia PEG as a test case....
It seemed that fprof was the best tool to use, but it measures elapsed time, not CPU. I could not find a tool to give me processor time used. :-( (Is that really so hard to do in a VM?!?)
As a baseline, I used the last version of the memory branch that I posted. The top few lines of the result were as follows:
"<0.39.0>" |
|
|
1940718 |
undefined |
52034.028 |
|
|
|
13.2247 |
|
|
|
Count |
Tot |
Own |
Unit |
%ageTotal |
|
|
neotoma_parse |
p_all |
4 |
138932 |
72683.399 |
3934.601 |
0.02832 |
7.56% |
|
1.0000 |
neotoma_parse |
p_attempt |
4 |
130960 |
69777.613 |
3884.261 |
0.02966 |
7.46% |
|
0.9872 |
neotoma_parse |
p |
5 |
51709 |
73383.129 |
3834.867 |
0.07416 |
7.37% |
|
0.9747 |
neotoma_parse |
get_memo |
1 |
51709 |
5872.097 |
2540.128 |
0.04912 |
4.88% |
|
0.6456 |
lists |
foldl |
3 |
95022 |
5179.214 |
2277.290 |
0.02397 |
4.38% |
|
0.5788 |
neotoma_parse |
-p_charclass/1-fun-0-' |
4 |
14558 |
6096.801 |
2265.258 |
0.15560 |
4.35% |
|
0.5757 |
proplists |
lookup |
2 |
137078 |
3229.624 |
2216.337 |
0.01617 |
4.26% |
|
0.5633 |
neotoma_parse |
-p_string/1-fun-0-' |
4 |
110587 |
4273.063 |
2200.807 |
0.01990 |
4.23% |
|
0.5593 |
neotoma_parse |
p_advance_index |
2 |
95022 |
10534.828 |
2080.734 |
0.02190 |
4.00% |
|
0.5288 |
neotoma_parse |
memoize |
2 |
42376 |
4736.719 |
1954.612 |
0.04613 |
3.76% |
|
0.4968 |
neotoma_parse |
p_scan |
4 |
53795 |
72608.579 |
1629.444 |
0.03029 |
3.13% |
|
0.4141 |
unicode |
characters_to_list |
1 |
47433 |
3260.534 |
1588.838 |
0.03350 |
3.05% |
|
0.4038 |
neotoma_parse |
-p_not/1-fun-0-' |
3 |
37517 |
3341.361 |
1537.576 |
0.04098 |
2.95% |
|
0.3908 |
neotoma_parse |
-p_anything/0-fun-0-' |
2 |
34785 |
9531.222 |
1458.524 |
0.04193 |
2.80% |
|
0.3707 |
neotoma_parse |
memo_table_name |
0 |
95476 |
1899.382 |
1354.767 |
0.01419 |
2.60% |
|
0.3443 |
proplists |
is_defined |
2 |
74994 |
1764.019 |
1212.677 |
0.01617 |
2.33% |
|
0.3082 |
garbage_collect |
537 |
|
1 |
1039.164 |
1039.164 |
1039.16400 |
2.00% |
|
0.2641 |
re |
run |
3 |
14558 |
1056.997 |
994.229 |
0.06829 |
1.91% |
|
0.2527 |
ets |
lookup |
2 |
52794 |
1155.080 |
950.885 |
0.01801 |
1.83% |
|
0.2417 |
ets |
insert |
2 |
42676 |
1240.016 |
877.454 |
0.02056 |
1.69% |
|
0.2230 |
For those unfamiliar with fprof, 'count' is the number of times a function is executed, Tot is the Total time spent in a function, including time in the called functions, and Own is the time spent in a function EXcluding called functions (the bit in which we are most interested)
The figures on the top line level with <0,39,0> are the totals for the whole process.
The 'unit' column I added, and shows the 'own' time for a function divided by the number of times it is executed (i.e. the unit time for an execution ;-) ) and the % of total column shows the Own time as a percentage of the total own time.
The bad news, as can be seen from this %age column, is that the is no particular 'hotspot', so no obvious place to focus our efforts.
SO . . . I just started at the top of the list. The function p_all seemed the biggest user, but looking at the code, t is not easy to see how to make it any slicker. The same is true of p_attempt and p.
The first line of interest is that for get_memo. I have wondered for a while about whether it would be better to use a proplist to store the caching information, so that was my first experiment. My initial try showed an improvement of about 50% reduction in time, then I remembered this was elapsed time, and so highly sensitive to the resources being used on my machine at the time.
To combat this variable, the best I could think of was to scale everything relative to the p_all time (which I did not expect to be improving any). This is what you see in the far right column, where the own time for each function is shown as a fraction of the p_all time.
MemoisationHaving changed the memoisation routines to use a proplist, I got the following results:
"<0.39.0>" |
|
|
1940528 |
undefined |
26105.719 |
|
50.17% |
|
11.9094 |
|
|
|
Count |
Tot |
Own |
Unit |
%ageTotal |
|
|
neotoma_parse |
p_all |
4 |
138932 |
49375.193 |
2192.023 |
0.01578 |
8.40% |
|
1.0000 |
neotoma_parse |
p |
5 |
51709 |
49669.187 |
1948.556 |
0.03768 |
7.46% |
|
0.8889 |
neotoma_parse |
p_attempt |
4 |
130960 |
47517.007 |
1896.901 |
0.01448 |
7.27% |
|
0.8654 |
neotoma_parse |
get_memo |
1 |
51709 |
4467.727 |
1271.97 |
0.02460 |
4.87% |
|
0.5803 |
neotoma_parse |
memoize |
2 |
42376 |
3598.339 |
1195.318 |
0.02821 |
4.58% |
|
0.5453 |
neotoma_parse |
p_advance_index |
2 |
95022 |
7174.674 |
1161.685 |
0.01223 |
4.45% |
|
0.5300 |
lists |
foldl |
3 |
95022 |
3465.169 |
1090.941 |
0.01148 |
4.18% |
|
0.4977 |
neotoma_parse |
-p_string/1-fun-0-' |
4 |
110587 |
2499.068 |
1051.826 |
0.00951 |
4.03% |
|
0.4798 |
neotoma_parse |
p_scan |
4 |
53795 |
49261.369 |
957.463 |
0.01780 |
3.67% |
|
0.4368 |
proplists |
lookup |
2 |
137078 |
2055.560 |
953.497 |
0.00696 |
3.65% |
|
0.4350 |
re |
run |
3 |
14558 |
1030.919 |
844.034 |
0.05798 |
3.23% |
|
0.3850 |
unicode |
characters_to_list |
1 |
47433 |
2274.426 |
708.93 |
0.01495 |
2.72% |
|
0.3234 |
neotoma_parse |
memo_table_name |
0 |
95476 |
1422.423 |
676.268 |
0.00708 |
2.59% |
|
0.3085 |
garbage_collect |
537 |
|
1 |
674.150 |
669.530 |
1.00000 |
2.56% |
|
0.3054 |
neotoma_parse |
-p_not/1-fun-0-' |
3 |
37517 |
2014.170 |
630.088 |
0.01679 |
2.41% |
|
0.2874 |
ets |
lookup |
2 |
52794 |
1037.220 |
611.673 |
0.01159 |
2.34% |
|
0.2790 |
proplists |
is_defined |
2 |
74994 |
1209.329 |
579.755 |
0.00773 |
2.22% |
|
0.2645 |
ets |
insert |
2 |
42676 |
936.285 |
577.493 |
0.01353 |
2.21% |
|
0.2635 |
neotoma_parse |
-p_charclass/1-fun-0-' |
4 |
14558 |
3818.591 |
564.022 |
0.03874 |
2.16% |
|
0.2573 |
neotoma_parse |
-p_anything/0-fun-0-' |
2 |
34785 |
6316.239 |
489.329 |
0.01407 |
1.87% |
|
0.2232 |
As can be seen, there is some improvement in get_memo (down from 0.6456 to 0.5803), sadly there is a corresponding INcrease in memoize (up from 0.4968 to 0.5453) which reduces the gain to just 0.05 p_all execution-equivalents (if you will excuse my somewhat clumsy scale). Actually, the impacts are complex, because all sorts of other things have shifted, so I concentrated on the figure at the top which shows the total elapsed time in terms of p_all executions. This has reduced from 13.225 to 11.909, a reduction of about 10%. Not stunning, but a worthwhile improvement.
The next obvious target was advance_index. It seemed to me that the index structure was somewhat over-engineered, so I tried replacing it with a simple dotted pair ([x|y]). HAS to produce some benefits, right?
Well the results were as follows:
"<0.39.0>" |
|
|
1940530 |
undefined |
26095.127 |
|
50.15% |
|
13.4288 |
|
|
|
Count |
Tot |
Own |
Unit |
%ageTotal |
|
|
neotoma_parse |
p_all |
4 |
138932 |
51173.965 |
1943.224 |
0.01399 |
7.45% |
|
1.0000 |
neotoma_parse |
p |
5 |
51709 |
51657.472 |
1923.512 |
0.03720 |
7.37% |
|
0.9899 |
neotoma_parse |
p_attempt |
4 |
130960 |
49330.829 |
1855.634 |
0.01417 |
7.11% |
|
0.9549 |
neotoma_parse |
get_memo |
1 |
51709 |
4568.310 |
1265.543 |
0.02447 |
4.85% |
|
0.6513 |
neotoma_parse |
p_advance_index |
2 |
95022 |
7860.237 |
1165.753 |
0.01227 |
4.47% |
|
0.5999 |
proplists |
lookup |
2 |
137078 |
2279.512 |
1160.885 |
0.00847 |
4.45% |
|
0.5974 |
lists |
foldl |
3 |
95022 |
3769.014 |
1107.37 |
0.01165 |
4.24% |
|
0.5699 |
neotoma_parse |
-p_string/1-fun-0-' |
4 |
110587 |
2909.453 |
1047.395 |
0.00947 |
4.01% |
|
0.5390 |
neotoma_parse |
memoize |
2 |
42376 |
3365.578 |
962.101 |
0.02270 |
3.69% |
|
0.4951 |
neotoma_parse |
p_scan |
4 |
53795 |
51087.806 |
910.642 |
0.01693 |
3.49% |
|
0.4686 |
re |
run |
3 |
14558 |
1021.338 |
885.071 |
0.06080 |
3.39% |
|
0.4555 |
neotoma_parse |
-p_charclass/1-fun-0-' |
4 |
14558 |
4297.158 |
766.53 |
0.05265 |
2.94% |
|
0.3945 |
garbage_collect |
537 |
|
1 |
738.733 |
732.901 |
1.00000 |
2.81% |
|
0.3772 |
unicode |
characters_to_list |
1 |
47433 |
2487.362 |
723.41 |
0.01525 |
2.77% |
|
0.3723 |
neotoma_parse |
memo_table_name |
0 |
95476 |
1529.930 |
663.84 |
0.00695 |
2.54% |
|
0.3416 |
ets |
lookup |
2 |
52794 |
1100.060 |
629.197 |
0.01192 |
2.41% |
|
0.3238 |
proplists |
is_defined |
2 |
74994 |
1355.632 |
606.665 |
0.00809 |
2.32% |
|
0.3122 |
ets |
insert |
2 |
42676 |
876.211 |
588.44 |
0.01379 |
2.25% |
|
0.3028 |
neotoma_parse |
-p_not/1-fun-0-' |
3 |
37517 |
2047.480 |
547.493 |
0.01459 |
2.10% |
|
0.2817 |
neotoma_parse |
-p_anything/0-fun-0-' |
2 |
34785 |
6542.401 |
545.422 |
0.01568 |
2.09% |
|
0.2807 |
So the total has gone from 13.2247 to 13.4288. An INCREASE of about 1.5% ! ! !
Hmmm - Well maybe dotted pair handling is not so great after all. I will try a 2-item list next.
But you can see that this does not look like it is going to be easy. One way to improve things is to tighten up the structure of the PEG. I am sure I could simplify it, especially around the definition of the lowest level elements, and thus reduce the number of times all these key functions are executed (p_all, p, etyc.) but that seems a bit like putting the cart before the horse. I do not really want to bend my PEG (if you will excuse the expression) for good performance, but of course if that is what it takes . . .
All ideas gratefully received.
g
_____________________________________________
On 15 March 2011 03:56, Tony Arcieri
<tony.a...@medioh.com> wrote:
Any plans for further performance improvements in the roadmap? :)
On Sun, Mar 13, 2011 at 4:25 PM, Sean Cribbs
<seanc...@gmail.com> wrote:
It's been a year coming, so I'm proud to announce a new version of
Neotoma, the PEG/Packrat toolkit and parser-generator. The primary
feature of version 1.5 is that it uses significantly less memory and
also supports UTF8 (via binaries), thanks to some awesome work by
Graeme Defty, who has also converted Reia to use a PEG.
You can get the latest source on Github: https://github.com/seancribbs/neotoma
Neotoma is rebar-friendly as well, so adding it to a rebar project is
a simple deps declaration like so:
{neotoma, "1.5", {git, "git://github.com/seancribbs/neotoma", {tag, "1.5"}}}
Hope to see some of you at Erlang Factory SF next week.
Cheers,
Sean Cribbs
--
Tony Arcieri
Medioh! Kudelski