I went ahead and did the thing Mario and I discussed on the list back
in April: we now have a working example of a proof verifier that
operates at the segment level, incrementally, and takes advantage of
parallelism. The code and build/execution instructions are at
https://github.com/sorear/smetamath-rs.

The more advanced features from SMM2, the grammatical parser and the
outline extraction, are not yet in place. Since the point of the
parallelism is to be as fast as possible, here are the stats: on my
machine it's about 6 times faster than the reference implementation.
All tests use set.mm develop/0c32b31f. Unfortunately there's no
overlap with Dan Getz's list, and I don't know his benchmark machine's
specs, so I can't directly compare the two.
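
For the curious, the segment-level parallelism can be sketched roughly
like this (the types and names below are simplified stand-ins for
illustration, not smetamath-rs's actual API): the database is split into
independently checkable segments, and each segment's proofs are verified
on its own thread.

```rust
use std::thread;

// Hypothetical stand-in: one independently checkable slice of the database.
struct Segment {
    statements: Vec<String>,
}

// Placeholder check; a real verifier would replay each proof here.
fn verify_segment(seg: &Segment) -> Result<(), String> {
    if seg.statements.iter().any(|s| s.is_empty()) {
        Err("empty statement".to_string())
    } else {
        Ok(())
    }
}

// Verify every segment on its own thread and collect the results.
fn verify_parallel(segments: Vec<Segment>) -> Result<(), String> {
    let handles: Vec<_> = segments
        .into_iter()
        .map(|seg| thread::spawn(move || verify_segment(&seg)))
        .collect();
    for h in handles {
        h.join().map_err(|_| "worker panicked".to_string())??;
    }
    Ok(())
}

fn main() {
    let segs = vec![
        Segment { statements: vec!["ax-1".into(), "ax-2".into()] },
        Segment { statements: vec!["id".into()] },
    ];
    assert!(verify_parallel(segs).is_ok());
    println!("all segments verified");
}
```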
uname -a: Linux sorear6 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
/sys/devices/virtual/dmi/id/board_version: MacBookAir7,2
/proc/cpuinfo: model name : Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
(2 cores, 2-way SMT ("hyperthreading"); I'd like to include memory
latency as well, but I can't figure out how to get those specs)
SMM3 (rustc 1.11.0-nightly (0554abac6 2016-06-10)):

 Performance counter stats for 'target/release/smetamath --verify --split --jobs 4 /home/sorear/dlcode/set.mm/set.mm' (30 runs):

       1955.445112 task-clock (msec)         #    2.679 CPUs utilized            ( +-  0.15% )
               366 context-switches          #    0.187 K/sec                    ( +-  6.28% )
                59 cpu-migrations            #    0.030 K/sec                    ( +-  2.75% )
            64,295 page-faults               #    0.033 M/sec                    ( +-  0.09% )
     4,915,730,304 cycles                    #    2.514 GHz                      ( +-  0.16% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     5,811,148,177 instructions              #    1.18  insns per cycle          ( +-  0.00% )
     1,179,294,370 branches                  #  603.082 M/sec                    ( +-  0.00% )
        36,216,379 branch-misses             #    3.07% of all branches          ( +-  0.04% )

       0.729840747 seconds time elapsed                                          ( +-  0.70% )
or, limited to a single thread:
 Performance counter stats for 'target/release/smetamath --verify --jobs 1 /home/sorear/dlcode/set.mm/set.mm' (30 runs):

       1238.270068 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.23% )
                16 context-switches          #    0.013 K/sec                    ( +- 22.09% )
                 0 cpu-migrations            #    0.000 K/sec                    ( +- 36.27% )
            63,299 page-faults               #    0.051 M/sec                    ( +-  0.00% )
     3,288,685,107 cycles                    #    2.656 GHz                      ( +-  0.17% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     5,819,668,862 instructions              #    1.77  insns per cycle          ( +-  0.00% )
     1,183,317,488 branches                  #  955.621 M/sec                    ( +-  0.00% )
        34,112,392 branch-misses             #    2.88% of all branches          ( +-  0.24% )

       1.239175774 seconds time elapsed                                          ( +-  0.23% )
metamath.exe (gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)):

 Performance counter stats for '/home/sorear/dlcode/metamath/metamath r '/home/sorear/dlcode/set.mm/set.mm v p * exit' (10 runs):

       4811.725531 task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.42% )
                31 context-switches          #    0.006 K/sec                    ( +- 24.00% )
                 1 cpu-migrations            #    0.000 K/sec                    ( +- 31.18% )
            41,349 page-faults               #    0.009 M/sec                    ( +-  0.00% )
    12,782,471,756 cycles                    #    2.657 GHz                      ( +-  0.41% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
    24,731,912,028 instructions              #    1.93  insns per cycle          ( +-  0.01% )
     4,994,967,694 branches                  # 1038.082 M/sec                    ( +-  0.01% )
       128,510,117 branch-misses             #    2.57% of all branches          ( +-  0.20% )

       4.813443077 seconds time elapsed                                          ( +-  0.42% )
SMM2 (node.js 5.11.0; 6.x is measurably slower :( )
 Performance counter stats for 'node /home/sorear/active/smm/out/misc/verify.js /home/sorear/dlcode/set.mm/set.mm' (10 runs):

       7261.357665 task-clock (msec)         #    1.043 CPUs utilized            ( +-  0.34% )
               337 context-switches          #    0.046 K/sec                    ( +-  3.58% )
                38 cpu-migrations            #    0.005 K/sec                    ( +-  5.13% )
           114,217 page-faults               #    0.016 M/sec                    ( +-  0.11% )
    19,221,842,218 cycles                    #    2.647 GHz                      ( +-  0.33% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
    35,100,750,527 instructions              #    1.83  insns per cycle          ( +-  0.04% )
     7,773,372,796 branches                  # 1070.512 M/sec                    ( +-  0.04% )
        96,023,227 branch-misses             #    1.24% of all branches          ( +-  0.13% )

       6.960570593 seconds time elapsed                                          ( +-  0.29% )
The incremental mode is much harder to show with benchmarks, but a
minimal recalculation with no cascading effects can complete in under
100ms. Much of that time is spent in the read() system call; if you
split set.mm into one file per section, unmodified sections can be
detected with stat() alone, bringing the baseline down to 18ms.
-sorear