I upgraded my MacBook Pro (late 2009 => mid-2012 model); for reference,
here are the new results of the similarity-query performance tests.
Things are about 3x faster now, though gensim itself has also been
optimized in the meantime, so not all of the speedup is due to the
hardware upgrade.

./test/simspeed.py wikismall.dense.mm wikismall.sparse.mm 5000
2012-07-22 13:13:01,988 : INFO : accepted corpus with 10000 documents, 200 features, 2000000 non-zero entries
2012-07-22 13:13:02,003 : INFO : accepted corpus with 10000 documents, 64538 features, 1305021 non-zero entries
2012-07-22 13:13:07,294 : INFO : scanning corpus to determine the number of features
2012-07-22 13:13:07,416 : INFO : creating matrix for 5000 documents and 200 features
2012-07-22 13:13:07,915 : INFO : creating sparse index
2012-07-22 13:13:07,915 : INFO : creating sparse matrix from corpus
2012-07-22 13:13:07,916 : INFO : PROGRESS: at document #0
2012-07-22 13:13:08,536 : INFO : created <5000x64538 sparse matrix of type '<type 'numpy.float32'>'
with 645303 stored elements in Compressed Sparse Row format>
2012-07-22 13:13:08,536 : INFO : test 1 (dense): dense corpus of 1000 docs vs. index (5000 documents, 200 dense features)
2012-07-22 13:13:09,215 : INFO : chunksize=1, time=0.6780s (1474.88 docs/s, 1474.88 queries/s)
2012-07-22 13:13:09,848 : INFO : chunksize=4, time=0.6334s (1578.77 docs/s, 394.69 queries/s)
2012-07-22 13:13:10,270 : INFO : chunksize=8, time=0.4212s (2373.92 docs/s, 296.74 queries/s)
2012-07-22 13:13:10,575 : INFO : chunksize=16, time=0.3051s (3278.08 docs/s, 206.52 queries/s)
2012-07-22 13:13:10,842 : INFO : chunksize=64, time=0.2675s (3738.12 docs/s, 59.81 queries/s)
2012-07-22 13:13:11,105 : INFO : chunksize=128, time=0.2628s (3805.83 docs/s, 30.45 queries/s)
2012-07-22 13:13:11,363 : INFO : chunksize=256, time=0.2571s (3888.90 docs/s, 15.56 queries/s)
2012-07-22 13:13:11,618 : INFO : chunksize=512, time=0.2553s (3916.64 docs/s, 7.83 queries/s)
2012-07-22 13:13:11,914 : INFO : chunksize=1024, time=0.2959s (3379.22 docs/s, 3.38 queries/s)
2012-07-22 13:13:11,914 : INFO : test 2 (sparse): sparse corpus of 1000 docs vs. sparse index (5000 documents, 64538 features, 0.20% density)
2012-07-22 13:13:15,548 : INFO : chunksize=1, time=3.6339s (275.19 docs/s, 275.19 queries/s)
2012-07-22 13:13:16,739 : INFO : chunksize=5, time=1.1901s (840.27 docs/s, 168.05 queries/s)
2012-07-22 13:13:17,614 : INFO : chunksize=10, time=0.8749s (1142.94 docs/s, 114.29 queries/s)
2012-07-22 13:13:18,059 : INFO : chunksize=100, time=0.4449s (2247.87 docs/s, 22.48 queries/s)
2012-07-22 13:13:18,389 : INFO : chunksize=500, time=0.3301s (3029.03 docs/s, 6.06 queries/s)
2012-07-22 13:13:18,713 : INFO : chunksize=1000, time=0.3234s (3092.29 docs/s, 3.09 queries/s)
2012-07-22 13:13:18,713 : INFO : test 3 (dense): similarity of all vs. all (5000 documents, 200 dense features)
2012-07-22 13:13:23,781 : INFO : chunksize=0, time=2.4236s (2063.09 docs/s)
2012-07-22 13:13:28,992 : INFO : chunksize=1, time=2.3609s (2117.83 docs/s, 2117.83 queries/s), meandiff=0.000e+00
2012-07-22 13:13:33,080 : INFO : chunksize=4, time=1.3450s (3717.59 docs/s, 929.40 queries/s), meandiff=1.748e-08
2012-07-22 13:13:36,541 : INFO : chunksize=8, time=0.6613s (7561.30 docs/s, 945.16 queries/s), meandiff=1.883e-08
2012-07-22 13:13:39,778 : INFO : chunksize=16, time=0.4335s (11535.12 docs/s, 722.10 queries/s), meandiff=1.883e-08
2012-07-22 13:13:42,906 : INFO : chunksize=64, time=0.2920s (17124.88 docs/s, 270.57 queries/s), meandiff=1.883e-08
2012-07-22 13:13:46,129 : INFO : chunksize=128, time=0.2672s (18709.99 docs/s, 149.68 queries/s), meandiff=1.883e-08
2012-07-22 13:13:49,770 : INFO : chunksize=256, time=0.2499s (20009.04 docs/s, 80.04 queries/s), meandiff=1.883e-08
2012-07-22 13:13:53,839 : INFO : chunksize=512, time=0.2082s (24012.05 docs/s, 48.02 queries/s), meandiff=1.883e-08
2012-07-22 13:13:57,956 : INFO : chunksize=1024, time=0.2033s (24596.24 docs/s, 24.60 queries/s), meandiff=1.883e-08
2012-07-22 13:13:57,971 : INFO : test 4 (dense): as above, but only ask for the top-10 most similar for each document
2012-07-22 13:14:02,521 : INFO : chunksize=0, time=4.5502s (1098.84 docs/s, 1098.84 queries/s)
2012-07-22 13:14:07,008 : INFO : chunksize=1, time=4.4863s (1114.51 docs/s, 1114.51 queries/s)
2012-07-22 13:14:10,542 : INFO : chunksize=4, time=3.5342s (1414.76 docs/s, 353.69 queries/s)
2012-07-22 13:14:13,214 : INFO : chunksize=8, time=2.6719s (1871.32 docs/s, 233.92 queries/s)
2012-07-22 13:14:15,712 : INFO : chunksize=16, time=2.4977s (2001.85 docs/s, 125.32 queries/s)
2012-07-22 13:14:18,065 : INFO : chunksize=64, time=2.3533s (2124.66 docs/s, 33.57 queries/s)
2012-07-22 13:14:20,344 : INFO : chunksize=128, time=2.2783s (2194.59 docs/s, 17.56 queries/s)
2012-07-22 13:14:22,713 : INFO : chunksize=256, time=2.3688s (2110.82 docs/s, 8.44 queries/s)
2012-07-22 13:14:25,059 : INFO : chunksize=512, time=2.3463s (2131.03 docs/s, 4.26 queries/s)
2012-07-22 13:14:27,508 : INFO : chunksize=1024, time=2.4487s (2041.90 docs/s, 2.04 queries/s)
2012-07-22 13:14:27,508 : INFO : test 5 (sparse): similarity of all vs. all (5000 documents, 64538 features, 0.20% density)
2012-07-22 13:15:12,522 : INFO : chunksize=0, time=42.3504s (118.06 docs/s)
2012-07-22 13:15:24,321 : INFO : chunksize=5, time=8.9180s (560.66 docs/s, 112.13 queries/s), meandiff=0.000e+00
2012-07-22 13:15:32,827 : INFO : chunksize=10, time=5.6751s (881.04 docs/s, 88.10 queries/s), meandiff=0.000e+00
2012-07-22 13:15:37,634 : INFO : chunksize=100, time=1.8678s (2676.95 docs/s, 26.77 queries/s), meandiff=0.000e+00
2012-07-22 13:15:41,709 : INFO : chunksize=500, time=1.0960s (4562.06 docs/s, 9.12 queries/s), meandiff=0.000e+00
2012-07-22 13:15:45,663 : INFO : chunksize=1000, time=0.9768s (5118.52 docs/s, 5.12 queries/s), meandiff=0.000e+00
2012-07-22 13:15:49,489 : INFO : chunksize=5000, time=0.9335s (5356.46 docs/s, 1.07 queries/s), meandiff=0.000e+00
2012-07-22 13:15:49,504 : INFO : test 6 (sparse): as above, but only ask for the top-10 most similar for each document
2012-07-22 13:16:33,699 : INFO : chunksize=0, time=44.1950s (113.13 docs/s, 113.13 queries/s)
2012-07-22 13:16:44,353 : INFO : chunksize=5, time=10.6540s (469.31 docs/s, 93.86 queries/s)
2012-07-22 13:16:51,806 : INFO : chunksize=10, time=7.4524s (670.92 docs/s, 67.09 queries/s)
2012-07-22 13:16:55,400 : INFO : chunksize=100, time=3.5940s (1391.19 docs/s, 13.91 queries/s)
2012-07-22 13:16:58,343 : INFO : chunksize=500, time=2.9426s (1699.19 docs/s, 3.40 queries/s)
2012-07-22 13:17:01,246 : INFO : chunksize=1000, time=2.9036s (1721.99 docs/s, 1.72 queries/s)
2012-07-22 13:17:04,238 : INFO : chunksize=5000, time=2.9911s (1671.65 docs/s, 0.33 queries/s)
2012-07-22 13:17:04,238 : INFO : finished running simspeed.py
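
For anyone reading the figures above, here is a rough sketch of how the
two rates and the density percentage seem to relate. It is inferred from
the logged numbers only, not taken from simspeed.py: the helper names
`throughput` and `density` are made up for illustration, and the
assumption that one query is issued per chunk (so the query count is
ceil(n_docs / chunksize)) is mine. The chunksize=0 rows are not covered.

```python
import math


def throughput(n_docs, chunksize, seconds):
    """Return (docs/s, queries/s) for n_docs sent in chunks of `chunksize`.

    Assumption: each chunk counts as one query, so the last, possibly
    smaller, chunk still counts as a whole query.
    """
    if chunksize < 1:
        raise ValueError("chunksize must be >= 1 in this sketch")
    n_queries = math.ceil(n_docs / chunksize)
    return n_docs / seconds, n_queries / seconds


def density(stored_elements, n_docs, n_features):
    """Fraction of non-zero cells in an n_docs x n_features sparse index."""
    return stored_elements / (n_docs * n_features)


# Test 1 row: 1000 query docs, chunksize=64, time=0.2675s
docs_per_s, queries_per_s = throughput(1000, 64, 0.2675)
print("%.2f docs/s, %.2f queries/s" % (docs_per_s, queries_per_s))

# Sparse index: 645303 stored elements in a 5000 x 64538 matrix
print("%.2f%% density" % (100.0 * density(645303, 5000, 64538)))
```

The results can differ from the log in the last digits, because the log
prints the elapsed time rounded to four decimal places.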
-rr