More fine-grained timings


George Lilly

Nov 3, 2011, 6:12:23 PM
to fileman-tr...@googlegroups.com
At Rob's suggestion, I now log timings for each stage: downloading,
parsing, and insertion.
Perhaps we will eventually try separating insertion from indexing, but
we can't do that yet.

For a small file, it's all in insertion:


GTM>D WGET^C0XMAIN("http://glilly.net/arc/collins-frank.rdf")

STARTED: 3111103.150902
DOWNLOADING: http://glilly.net/arc/collins-frank.rdf
201 LINES READ
DOWNLOAD COMPLETE AT 3111103.150902
ELAPSED TIME: 0 SECONDS
ADDED: _:G640622416 _S:619512651 fmts:rdfSource _TXT_INCOMING_RDF_FILE_http://gl
illy.net/arc/collins-frank.rdf_921816732
128 XML NODES PARSED
PARSE COMPLETE AT 3111103.150902
ELAPSED TIME: 0 SECONDS
INSERTING 99 TRIPLES
INSERTION COMPLETE AT 3111103.150902
ELAPSED TIME: 3 SECONDS
APPROXIMATELY 33 NODES PER SECOND
ENDED AT: 3111103.150905
ELAPSED TIME: 3 SECONDS
APPROXIMATELY 33 TRIPLES PER SECOND
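
The timestamps in these logs are in FileMan date/time format (YYYMMDD.HHMMSS, where YYY is the year minus 1700 and trailing zeros of the time are dropped, so 15:14:20 prints as .15142). As a rough way to double-check the elapsed figures by hand, here is a minimal Python sketch. The function names are made up for illustration and are not part of C0XMAIN, and it ignores edge cases such as a missing time part or FileMan's 24:00:00 midnight.

```python
from datetime import datetime


def parse_fileman(ts: str) -> datetime:
    """Convert a FileMan timestamp string (YYYMMDD.HHMMSS) to a datetime.

    Hypothetical helper for checking the logs; not part of C0XMAIN.
    """
    date_part, _, time_part = ts.partition(".")
    # FileMan drops trailing zeros from the time, so pad back to HHMMSS.
    time_part = time_part.ljust(6, "0")
    year = 1700 + int(date_part[:3])
    return datetime(
        year,
        int(date_part[3:5]),
        int(date_part[5:7]),
        int(time_part[:2]),
        int(time_part[2:4]),
        int(time_part[4:6]),
    )


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two FileMan timestamps."""
    return int((parse_fileman(end) - parse_fileman(start)).total_seconds())


# STARTED and ENDED AT values from the small-file run above.
print(elapsed_seconds("3111103.150902", "3111103.150905"))
```

Timestamps are passed as strings rather than floats so that the digits are not disturbed by floating-point rounding.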


For a larger file (qds), here's what we get. It shows that even with
large files, insertion accounts for almost the entire run (1755 of
1821 seconds).

GTM>D WGET^C0XMAIN("https://trac.opensourcevista.net/svn/qrda/qds/qds.rdf")

STARTED: 3111103.151357
DOWNLOADING: https://trac.opensourcevista.net/svn/qrda/qds/qds.rdf
73944 LINES READ
DOWNLOAD COMPLETE AT 3111103.15142
ELAPSED TIME: 23 SECONDS
APPROXIMATELY 3214 LINES PER SEC
ADDED: _:G920531835 _S:792435840 fmts:rdfSource _TXT_INCOMING_RDF_FILE_https://t
rac.opensourcevista.net/svn/qrda/qds/qds.rdf_357543112
71530 XML NODES PARSED
PARSE COMPLETE AT 3111103.151503
ELAPSED TIME: 35 SECONDS
APPROXIMATELY 2043 NODES PER SECOND
INSERTING 69537 TRIPLES
INSERTION COMPLETE AT 3111103.151503
ELAPSED TIME: 1755 SECONDS
APPROXIMATELY 39 NODES PER SECOND
ENDED AT: 3111103.154418
ELAPSED TIME: 1821 SECONDS
APPROXIMATELY 38 TRIPLES PER SECOND
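
To put numbers on the stage breakdown for the qds run above, here is a small Python sketch that turns the logged elapsed times into percentages of the total. The stage labels and function name are illustrative only; the figures are copied from the log.

```python
def stage_shares(stages: dict, total_seconds: int) -> dict:
    """Return each stage's share of the total run time, in percent."""
    return {
        name: round(100 * seconds / total_seconds, 1)
        for name, seconds in stages.items()
    }


# Elapsed times reported in the qds log above (seconds).
qds_stages = {"download": 23, "parse": 35, "insert": 1755}
qds_total = 1821

shares = stage_shares(qds_stages, qds_total)
# Time the three stage timers do not cover (e.g. bookkeeping between stages).
unaccounted = qds_total - sum(qds_stages.values())

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"unaccounted: {unaccounted} seconds")
```

With these figures, insertion comes to roughly 96% of the run, and about 8 seconds are not attributed to any of the three stages.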

rtweed

Nov 3, 2011, 6:56:04 PM
to Fileman Triple Store
On my pretty basic Linux machine with a completely unoptimised install
of GT.M I can push about 140,000 global sets/sec. Assuming an
insertion rate of 35 triples/sec could be achieved on my hardware,
that would suggest that a single triple requires on the order of 4000
global accesses (140,000 / 35). That seems like an awful lot of
activity just to insert a single triple. Any idea why it would appear
to be so resource-intensive?

Rob
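
The back-of-the-envelope estimate above can be written out as a short Python sketch. It assumes, as the post does, that the machine's raw global-set rate is the only limit on insertion; the function name is hypothetical.

```python
def accesses_per_triple(global_sets_per_sec: float, triples_per_sec: float) -> float:
    """Estimate global accesses spent per triple.

    Assumes insertion is limited only by the raw global-set rate,
    which is the simplification made in the post above.
    """
    return global_sets_per_sec / triples_per_sec


# 140,000 sets/sec and 35 triples/sec are the figures quoted above.
estimate = accesses_per_triple(140_000, 35)
print(estimate)
```

This is an upper bound rather than a measurement: any time spent outside global access (string handling, parsing, locking) would lower the real count per triple.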

glilly

Nov 4, 2011, 9:49:59 AM
to Fileman Triple Store
It looks like more memory helps. Nancy set up an instance for me on
raven that has a lot of memory (I thought she said 256 meg), and I'm
getting triples-per-second rates that I have not seen before:

GTM>D IMPORT^C0XMAIN("qds/QDS_0002.rdf")

STARTED: 3111104.102614
READING IN: qds/QDS_0002.rdf
2899 LINES READ
ADDED: _:G623094120 _S:965696293 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/qds/QDS_0002_rdf_941379764
2821 XML NODES PARSED
INSERTING 2748 TRIPLES
ENDED AT: 3111104.102647
ELAPSED TIME: 33 SECONDS
APPROXIMATELY 83 TRIPLES PER SECOND

Nancy Anthracite

Nov 4, 2011, 10:13:33 AM
to fileman-tr...@googlegroups.com, glilly
Raven is not a VM and it has 6 gigs.


--
Nancy Anthracite

K.S. Bhaskar

Nov 4, 2011, 10:13:47 AM
to fileman-tr...@googlegroups.com
If (a) your database is not journaled, or (b) you are running or can
run V5.4-002B and can use NOBEFORE image journaling, then please do
try the MM access method for better performance.

Regards
-- Bhaskar

--
Windows does to computers what smoking does to humans

Rob Tweed

Nov 4, 2011, 10:17:42 AM
to fileman-tr...@googlegroups.com
It still suggests that an unusually large number of global accesses is required to insert a triple. Is it possible to give a high-level view of what's involved? Are triples really that complex a data structure? I didn't think they were fundamentally very complex, but maybe they are.

Rob
--
Rob Tweed
Director, M/Gateway Developments Ltd
http://www.mgateway.com
------------------
EWD Mobile: Build mobile applications faster
http://www.mgateway.com/ewd.html

glilly

Nov 4, 2011, 12:42:46 PM
to Fileman Triple Store
Some more tests on raven... the rates are pretty consistent.
gpl

GTM>D IMPORT^C0XMAIN("smart-rdf-in/collins-frank.rdf")

STARTED: 3111104.130509
READING IN: smart-rdf-in/collins-frank.rdf
200 LINES READ
ADDED: _:G072744409 _S:795646155 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/smart-rdf-in/collins-frank_rdf_585185215
128 XML NODES PARSED
INSERTING 99 TRIPLES
ENDED AT: 3111104.13051
ELAPSED TIME: 1 SECONDS
APPROXIMATELY 99 TRIPLES PER SECOND

GTM>D IMPORT^C0XMAIN("smart-rdf-in/reed-richard.rdf")

STARTED: 3111104.130606
READING IN: smart-rdf-in/reed-richard.rdf
722 LINES READ
ADDED: _:G758268243 _S:177410100 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/smart-rdf-in/reed-richard_rdf_213828749
462 XML NODES PARSED
INSERTING 339 TRIPLES
ENDED AT: 3111104.13061
ELAPSED TIME: 4 SECONDS
APPROXIMATELY 84 TRIPLES PER SECOND

GTM>D IMPORT^C0XMAIN("smart-rdf-in/cole-susan.rdf")

STARTED: 3111104.130628
READING IN: smart-rdf-in/cole-susan.rdf
3428 LINES READ
ADDED: _:G271187746 _S:899679576 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/smart-rdf-in/cole-susan_rdf_538236597
2101 XML NODES PARSED
SKIPPING NODE: 9
SKIPPING NODE: 26
SKIPPING NODE: 29
SKIPPING NODE: 32
SKIPPING NODE: 35
INSERTING 1425 TRIPLES
ENDED AT: 3111104.130645
ELAPSED TIME: 17 SECONDS
APPROXIMATELY 83 TRIPLES PER SECOND

GTM>D IMPORT^C0XMAIN("smart-rdf-in/ford-shirley.rdf")

STARTED: 3111104.130703
READING IN: smart-rdf-in/ford-shirley.rdf
8922 LINES READ
ADDED: _:G740421472 _S:922849860 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/smart-rdf-in/ford-shirley_rdf_809878775
5470 XML NODES PARSED
ERROR, NO OBJECT FOUND FOR NODE: 4217
ERROR, NO OBJECT FOUND FOR NODE: 4226
ERROR, NO OBJECT FOUND FOR NODE: 4232
ERROR, NO OBJECT FOUND FOR NODE: 4258
ERROR, NO OBJECT FOUND FOR NODE: 4267
ERROR, NO OBJECT FOUND FOR NODE: 4273
INSERTING 3745 TRIPLES
ENDED AT: 3111104.130756
ELAPSED TIME: 53 SECONDS
APPROXIMATELY 70 TRIPLES PER SECOND

GTM>D IMPORT^C0XMAIN("smart-rdf-in/gracia-paul.rdf")

STARTED: 3111104.130817
READING IN: smart-rdf-in/gracia-paul.rdf
10698 LINES READ
ADDED: _:G289354757 _S:026395070 fmts:rdfSource
_TXT_INCOMING_RDF_FILE_/home/geo
rge/fmts/trunk/samples/smart-rdf-in/gracia-paul_rdf_297957633
6571 XML NODES PARSED
ERROR, NO OBJECT FOUND FOR NODE: 2751
ERROR, NO OBJECT FOUND FOR NODE: 2760
ERROR, NO OBJECT FOUND FOR NODE: 2766
ERROR, NO OBJECT FOUND FOR NODE: 6329
INSERTING 4512 TRIPLES
ENDED AT: 3111104.130928
ELAPSED TIME: 71 SECONDS
APPROXIMATELY 63 TRIPLES PER SECOND
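
To compare the runs above side by side, here is a small Python sketch that recomputes triples per second from the logged triple counts and elapsed times. Integer (floor) division happens to reproduce the logged rates for these five runs; that is an observation from the numbers, not something confirmed from the C0XMAIN source. The variable and function names are illustrative.

```python
# (file, triples inserted, elapsed seconds), copied from the logs above.
runs = [
    ("collins-frank.rdf", 99, 1),
    ("reed-richard.rdf", 339, 4),
    ("cole-susan.rdf", 1425, 17),
    ("ford-shirley.rdf", 3745, 53),
    ("gracia-paul.rdf", 4512, 71),
]


def triples_per_second(triples: int, seconds: int) -> int:
    """Whole triples per second, rounded down."""
    return triples // seconds


rates = [(name, triples_per_second(t, s)) for name, t, s in runs]

for name, rate in rates:
    print(f"{name}: {rate} triples/sec")
```

With only whole-second timings, the rate for the smallest file (1 second elapsed) is the least reliable of the five.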

