to EasyRdf Discussion
Hi!
In our application (Skosmos), some of the configuration is stored as a Turtle file, which is parsed by EasyRdf. We noticed that as the configuration file has grown over time, parsing it has become quite slow.
Here is a test script:
    require_once 'vendor/autoload.php';
    $graph = new EasyRdf_Graph();
    $graph->parseFile('computer.ttl');
The example data file (computer.ttl) is attached. It has 1566 lines, 1866 triples, and is about 45 kB in size. The file is a snippet of the YSO hierarchy, containing mostly information about old computer programs expressed as SKOS. This is not the configuration file we actually use, but since it is larger, it illustrates the same problem more clearly.
Parsing this file with the above code takes about 6.1 seconds on my i5-3470 system.
I tried converting the file to RDF/XML and N-Triples and parsing those instead. Total execution time (which I'm sure also includes some initialization, not just RDF parsing) was 0.24 and 0.18 seconds, respectively. So the Turtle parser is about 30 times slower than the other parsers. Of course the other formats may be simpler to parse, but I think this is still quite a dramatic difference.
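To separate parsing time from script start-up, the measurement could be narrowed to the parseFile() call itself. The following is only a sketch: the helper name timeCall and the file names computer.rdf and computer.nt are assumptions made for illustration, and the format names passed to parseFile() are the ones EasyRdf registers for Turtle, RDF/XML and N-Triples.

```php
<?php
// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in seconds.
function timeCall($fn)
{
    $start = microtime(true);
    call_user_func($fn);
    return microtime(true) - $start;
}

// Assumed file names for the three serializations of the same data.
$files = array(
    'turtle'   => 'computer.ttl',
    'rdfxml'   => 'computer.rdf',
    'ntriples' => 'computer.nt',
);

// Only run the comparison when the Composer autoloader is present.
if (is_file('vendor/autoload.php')) {
    require_once 'vendor/autoload.php';
    foreach ($files as $format => $file) {
        $elapsed = timeCall(function () use ($file, $format) {
            $graph = new EasyRdf_Graph();
            $graph->parseFile($file, $format);
        });
        printf("%-8s %6.2f s\n", $format, $elapsed);
    }
}
```

The first iteration also pays for autoloading the EasyRdf classes, so running the loop twice and comparing the second pass may give a fairer picture.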
I tried profiling the Turtle parsing script using Xdebug. I'm not very experienced in PHP profiling, but it seemed to me that a large part (over 50%?) of the execution time is spent within the PHP internal function mb_substr. Maybe that is the underlying reason?
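To illustrate why heavy use of mb_substr could matter, here is a minimal sketch comparing two ways of consuming a UTF-8 string one character at a time. It is an assumption about the kind of pattern that might be involved, not a copy of EasyRdf's code; both function names are hypothetical. Chopping the first character off with mb_substr rescans and copies the rest of the string on every step, so the work grows roughly quadratically with input size, while splitting once into an array keeps it linear.

```php
<?php
// Consume the string by repeatedly taking the first character and
// keeping the remainder. Every step rescans and copies the rest.
function consumeWithMbSubstr($data)
{
    $count = 0;
    while (strlen($data) > 0) {
        $c = mb_substr($data, 0, 1, 'UTF-8');
        $data = mb_substr($data, 1, mb_strlen($data, 'UTF-8'), 'UTF-8');
        $count++;
    }
    return $count;
}

// Split into an array of UTF-8 characters once, then walk the array.
function consumeWithSplit($data)
{
    $chars = preg_split('//u', $data, -1, PREG_SPLIT_NO_EMPTY);
    $count = 0;
    foreach ($chars as $c) {
        $count++;
    }
    return $count;
}

// Small synthetic input containing a multi-byte character ("ä").
$sample = str_repeat("\xC3\xA4 b ", 2000);

$start = microtime(true);
$n1 = consumeWithMbSubstr($sample);
$t1 = microtime(true) - $start;

$start = microtime(true);
$n2 = consumeWithSplit($sample);
$t2 = microtime(true) - $start;

printf("mb_substr loop: %d chars in %.4f s\n", $n1, $t1);
printf("split once:     %d chars in %.4f s\n", $n2, $t2);
```

Both functions return the same character count; only the timings differ, and the gap should widen as the input grows.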
We use EasyRdf 0.9.1 and the tests above were run using PHP 5.3.10 (Ubuntu 12.04 LTS package). I also tested on an Ubuntu 14.04 LTS laptop with PHP 5.5.9 and got roughly the same results (almost 9 seconds, on a slower CPU).
Is this something that could/should be considered a bug in EasyRdf, or is this just normal behavior? In our application, we already use APC to cache the result of parsing the Turtle file, so in practice, this is not much of a performance hit.
to EasyRdf Discussion
Hi Osma,
I'm probably a bit late with an answer to this, but I'm working on a streaming TriG/Turtle parser for PHP. I tested your file on my system with EasyRdf and with my library, hardf, and hardf seems to give better results:
to eas...@googlegroups.com
Hi Pieter,
Thanks, this is very interesting! I'll give it a spin.
Recently my colleague Henri made a modification to the EasyRdf Turtle
parser, in the form of a subclass that overrides some key methods. It
avoids the worst problems with the UTF-8 handling and is thus much
faster. You may want to try it too and compare it with your
implementation. The code is here:
https://github.com/NatLibFi/Skosmos/blob/master/model/NamespaceExposingTurtleParser.php
(the class is a bit misnamed: it used to add just a single method for
exposing namespaces, but now it is optimized too, so it should be renamed)