EasyRdf Turtle parser is very slow


osma.s...@helsinki.fi

Nov 2, 2015, 3:15:11 AM
to EasyRdf Discussion
Hi!

In our application (Skosmos), some of the configuration is stored as a Turtle file, which is parsed by EasyRdf. We noticed that as the configuration file has grown over time, parsing it has become quite slow.

Here is a test script:

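<?php
// Load EasyRdf through Composer's autoloader.
require_once 'vendor/autoload.php';
// Parse the attached Turtle file into a new graph.
$graph = new EasyRdf_Graph();
$graph->parseFile('computer.ttl');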

The example data file (computer.ttl) is attached. It has 1566 lines, 1866 triples, and is about 45 kB in size. In practice, this file is a snippet of the hierarchy of YSO, containing mostly information about old computer programs expressed as SKOS. This is not the configuration file we use, but because it is larger, it illustrates the same problem more clearly.

Parsing this file with the above code takes about 6.1 seconds on my i5-3470 system.

I tried converting the file to RDF/XML and N-Triples and parsing those instead. Total execution time (which I'm sure also includes some initialization, not just RDF parsing) was 0.24 and 0.18 seconds, respectively. So the Turtle parser is roughly 25 to 35 times slower than the other parsers. Of course the other formats may be simpler to parse, but I think this is still quite a dramatic difference.
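To separate parse time from start-up overhead, the comparison can be timed per format. This is only a rough sketch: `time_call` is a hypothetical helper, and the file names `computer.rdf` and `computer.nt` are assumed names for the converted files (only computer.ttl is attached).

```php
<?php
// Hypothetical helper: runs $fn once and returns the elapsed wall-clock seconds.
function time_call($fn)
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Assumed file names for the converted copies; adjust to your own files.
$files = array(
    'turtle'   => 'computer.ttl',
    'rdfxml'   => 'computer.rdf',
    'ntriples' => 'computer.nt',
);

// Only run the comparison when EasyRdf is installed via Composer.
if (is_file('vendor/autoload.php')) {
    require_once 'vendor/autoload.php';
    foreach ($files as $format => $file) {
        if (!is_file($file)) {
            continue; // skip formats that have not been generated
        }
        $seconds = time_call(function () use ($format, $file) {
            $graph = new EasyRdf_Graph();
            $graph->parseFile($file, $format);
        });
        printf("%-8s %.3f s\n", $format, $seconds);
    }
}
```

The autoloader is loaded outside the timed closure, so the numbers should mostly reflect parsing (plus loading of the parser class on first use).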

I tried profiling the Turtle parsing script using XDebug. I'm not very experienced in PHP profiling, but it seemed to me that a large part (over 50%?) of execution time is spent within the PHP internal function mb_substr. Maybe that is the underlying reason?
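One plausible explanation, offered as an assumption rather than a confirmed diagnosis: UTF-8 is a variable-width encoding, so finding character offset N with mb_substr generally means scanning the string from the start. Reading a string one character at a time that way can cost quadratic time overall. The sketch below is not EasyRdf code; both function names are made up for illustration, and it needs the mbstring extension.

```php
<?php
// Character-by-character access via mb_substr: every call has to locate
// character offset $i, which for UTF-8 generally means scanning from the start.
function chars_via_mb_substr($s)
{
    $out = array();
    $len = mb_strlen($s, 'UTF-8');
    for ($i = 0; $i < $len; $i++) {
        $out[] = mb_substr($s, $i, 1, 'UTF-8');
    }
    return $out;
}

// Decode the string once into an array of characters, then index the array.
function chars_via_preg_split($s)
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// About 7000 characters, some of them multi-byte ("käyttö " repeated).
$sample = str_repeat("k\xC3\xA4ytt\xC3\xB6 ", 1000);

$t0 = microtime(true);
$a = chars_via_mb_substr($sample);
$t1 = microtime(true);
$b = chars_via_preg_split($sample);
$t2 = microtime(true);

printf("mb_substr loop: %.4f s\n", $t1 - $t0);
printf("preg_split:     %.4f s\n", $t2 - $t1);
printf("same result:    %s\n", $a === $b ? 'yes' : 'no');
```

The gap should widen as the input grows, although the exact timings depend on the PHP version and its mbstring implementation.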

We use EasyRdf 0.9.1 and the tests above were run using PHP 5.3.10 (Ubuntu 12.04 LTS package). I also tested on an Ubuntu 14.04 LTS laptop with PHP 5.5.9 and got roughly the same results (almost 9 seconds, on a slower CPU).

Is this something that could/should be considered a bug in EasyRdf, or is this just normal behavior? In our application, we already use APC to cache the result of parsing the Turtle file, so in practice, this is not much of a performance hit.
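For anyone wanting to do the same, the caching idea can be sketched roughly as follows. This is not our actual code: `cached_parse` and its callback parameters are hypothetical, and an in-memory array stands in for APC so that the sketch runs anywhere.

```php
<?php
// Hypothetical helper: returns the cached parse result for $file and only
// calls $parse again when the file's modification time changes.
// $fetch($key) must return null on a cache miss; $store($key, $value) saves.
function cached_parse($file, $parse, $fetch, $store)
{
    // Including the mtime in the key invalidates the entry when the file changes.
    $key = 'ttl:' . $file . ':' . filemtime($file);
    $hit = $fetch($key);
    if ($hit !== null) {
        return $hit;
    }
    $result = $parse($file);
    $store($key, $result);
    return $result;
}

// In-memory stand-in for a real cache backend.
$cache = array();
$fetch = function ($key) use (&$cache) {
    return isset($cache[$key]) ? $cache[$key] : null;
};
$store = function ($key, $value) use (&$cache) {
    $cache[$key] = $value;
};
```

With APCu, `$fetch` could wrap `apcu_fetch` (which returns false on a miss, so that would need mapping to null) and `$store` could wrap `apcu_store`; `$parse` would build an EasyRdf_Graph and call parseFile on it.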

Thanks,
Osma Suominen
computer.ttl

Pieter Colpaert

Apr 13, 2017, 6:29:08 PM
to EasyRdf Discussion
Hi Osma,

I'm probably a bit late with an answer to this, but I’m working on a streaming TriG/Turtle parser for PHP. I tested your file on my system with EasyRdf and my library hardf, and hardf seems to give better results:

Perftest EasyRDF vs. Hardf:
#HARDF
pieter@pieter-e7:~/Projects/hardf$ php perf/parser-streaming-perf.php ~/Documents/computer.ttl 
- Parsing file /home/pieter/Documents/computer.ttl: 0.027586936950684s
* Triples parsed: 1866
* Memory usage: 0.72248077392578MB

#EASYRDF
pieter@pieter-e7:~/Projects/easyrdf-perftest$ php perftest.php ~/Documents/computer.ttl 
- Parsing file /home/pieter/Documents/computer.ttl: 5.1665079593658s
* Triples parsed: 1866
* Memory usage: 2.7724761962891MB

Feel free to give it a test spin; it is currently on the development branch: https://github.com/pietercolpaert/hardf/tree/development

Kind regards,

Pieter


Osma Suominen

Apr 17, 2017, 2:24:48 AM
to eas...@googlegroups.com
Hi Pieter,

Thanks, this is very interesting! I'll give it a spin.

Recently my colleague Henri did a modification, in the form of a
subclass that overrides some key methods, to the EasyRdf Turtle parser
that avoids the worst problems with the UTF-8 handling and is thus much
faster. You may want to try it too and compare it with your
implementation. The code is here:
https://github.com/NatLibFi/Skosmos/blob/master/model/NamespaceExposingTurtleParser.php

(the class is a bit misnamed, it used to add just a single method for
exposing namespaces but now it is optimized too so should be renamed)
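To illustrate the general idea of avoiding repeated UTF-8 offset lookups (this is not the code from the class linked above, just a minimal sketch with a made-up `Utf8Cursor` class), a reader can track a byte position and work out each character's length from its lead byte:

```php
<?php
// Hypothetical illustration: a cursor that reads UTF-8 characters by byte
// position, so each read is constant-time instead of rescanning the string.
final class Utf8Cursor
{
    private $data;
    private $pos = 0;
    private $len;

    public function __construct($data)
    {
        $this->data = $data;
        $this->len = strlen($data); // length in bytes, not characters
    }

    // Number of bytes in the character starting at byte offset $pos,
    // derived from the UTF-8 lead byte.
    private function charLengthAt($pos)
    {
        $byte = ord($this->data[$pos]);
        if ($byte < 0x80) {
            return 1;
        }
        if (($byte & 0xE0) === 0xC0) {
            return 2;
        }
        if (($byte & 0xF0) === 0xE0) {
            return 3;
        }
        if (($byte & 0xF8) === 0xF0) {
            return 4;
        }
        return 1; // invalid lead byte: fall back to a single byte
    }

    // Returns the next character without consuming it, or null at the end.
    public function peek()
    {
        if ($this->pos >= $this->len) {
            return null;
        }
        return substr($this->data, $this->pos, $this->charLengthAt($this->pos));
    }

    // Returns the next character and advances past it, or null at the end.
    public function read()
    {
        $c = $this->peek();
        if ($c !== null) {
            $this->pos += strlen($c);
        }
        return $c;
    }
}
```

The sketch assumes valid UTF-8 input and does no validation of continuation bytes.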

See also the related GitHub issue:
https://github.com/NatLibFi/Skosmos/issues/573

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi