Dear rdflib contributors and maintainers,
I have recently been trying to upgrade rdflib from 4.2.2 to version 6. After upgrading, a process I normally run, which uses rdflib to load a large RDF/XML file into a graph, uses significantly more memory and takes noticeably longer (for my large file, parsing takes about 1.5x as long).
I've traced the issue back to the graph.parse method. More specifically, by profiling graph.parse with versions 6.1.1 and 4.2.2, I can see that accesses to members of the RDF class (mostly occurring in the node_element_start and property_element_start methods) take significantly longer in 6.1.1, as the class is now a DefinedNamespace with overridden __getitem__ and __contains__ methods that perform added string checks.
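For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to profile a call with the standard library's cProfile. It is not necessarily the exact setup used above; the helper name profile_call and its top parameter are hypothetical, and the workload at the bottom is a stand-in so the sketch runs without rdflib installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by cumulative time. `profile_call` is a hypothetical
    helper name, not part of rdflib.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Stand-in workload; swap in the real parse call to compare versions.
    _, report = profile_call(sorted, range(1000, 0, -1))
    print(report)
```

With rdflib installed, the same helper could wrap the parse call, for example profile_call(Graph().parse, "large_file.rdf", format="xml"), where large_file.rdf is a placeholder path. Running it once under each rdflib version and comparing the cumulative times of the namespace lookups should show whether they account for the difference.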
Has anyone else experienced this issue? I have been trying to find ways to work around/with the library to lower the latency, but haven't been able to find anything yet.
Thanks,
Brendan