404 errors occur intermittently for a series of SPARQL endpoint POST requests

Matt Karikomi

Dec 4, 2020, 7:13:46 PM
to pathway-commons-help
Hi,
To limit the complexity of individual queries and prevent timeouts, I often expand graph patterns iteratively, which results in a long sequence of very fast queries that are generated programmatically.

While these are running, I sometimes get 404 errors.

Please note, this is a sequence of requests, not simultaneous requests.

Is there some quota imposed on client IPs and, if so, how should I schedule requests accordingly?
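
For illustration, one way to make a sequential series of SPARQL POST requests tolerant of intermittent failures is to retry each request with exponential backoff. This is a minimal sketch under stated assumptions, not a description of any Pathway Commons policy: the endpoint URL is left as a parameter because the thread does not give it, the function names `post_sparql` and `run_sequence` are hypothetical, and treating 404 as retryable is a choice made only because the errors here are reported as intermittent.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def post_sparql(endpoint, query, timeout=30):
    """POST one SPARQL query (form-encoded) and return the response body as text."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,  # supplying data makes urllib issue a POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def run_sequence(endpoint, queries, send=post_sparql, retries=4,
                 base_delay=1.0, sleep=time.sleep):
    """Run queries one at a time, retrying failed requests with exponential backoff."""
    results = []
    for query in queries:
        for attempt in range(retries + 1):
            try:
                results.append(send(endpoint, query))
                break
            except urllib.error.HTTPError as err:
                # Assumption: 404 is treated as transient only because the
                # thread reports it as intermittent; normally it is not retried.
                transient = err.code in (404, 429, 500, 502, 503, 504)
                if not transient or attempt == retries:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    return results
```

The `send` and `sleep` parameters are injectable so the retry logic can be exercised without a live server.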

Thanks, Matt

Gary Bader

Dec 7, 2020, 12:10:29 AM
to pathway-commons-help

Hi Matt - we haven't specified a quota that I know of, and those 404 errors shouldn't be happening. I don't recall hearing about anyone else running into this, so it may just be a matter of server capacity. We'll look into increasing the resources of that server to see if that helps. Otherwise, we could send you the server as a virtual machine image to run yourself, if you'd like.

Best,
Gary

--
You received this message because you are subscribed to the Google Groups "pathway-commons-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pathway-commons-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pathway-commons-help/73eb1288-5de0-4c2e-9f0c-4be48903e199n%40googlegroups.com.

Matt Karikomi

Dec 7, 2020, 2:38:09 PM
to pathway-co...@googlegroups.com
Hi Gary,
Thanks for checking on this for me!

Along the lines of a machine image, I've actually had success just using the published v12 RDF/XML dumps (though I just noticed that the server at https://www.pathwaycommons.org/archives/ isn't responding this morning).

One note on the dumps: I've noticed that records in the dump use the rdf:ID attribute. For example, one record from "PathwayCommons12.All.BIOPAX.owl.gz" gives:
<bp:ExperimentalForm rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0">
which pulls in the base URI from the `xml:base` attribute of the enclosing `rdf:RDF` tag, in this case:
xml:base="http://pathwaycommons.org/pc12/"
According to the W3C RDF/XML specification, this should generate a URI for that particular record with a "#" prepended to the ID, like:

http://pathwaycommons.org/pc12/#ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0
However, I've noticed that the hosted version of these URIs seems to omit the "#", like this:

http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0


Just wondering if this discrepancy is intended?
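
To make the two forms concrete, here is a small sketch that derives both from the same `xml:base` and `rdf:ID` values quoted above. It relies on the RDF/XML rule that `rdf:ID="x"` is equivalent to `rdf:about="#x"` resolved against the base URI. The function names are hypothetical, and `to_hosted_style` is only one possible way to reconcile URIs loaded from a dump with those served by the endpoint.

```python
from urllib.parse import urljoin


def rdf_id_to_uri(xml_base, rdf_id):
    """Resolve an rdf:ID as RDF/XML prescribes: '#<ID>' resolved against xml:base."""
    return urljoin(xml_base, "#" + rdf_id)


def hosted_style_uri(xml_base, rdf_id):
    """Plain concatenation of base and ID, the form the hosted URIs appear to use."""
    return xml_base + rdf_id


def to_hosted_style(uri, xml_base):
    """Rewrite a spec-style URI under xml_base to the '#'-free form; leave others alone."""
    prefix = xml_base + "#"
    if uri.startswith(prefix):
        return xml_base + uri[len(prefix):]
    return uri


base = "http://pathwaycommons.org/pc12/"
rdf_id = "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0"
spec_uri = rdf_id_to_uri(base, rdf_id)       # contains "/pc12/#ExperimentalForm_..."
hosted_uri = hosted_style_uri(base, rdf_id)  # contains "/pc12/ExperimentalForm_..."
```

A client that loads the dump with a spec-compliant parser could apply `to_hosted_style` to each subject and object URI before comparing against results from the hosted endpoint.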

Originally I thought this was a bug in rdf4j; credit to @Jeen_Boekstra (one of the rdf4j devs) for pointing out the relevant documentation.


Best, Matt

Igor R

Dec 7, 2020, 3:25:27 PM
to pathway-co...@googlegroups.com
I think the missing ‘#’ in the full BioPAX URI (which is also a URL) is an old bug/feature of our Java library, where the full URI of an object is xml:base plus its ID (without an extra ‘#’). The intention has always been not to have any ‘#’ in the URIs (a ‘#’ limits how you can resolve the URI/URL to an info page, and resolvable URIs are good to have for Linked Data compliance).

PS: not many people use the PC BioPAX data as RDF with SPARQL. (I personally think that, in practice, it’s really difficult to do any nontrivial analysis or inference on the BioPAX (OWL-based) model with SPARQL without spending serious time on each query: the model is not simple, and the data isn’t perfect either, despite much effort.) It’s somewhat easier to analyze with Java or Python, loading the BioPAX file and traversing it with knowledge of the BioPAX model.
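
As a rough illustration of the file-based approach, the sketch below streams a BioPAX RDF/XML file with Python's standard library and counts the top-level records by BioPAX class. It is not Paxtools or any BioPAX-aware library, just a plain XML pass. The `bp` namespace URI is assumed to be the BioPAX Level 3 one, and the function names are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"  # assumed: BioPAX Level 3


def count_biopax_classes(stream):
    """Count direct children of rdf:RDF that carry rdf:ID or rdf:about, by class name."""
    counts = Counter()
    depth = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After an "end" event, depth == 1 means elem is a direct child of the root.
        if depth == 1 and elem.tag.startswith("{" + BP_NS + "}"):
            if elem.get("{" + RDF_NS + "}ID") or elem.get("{" + RDF_NS + "}about"):
                counts[elem.tag.split("}", 1)[1]] += 1
            # Drop the record's children and text to limit memory use on large dumps.
            elem.clear()
    return counts


def count_from_gzip(path):
    """Convenience wrapper for a gzipped dump such as a .owl.gz file."""
    with gzip.open(path, "rb") as handle:
        return count_biopax_classes(handle)
```

Real analysis would need to follow property references between records, which is where a BioPAX-aware library earns its keep.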

PPS: our services are currently undergoing emergency maintenance; some may be temporarily unavailable.

Igor R.

Matt Karikomi

Dec 7, 2020, 4:47:02 PM
to pathway-co...@googlegroups.com
Thanks for looking into this, Igor!
Yeah, the issues you mentioned with RDF are definitely there. In the long run, I still think your RDF endpoint is a unique and valuable resource, because it's the only RDF endpoint that provides even limited integration of pathways from multiple knowledge sources via a unified set of physical entity URIs.

A little bit of background on our use case:
We are at the alpha stage with a packaged client written in pure Julia (which we prefer for mechanistic modeling) that automatically acquires network structures from PC via RDF and adds annotations from UniProt, neXtProt, and OrthoDB (all SIB resources are RDF-based). An internal API makes endpoints completely modular, so this list can grow. Limited support also exists for pathway composition using your curated entity URIs: when you are certain enough that two pathways share a physical entity, we can create a unified pathway out of them, then examine all entities, or a series of entities and the reactions between them, as denoted by some traversal. The underlying data structure is graph-based, so the entity and interaction lists that we use in our models can in turn be filtered by orthology info from OrthoDB or by acquired UniProt annotations when we outline the modeled pathway via traversals within the package.
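
The composition idea can be sketched roughly as follows. This is an illustration in Python, not the Julia package's actual code: pathways are modeled as plain adjacency dictionaries keyed by entity URI, the URIs are placeholders, and the function names are hypothetical.

```python
from collections import deque


def entities(pathway):
    """All entity URIs in a pathway, whether they appear as sources or targets."""
    found = set(pathway)
    for neighbors in pathway.values():
        found.update(neighbors)
    return found


def compose(pathway_a, pathway_b):
    """Merge two pathways if they share at least one entity URI; otherwise return None."""
    shared = entities(pathway_a) & entities(pathway_b)
    if not shared:
        return None, shared
    merged = {}
    for pathway in (pathway_a, pathway_b):
        for entity, neighbors in pathway.items():
            merged.setdefault(entity, set()).update(neighbors)
    return merged, shared


def traverse(pathway, start):
    """Breadth-first traversal from start, returning entities in visit order."""
    order = [start]
    visited = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Sort neighbors so the visit order is deterministic.
        for neighbor in sorted(pathway.get(current, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


# Placeholder URIs: two pathways that share the entity "urn:example:B".
upstream = {"urn:example:A": {"urn:example:B"}}
downstream = {"urn:example:B": {"urn:example:C"}}
unified, shared = compose(upstream, downstream)
path = traverse(unified, "urn:example:A")
```

Because the merge is keyed on entity URIs, it only works when both sources use the same URI for the same physical entity, which is what the unified PC entity URIs provide.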

I think it's also worth noting that while Java is a powerful, performant, and widely understood environment within the software and CS community (and a language I grew to love in undergrad!), it is unrealistic to expect an end user to hack on a Java pipeline when their background is applied math (vs. CS or engineering), which describes the majority of our target audience. Additionally, in cases where the user is running a full pipeline on HPC, hybrid language environments with dependencies like Java can be very difficult to debug for a user who, again, is not necessarily a software engineer.

Best,
Matt

