New issue 413 by Stev...@gmail.com: Categories with one project return null
http://code.google.com/p/simal/issues/detail?id=413
For each of the 4 categories with only one project, out of 15 or so, I'm
getting an exception ("Cannot populate page with null category.") when
viewing the category detail page. Stepping through with the debugger
reveals that, indeed, no results are returned by the SPARQL query under
JenaCategoryService.findById(id).
The generated query looks fine:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE { ?category simal:categoryId "per1451" }
I tried checking the query with the query tool on the Tools page, but no
joy: even an unrestricted query always returns nothing:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE {}
Comment #1 on issue 413 by ross.gardler: Categories with one project return
null
Can you create a test that will reproduce this for us? We can then seek to fix
it.
If you are not sure how to create a test for this we'd be happy to help on
the list.
Thanks - the prod to reproduce has demonstrated that the problem was in my
database. So something in the importing of lots of RDF/XML has (once again)
caused the problem to occur. Importing with a slightly cleaner subset of
those files worked ok. So I'm guessing it was something to do with a file
that triggered an exception, but still got saved (but failed to update the
categories list properly).
I can send you a zip of the corrupted database, but given the above, it's
probably not interesting. (I don't want to post it here as it contains
semi-confidential data).
Comment #3 on issue 413 by ross.gardler: Categories with one project return
null
I feel your problems with invalid RDF are caused by two factors:
a) you're new to the project and thus uncovering issues by adopting a
different path to us
and, more importantly because it's avoidable
b) your DOAP customisations are allowing bad data (from the Simal
perspective) to be generated
It's hard for us to prioritise genuine issues from cause a) when most of
the problems appear to be from cause b).
I don't think there is much need to share your DB with us for debugging. We
currently have > 1600 projects and > 1200 people without any of these kinds
of problems.
If you can narrow down the problems that fit into category a) we'd be glad
to either fix them or help you fix them (and by saying that I mean "thank
you for helping us uncover these issues, you've identified a few already -
please let's have more. No problem with the odd erroneous report like this
one.").
> and, more importantly because it's avoidable
> b) your DOAP customisations are allowing bad data (from the Simal
> perspective) to be generated
It's not so avoidable because the data quality rules are not well
documented. Various bits of the code make assumptions about what is in a
project record. If the assumption is incorrect, an exception is raised, and
the user sees an error. But those assumptions (from memory, a project must
have a description, for instance) aren't documented, nor is there any
validation at the time records are ingested. I guess this is an aspect of
semantic web that I will have to get used to: input data is simply saved
directly to the database with no checking. In a traditional SQL database,
you would have constraints, foreign keys and the like, meaning you can
guarantee that at the time you retrieve data that it will be in a
consistent, good state.
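To make the idea of ingest-time checking concrete, here is a minimal sketch of a pre-import check, not anything Simal actually does. It assumes the input files use the typed-node RDF/XML form (`<doap:Project>` elements). The rule set is hypothetical: only "description" comes from the assumption recalled above, "name" is an illustrative guess, and `missing_fields` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical rule set: "description" is the assumption mentioned in the
# thread (recalled from memory); "name" is only an illustrative guess.
REQUIRED = ("name", "description")


def missing_fields(rdf_xml, required=REQUIRED):
    """Return {project identifier: [missing property names]} for one document."""
    root = ET.fromstring(rdf_xml)
    problems = {}
    # iter() also visits the root, so a file whose root is doap:Project works.
    for index, project in enumerate(root.iter("{%s}Project" % DOAP)):
        ident = project.get("{%s}about" % RDF) or "project #%d" % index
        missing = []
        for prop in required:
            element = project.find("{%s}%s" % (DOAP, prop))
            # Treat an absent element and an empty literal the same way.
            if element is None or not (element.text or "").strip():
                missing.append(prop)
        if missing:
            problems[ident] = missing
    return problems


sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project rdf:about="http://example.org/good">
    <doap:name>Good</doap:name>
    <doap:description>Has everything.</doap:description>
  </doap:Project>
  <doap:Project rdf:about="http://example.org/bad">
    <doap:name>Bad</doap:name>
  </doap:Project>
</rdf:RDF>"""

print(missing_fields(sample))  # {'http://example.org/bad': ['description']}
```

Running something like this over each file before import would at least flag records that break a known assumption, instead of the error surfacing later when a page is rendered.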
Bottom line: it's actually surprisingly hard to avoid creating "bad data".
I wouldn't describe what I'm doing as "DOAP customisations" - most of what
I'm doing is simply generating data to import. Pretty standard use case,
really.
> It's hard for us to prioritise genuine issues from cause a) when most of
> the problems appear to be from cause b).
> I don't think there is much need to share your DB with us for debugging.
> We currently have > 1600 projects and > 1200 people without any of these
> kinds of problems.
Indeed. Well, I just hit this problem again, and fortunately I was able to
resolve it. In this case, I was running identical code *and data* on my
development machine (XP) and on production (Linux). The bug only showed up on
the production machine. I made a slight tweak to one of the 30 or so
description files I'm importing, cleaned the database, reimported...problem
went away. Who knows.
I think I have discovered the following amusing workaround:
1) Start from an empty/non-existent database.
2) Import all the projects.
3) Import all the projects again.
4) Start Simal
By "avoidable" I meant you can work around it. I certainly didn't mean it
shouldn't be better documented and/or handled in the code - this is alpha
code remember.
The DOAP creator form in SVN creates records that will display correctly
and our live instance of Simal has >1500 projects with no such problem.
This is clearly a bug; it's marked as invalid because we can't hope to
reproduce it. We have never seen the problem you are describing and can't
reproduce it. If you are able to provide us with some sample data that will
reproduce this then we might be able to fix it. Without that there is no
hope of us finding it until we hit it ourselves.
As for your workaround in comment 5 I agree it is "amusing". I can think of
no sensible reason why that would resolve the issue. But again without
having the data you are importing we're kind of stuck.
Sorry we can't be more helpful on this one.
How do you import the projects? Is it from DOAP RDF/XML files?
Also I'm curious about the non-existing categories, and I noticed there's
an error in the query you posted above. The query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE {}
doesn't return anything useful because the WHERE clause is empty, so no
pattern ever binds ?category. If you want to retrieve all the categories, you
could say "give me everything with a categoryId", which would be something like:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category WHERE { ?category simal:categoryId ?catId}
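The difference between the two queries can be illustrated with a toy triple-pattern matcher. This is only a sketch of the matching idea, not how Jena/ARQ is implemented, and the triples and `cat:`/`proj:` identifiers are hypothetical. With no patterns there is a single solution with no bindings, so `?category` stays unbound (`None` here); with a pattern, it gets bound to every matching subject.

```python
def match(triples, patterns):
    """Return one dict of variable bindings per solution of the patterns."""
    solutions = [{}]  # the empty pattern has exactly one solution: no bindings
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                candidate = dict(bindings)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or reject if it is already
                        # bound to a different value.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def select(triples, variable, patterns):
    """Project one variable with duplicates removed; None means unbound."""
    seen = []
    for bindings in match(triples, patterns):
        value = bindings.get(variable)
        if value not in seen:
            seen.append(value)
    return seen


# Hypothetical data, for illustration only.
triples = [
    ("cat:per1451", "simal:categoryId", "per1451"),
    ("proj:demo", "doap:name", "Demo"),
]

print(select(triples, "?category", []))
# [None]
print(select(triples, "?category", [("?category", "simal:categoryId", "?catId")]))
# ['cat:per1451']
```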
There are known problems with categories in general that might be related
to this, e.g. categories are sometimes created as type doap:category, which
is syntactically incorrect (Issue 283).
If you're hitting problems, please post one typical RDF/XML file so I can
see if that's what I'd expect Simal to handle correctly. Also, if you want to
check the type of the categories in your data, you can use this query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX simal: <http://oss-watch.ac.uk/ns/0.2/simal#>
SELECT DISTINCT ?category ?type WHERE {
  ?category simal:categoryId ?catId .
  ?category rdf:type ?type
}
If you post the result here we could identify if it's related to a known
issue.