Updating beans - duplicate properties


uoccou

Jul 5, 2010, 6:36:02 AM
to jenabean-dev
Hi,

I've got something strange going on, and I'm starting to bang my head.

I have a bean that I persist to an SDB-backed model in one thread of a
web application, and everything works as expected.

A few minutes later I retrieve this bean in another thread with the help
of a sparql/exec query. I update a few properties (date/time audit
fields and such) and then do a Bean2RDF save.

If I then go to update that persisted bean again, retrieving it from
storage with the same query, I now get 2 beans back instead of one.
Both have the same ID and differ only in the properties I have
updated.

What's going on?

Taylor Cowan

Jul 5, 2010, 4:04:31 PM
to jenabe...@googlegroups.com
Two beans with the same ID but with different properties?

Lacking the bean code, I'll try some ideas that you've most likely
already covered. First, if a bean has no @Id annotation,
jenabean by default uses the hashcode to create an id. That'll never
work for anything useful; it's just a convenience for writing unit tests.
So in that situation (no @Id field), you could have two beans with an
unannotated id field that is the same, but whose individuals have
different URIs.
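
To make the hashcode point concrete, here is a rough sketch in plain Java. It is not JenaBean's code: the `NS` namespace, the `Note` class, the URI layout, and the helpers `uriFromHash` and `uriFromId` are all made up for illustration. It only shows why a name derived from the default `Object.hashCode()` is not a stable identity, while a name derived from an explicit id is.

```java
final class HashIdSketch {

    // Hypothetical namespace, chosen only for this illustration.
    static final String NS = "http://example.org/ns#";

    // A bean with no id of its own. It inherits Object.hashCode(),
    // which is tied to the instance, not to the field values.
    static final class Note {
        final String text;

        Note(String text) {
            this.text = text;
        }
    }

    // Assumed fallback naming: build the URI from hashCode().
    static String uriFromHash(Object bean) {
        return NS + bean.getClass().getSimpleName() + "/" + bean.hashCode();
    }

    // Naming from an explicit, stable id (the role @Id plays).
    static String uriFromId(Object bean, String id) {
        return NS + bean.getClass().getSimpleName() + "/" + id;
    }

    public static void main(String[] args) {
        Note first = new Note("same content");
        Note second = new Note("same content");

        // Two instances with identical content will usually get
        // different hash-based URIs, and the values change between runs.
        System.out.println(uriFromHash(first));
        System.out.println(uriFromHash(second));

        // With an explicit id both instances name the same individual.
        System.out.println(uriFromId(first, "note-1"));
        System.out.println(uriFromId(second, "note-1"));
    }
}
```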

Have you tried some SPARQL to just look at the raw triples and make
sure there aren't two individuals?


uoc

Jul 6, 2010, 3:37:14 AM
to jenabe...@googlegroups.com, Taylor Cowan
Well, it's peculiar and I'm not sure exactly what is happening, but I've worked around it. Maybe you can see what's going on:

SPARQL shows one individual with duplicate properties, that is, properties of the same name with different values attached to the same subject.
For example:

The object model is a class with a status property. It's not a collection property, but I've ended up with 2 status properties.

1) Create an individual with Status pending. Persist to database.
2) Query the database for the individual. Retrieve it. Update the status.
3) Query the database again. Get what appear to be 2 individuals, but are in fact one individual with duplicated status properties.

Both the SPARQL query and wading into the innards of the tables with SQL show this.

I had been using an ontology-backed model. I changed this to a non-ontology-backed model, and created an update method that uses the model directly rather than the "optimised" save method, which uses the SDB model merge/add bulk update technique. I'm not 100% sure (I don't have the luxury of time at the moment), but I believe both of these changes are needed.

I don't understand how updating an individual from the ontology-backed model (in step 2) would have caused new physical triples to be created. The ontology-backed model is at the heart of it for sure. It warrants more investigation, but at least I can isolate it for now.

Taylor Cowan

Jul 6, 2010, 8:37:37 AM
to jenabe...@googlegroups.com
Well, this sounds more likely now, that is, one individual with two
values for the same property. It takes a lot of work to prevent that
in Jena. I wish the OntModel class would take "functional" properties
into account and see to it that this never happens.

JenaBean first deletes all relationships for the given singular
property, then adds the new value, but this happens within the context
of the single model to which the Bean2RDF object is bound. If there are
two models merged this might happen...and I think you've uncovered a
weakness in the merge technique.
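
A minimal sketch of that weakness, in plain Java with no Jena or JenaBean dependency. The "model" here is just a set of (subject, predicate, object) triples, and `saveSingular`, the `ex:` names and the status values are assumptions made up for the illustration. It mimics delete-then-add for a singular property and compares a writer bound to the store with a writer bound to an empty interim model that is merged afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StatusMergeSketch {

    // Toy stand-in for an RDF model: a set of (subject, predicate, object) triples.
    static Set<List<String>> newModel() {
        return new HashSet<>();
    }

    // Delete-then-add for a singular property. The delete can only see
    // triples in the model it is given.
    static void saveSingular(Set<List<String>> model, String s, String p, String o) {
        model.removeIf(t -> t.get(0).equals(s) && t.get(1).equals(p));
        model.add(Arrays.asList(s, p, o));
    }

    static long countValues(Set<List<String>> model, String s, String p) {
        return model.stream()
                .filter(t -> t.get(0).equals(s) && t.get(1).equals(p))
                .count();
    }

    // Writer bound to the store: the old value is visible and gets deleted.
    static long directUpdate() {
        Set<List<String>> store = newModel();
        saveSingular(store, "ex:order1", "ex:status", "pending");
        saveSingular(store, "ex:order1", "ex:status", "approved");
        return countValues(store, "ex:order1", "ex:status");
    }

    // Writer bound to an empty interim model: there is nothing to delete
    // there, and the merge into the store only ever adds triples.
    static long mergedUpdate() {
        Set<List<String>> store = newModel();
        saveSingular(store, "ex:order1", "ex:status", "pending");

        Set<List<String>> interim = newModel();
        saveSingular(interim, "ex:order1", "ex:status", "approved");
        store.addAll(interim);
        return countValues(store, "ex:order1", "ex:status");
    }

    public static void main(String[] args) {
        System.out.println("direct update, status values: " + directUpdate()); // 1
        System.out.println("merged update, status values: " + mergedUpdate()); // 2
    }
}
```

In the merged case the store ends up holding both "pending" and "approved" for the same subject, which matches the one-individual-with-two-status-values picture described earlier in the thread.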

Taylor

uoc

Jul 6, 2010, 9:27:36 AM
to jenabe...@googlegroups.com, Taylor Cowan
Thanks - if/when I do some more investigation I'll post here.

uoc

Jul 7, 2010, 7:30:49 AM
to jenabe...@googlegroups.com
I think the problem is me: the class I'm updating isn't in my current
ontology, so with open-world logic, adding multiple properties of the
same type is reasonable, I suspect...


uoc

Jul 10, 2010, 5:18:58 AM
to jenabe...@googlegroups.com, Taylor Cowan
Well. Drag.

I removed the ontology from the equation and the problem remains.

So I've reduced it to a test case. I hope you can try to reproduce it
and/or tell me what I'm doing wrong, and/or that my expectation is wrong.

1) A simple class called DupeTarget with a String id and a date field.
2) A test driver, RawDupeTestSDB, that creates
2.1) 3 of these, and Bean2RDF saves each to an interim in-memory model,
which is then added to an SDB-backed model on MySQL with SDBModel.add(model)
2.2) 3 of these, and Bean2RDF saves each directly to the SDB-backed
model on MySQL

The idea is that 2.1 is quite a bit faster.
The expectation is that because the ID is the same there will be 3
triples for each ID - for type, name and created properties.
The result is that there are 3 triples for the 2.2 case, and a growing
number of triples for the 2.1 case - the created field is duplicated
many times for each of its values.


package mytest;

import java.util.Date;
import thewebsemantic.Id;
import thewebsemantic.Namespace;

@Namespace("http://testdupe#")
public class DupeTarget {

    Date created = new Date();
    String name = "WatchMeGrow";

    public DupeTarget(Date created, String name) {
        super();
        this.created = created;
        this.name = name;
    }

    public DupeTarget() {
    }

    public DupeTarget(String n) {
        this.name = n;
    }

    public Date getCreated() {
        return created;
    }

    public void setCreated(Date created) {
        this.created = created;
    }

    @Id
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}


package mytest;

import thewebsemantic.Bean2RDF;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.sdb.SDBFactory;
import com.hp.hpl.jena.sdb.Store;
import com.hp.hpl.jena.sdb.StoreDesc;
import com.hp.hpl.jena.sdb.sql.JDBC;
import com.hp.hpl.jena.sdb.sql.SDBConnection;
import com.hp.hpl.jena.sdb.store.DatabaseType;
import com.hp.hpl.jena.sdb.store.LayoutType;

public class RawDupeTestSDB {

    public static void main(String[] args) throws InterruptedException {
        RawDupeTestSDB tester = new RawDupeTestSDB();
        tester.test();
    }

    public void test() throws InterruptedException {

        StoreDesc storeDesc = new StoreDesc(LayoutType.LayoutTripleNodesIndex, DatabaseType.MySQL);
        JDBC.loadDriverMySQL();
        String jdbcURL = "jdbc:mysql://elvis:3306/cheeseburgers";
        SDBConnection conn = new SDBConnection(jdbcURL, "elvis", "hungry");
        Store store = SDBFactory.connectStore(conn, storeDesc);

        Model SDBModel = SDBFactory.connectDefaultModel(store);

        // Case 2.1: interim model and SDBModel.add saves - fast, but dangerous.
        // All three beans share the default name (the @Id) "WatchMeGrow".
        DupeTarget o = new DupeTarget();
        Thread.sleep(100); // ensure a "new" date field
        DupeTarget o2 = new DupeTarget();
        Thread.sleep(100);
        DupeTarget o3 = new DupeTarget();

        save(SDBModel, o);
        save(SDBModel, o2);
        save(SDBModel, o3);

        // Case 2.2: direct to model saves, slow.
        // All three beans share the name (the @Id) "StaySlim".
        DupeTarget o4 = new DupeTarget("StaySlim");
        Thread.sleep(100);
        DupeTarget o5 = new DupeTarget("StaySlim");
        Thread.sleep(100);
        DupeTarget o6 = new DupeTarget("StaySlim");

        saveDirect(SDBModel, o4);
        saveDirect(SDBModel, o5);
        saveDirect(SDBModel, o6);

        store.getConnection().close();
        store.close();
    }

    // Saves the bean into a fresh in-memory model, then bulk-adds that model to the SDB model.
    private void save(Model SDBModel, Object o) {
        Model defMod = ModelFactory.createDefaultModel();
        Bean2RDF writer = new Bean2RDF(defMod);
        Resource r = writer.save(o);
        SDBModel.add(defMod);
    }

    // Saves the bean with a writer bound directly to the SDB model.
    private void saveDirect(Model SDBModel, Object o) {
        Bean2RDF writer = new Bean2RDF(SDBModel);
        Resource r = writer.save(o);
    }
}

Some SPARQL to see the triples:

PREFIX lard: <http://testdupe#>
SELECT *
WHERE { ?s a lard:DupeTarget .
        ?s ?p ?o
}
order by ?s
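
For what it's worth, here is a rough sketch of one possible way to keep the interim-model approach without the growth. It is plain Java over a toy set of triples, untested against SDB, and everything in it (`replacingMerge`, `beanTriples`, the `ex:` names, the `t0`/`t1`/`t2` timestamps) is made up for illustration. The idea is to clear, in the target, every (subject, predicate) pair that the interim model is about to write, and only then add. With Jena's Model API I believe the clearing step would correspond to something like `removeAll(subject, property, null)` on the SDB model, but treat that as an assumption to verify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReplacingMergeSketch {

    // Triples one save would produce for a bean with a fixed id and a
    // "created" value that changes on every save (names are hypothetical).
    static Set<List<String>> beanTriples(String created) {
        String s = "ex:WatchMeGrow";
        Set<List<String>> m = new HashSet<>();
        m.add(Arrays.asList(s, "rdf:type", "ex:DupeTarget"));
        m.add(Arrays.asList(s, "ex:name", "WatchMeGrow"));
        m.add(Arrays.asList(s, "ex:created", created));
        return m;
    }

    // Clear every (subject, predicate) pair the interim model writes,
    // then add the interim triples.
    static void replacingMerge(Set<List<String>> store, Set<List<String>> interim) {
        for (List<String> t : interim) {
            store.removeIf(old -> old.get(0).equals(t.get(0))
                    && old.get(1).equals(t.get(1)));
        }
        store.addAll(interim);
    }

    // Plain add, as in case 2.1: old "created" values are never removed.
    static int plainAddCount() {
        Set<List<String>> store = new HashSet<>();
        for (String created : new String[] {"t0", "t1", "t2"}) {
            store.addAll(beanTriples(created));
        }
        return store.size();
    }

    static int replacingMergeCount() {
        Set<List<String>> store = new HashSet<>();
        for (String created : new String[] {"t0", "t1", "t2"}) {
            replacingMerge(store, beanTriples(created));
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("plain add:       " + plainAddCount());       // 5
        System.out.println("replacing merge: " + replacingMergeCount()); // 3
    }
}
```

The plain add ends with 5 triples (type, name, and three created values), while the replacing merge stays at the expected 3. The trade-off is extra delete work per property on the store, so some of the speed advantage of case 2.1 would likely be lost.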


