Fwd: Re: Updating beans - duplicate properties


uoc

Jul 10, 2010, 8:15:20 AM
to jenabe...@googlegroups.com
Resending, as it doesn't seem to have made it to the list (unless someone can tell me otherwise?)

-------- Original Message --------
Message-ID: <4C383B02...@gmail.com>
Date: Sat, 10 Jul 2010 10:18:58 +0100
From: uoc <uoc...@gmail.com>
To: jenabe...@googlegroups.com
CC: Taylor Cowan <thewebs...@gmail.com>
Subject: Re: Updating beans - duplicate properties


Well. Drag.

I removed the ontology from the equation and the problem remains.

So I've reduced it to a test case. I hope you can try to reproduce it
and/or tell me what I'm doing wrong, and/or that my expectation is wrong.

1) A simple class called DupeTarget with a String id (name) and a Date
field (created).
2) A test driver, RawDupeTestSDB, that creates
2.1) 3 of these with the same id, and saves each with Bean2RDF into an
interim model which is then added to an SDB-backed model on MySQL with
SDBModel.add(model)
2.2) 3 of these with the same id, and saves each with Bean2RDF directly
into the SDB-backed model on MySQL

The idea is that 2.1 is quite a bit faster.
The expectation is that, because the ID is the same, there will be 3
triples for each ID: one each for the type, name and created properties.
The result is that there are 3 triples for the 2.2 case, and a growing
number of triples for the 2.1 case: the created property appears many
times, once for each of its values.


package mytest;

import java.util.Date;
import thewebsemantic.Id;
import thewebsemantic.Namespace;

@Namespace("http://testdupe#")
public class DupeTarget {

    Date created = new Date();
    String name = "WatchMeGrow";

    public DupeTarget(Date created, String name) {
        super();
        this.created = created;
        this.name = name;
    }

    public DupeTarget(){

    }
    public DupeTarget(String n){
        this.name = n;
    }
    public Date getCreated() {
        return created;
    }
    public void setCreated(Date created) {
        this.created = created;
    }
    @Id
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}


package mytest;

import thewebsemantic.Bean2RDF;
import com.hp.hpl.jena.rdf.model.Resource;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.sdb.SDBFactory;
import com.hp.hpl.jena.sdb.Store;
import com.hp.hpl.jena.sdb.StoreDesc;
import com.hp.hpl.jena.sdb.sql.JDBC;
import com.hp.hpl.jena.sdb.sql.SDBConnection;
import com.hp.hpl.jena.sdb.store.DatabaseType;
import com.hp.hpl.jena.sdb.store.LayoutType;



public class RawDupeTestSDB {


    public static void main(String[] args) throws InterruptedException  {
        RawDupeTestSDB tester = new RawDupeTestSDB();
        tester.test();
    }
    public void test() throws InterruptedException {

        StoreDesc storeDesc =
            new StoreDesc(LayoutType.LayoutTripleNodesIndex, DatabaseType.MySQL);
        JDBC.loadDriverMySQL() ;
        String jdbcURL = "jdbc:mysql://elvis:3306/cheeseburgers";
        SDBConnection conn = new SDBConnection(jdbcURL, "elvis", "hungry");
        Store store = SDBFactory.connectStore(conn, storeDesc) ;

        Model SDBModel = SDBFactory.connectDefaultModel(store);

        //interim model and SDBModel.add saves - fast, but dangerous
        DupeTarget o = new DupeTarget();
        Thread.sleep(100);//ensure a "new" date field
        DupeTarget o2 = new DupeTarget();
        Thread.sleep(100);
        DupeTarget o3 = new DupeTarget();

        save(SDBModel, o);
        save(SDBModel, o2);
        save(SDBModel, o3);

        //direct to model saves, slow
        DupeTarget o4 = new DupeTarget("StaySlim");
        Thread.sleep(100);
        DupeTarget o5 = new DupeTarget("StaySlim");
        Thread.sleep(100);
        DupeTarget o6 = new DupeTarget("StaySlim");

        saveDirect(SDBModel, o4);
        saveDirect(SDBModel, o5);
        saveDirect(SDBModel, o6);

        store.getConnection().close();
        store.close();


    }

    // Case 2.1: save the bean into a fresh in-memory model, then add that
    // model to the SDB-backed model in one go.
    private void save(Model SDBModel, Object o) {
        Model defMod = ModelFactory.createDefaultModel();
        Bean2RDF writer = new Bean2RDF(defMod);
        writer.save(o);
        SDBModel.add(defMod);
    }

    // Case 2.2: save the bean straight into the SDB-backed model.
    private void saveDirect(Model SDBModel, Object o) {
        Bean2RDF writer = new Bean2RDF(SDBModel);
        writer.save(o);
    }


}

Some SPARQL to see the triples:

PREFIX lard: <http://testdupe#>
SELECT *
WHERE { ?s a lard:DupeTarget .
    ?s ?p ?o
}
order by ?s


On 06/07/2010 13:37, Taylor Cowan wrote:
> Well, this sounds more likely now, that is, one individual with two
> values for the same property. It takes a lot of work to prevent that
> in Jena.  I wish that the ontModel class would take into account
> "functional" properties and see to it that this never happens.
>
> JenaBean first deletes all relationships for the given singular
> property, then adds the new value, but this is within context of the
> singular model for which the Bean2RDF object is bound.  If there are
> two models merged this might happen...and I think you've uncovered a
> weakness in the merge technique.
>
> Taylor
>
> On Tue, Jul 6, 2010 at 2:37 AM, uoc<uoc...@gmail.com>  wrote:
>    
>> Well, it's peculiar and I'm not sure what exactly is happening, but I've worked
>> around it. Maybe you can see what's going on:
>>
>> Sparql shows one individual with duplicate properties - that is properties
>> of the same name with different values attached to the same subject.
>> For example
>>
>> Subject = http://myurl#indiv/1
>> Property=http://myurl/status
>> Object=http://myurl/Status/Pending
>>
>> Subject = http://myurl#indiv/1
>> Property=http://myurl/status
>> Object=http://myurl/Status/Active
>>
>> The object model is a class with a status property. It's not a collection
>> property, but I've ended up with 2 status properties.
>>
>> 1) Create an individual with Status pending. Persist to database.
>> 2) Query the database for the individual. Retrieve it. Update the status.
>> 3) Query the database again. Get what appear to be 2 individuals, but are in
>> fact one individual with duplicated status properties.
>>
>> Both the Sparql and wading in with SQL to the innards of the tables show
>> this.
>>
>> I had been using an ontology-backed model. I changed this to a non-ontology-backed
>> model, and created an update method that directly uses the model
>> rather than the "optimised" save method which uses the SDB model merge/add
>> bulk update technique. I'm not 100% sure (I don't have the luxury of time atm)
>> but I believe both these things are needed.
>>
>> I don't understand how updating an individual from the ontology-backed model
>> (in step 2) would have caused new physical triples to be created.  The
>> ontology-backed model is at the heart of it for sure. It warrants more
>> investigation, but at least I can isolate it for now.
>>
>> On 05/07/2010 21:04, Taylor Cowan wrote:
>>
>> Two beans with the same ID but with different properties?
>>
>> Lacking the bean code I'll try some ideas that most likely you've
>> already covered.  First, if a bean is without an @Id annotation,
>> jenabean by default uses the hashcode to create an id...that'll never
>> ever work for useful work, just a convenience for writing unit tests.
>> So in that situation (no @Id field), you could have two beans with an
>> unannotated id that is the same, but the individuals have different
>> URLs.
>>
>> Have you tried some SPARQL to just look at the raw triples and make
>> sure there aren't two individuals?
>>
>>
>>
>> On Mon, Jul 5, 2010 at 5:36 AM, uoccou<uoc...@googlemail.com>  wrote:
>>
>>
>> Hi,
>>
>> I've got something strange going on, and I'm starting to bang my head.
>>
>> I have a bean that I persist to an SDB backed model in one thread in a
>> web application, everything works as expected.
>>
>> A few mins later I retrieve this bean in another thread with the help
>> of a sparql/exec query. I update a few properties (date time audit
>> fields and such) and then do a Bean2RDF save.
>>
>> If I go to update that persisted bean again by retrieving it from
>> storage using the same query, instead of getting one bean back I now
>> get 2. Both have the same ID and differ by the properties I have
>> updated.
>>
>> What's going on?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "jenabean-dev" group.
>> To post to this group, send email to jenabe...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> jenabean-dev...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/jenabean-dev?hl=en.
>>

Taylor Cowan

Jul 10, 2010, 11:36:51 PM
to jenabe...@googlegroups.com
I received your latest post.

Taylor Cowan

Jul 10, 2010, 11:48:05 PM
to jenabe...@googlegroups.com
a.add( b ) adds all statements from model b to a.  So if we have these triples:

model a:
:subject1 :hasName "Mark"

model b:
:subject1 :hasName "Joe"

after the add, model a will have:
:subject1 :hasName "Mark"
:subject1 :hasName "Joe"

It's a union, so the model.add(model) technique is ok for new data, but for changing a functional property...it won't work.
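
A minimal sketch of that union behaviour, using only java.util collections rather than the Jena API. The class UnionAddDemo and its helpers triple and addAll are hypothetical names made up for this illustration; a model is approximated here as a set of (subject, predicate, object) lists, which is an assumption about how add behaves based on the description above, not Jena code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionAddDemo {

    // Hypothetical helper: a triple is a 3-element list (subject, predicate, object).
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Stand-in for a.add(b): returns the set union of both "models".
    static Set<List<String>> addAll(Set<List<String>> a, Set<List<String>> b) {
        Set<List<String>> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> a = new LinkedHashSet<>();
        a.add(triple(":subject1", ":hasName", "Mark"));

        Set<List<String>> b = new LinkedHashSet<>();
        b.add(triple(":subject1", ":hasName", "Joe"));

        // Both values survive, because the triples differ in their object.
        System.out.println(addAll(a, b).size()); // prints 2

        // Adding an identical triple again changes nothing.
        System.out.println(addAll(a, a).size()); // prints 1
    }
}
```

Only an exact repeat of subject, predicate and object is absorbed by the union, which is why a new created date for the same bean piles up instead of replacing the old one.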

uoc

Jul 11, 2010, 6:04:41 AM
to jenabe...@googlegroups.com, Taylor Cowan
Hi, thanks.

So what I want is a "merge" kind of thing? I don't see such a capability, and I understood from the SDB documentation that this would count as a duplicate. And it's slow. I'm off to harass the Jena team....
Thanks for now.
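
One way to picture such a "merge" is sketched below, again with plain java.util collections and not the Jena or SDB API. ReplaceMergeDemo and replaceMerge are hypothetical names for this illustration. The sketch assumes every property in the incoming data is single-valued, so it would be wrong for collection properties, and it says nothing about how fast the equivalent would run against SDB.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReplaceMergeDemo {

    // Hypothetical merge: for every (subject, predicate) pair present in
    // incoming, drop the values already in target, then add the new ones.
    // Assumes all incoming properties are single-valued.
    static void replaceMerge(Set<List<String>> target, Set<List<String>> incoming) {
        Set<List<String>> keys = new HashSet<>();
        for (List<String> t : incoming) {
            keys.add(Arrays.asList(t.get(0), t.get(1)));
        }
        target.removeIf(t -> keys.contains(Arrays.asList(t.get(0), t.get(1))));
        target.addAll(incoming);
    }

    public static void main(String[] args) {
        // What is already stored after the first save (placeholder values).
        Set<List<String>> target = new LinkedHashSet<>();
        target.add(Arrays.asList("lard:WatchMeGrow", "rdf:type", "lard:DupeTarget"));
        target.add(Arrays.asList("lard:WatchMeGrow", "lard:name", "WatchMeGrow"));
        target.add(Arrays.asList("lard:WatchMeGrow", "lard:created", "t1"));

        // The interim model from a later save with a newer created value.
        Set<List<String>> incoming = new LinkedHashSet<>();
        incoming.add(Arrays.asList("lard:WatchMeGrow", "rdf:type", "lard:DupeTarget"));
        incoming.add(Arrays.asList("lard:WatchMeGrow", "lard:name", "WatchMeGrow"));
        incoming.add(Arrays.asList("lard:WatchMeGrow", "lard:created", "t2"));

        replaceMerge(target, incoming);
        System.out.println(target.size()); // prints 3
    }
}
```

The difference from a plain union is the removal step: the old created value is dropped before the new one is added, so the bean stays at three triples.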