Dropping 60K nodes takes a long time


Ross

May 22, 2013, 9:32:25 AM
to dex...@googlegroups.com
Hi there,

I'm attempting to delete around 60,000 nodes from a database of around 1.2 million nodes. The nodes to delete are all nodes of type1, type2 and type3. (There are around 60K type1 nodes, and fewer than 200 of each of the other types.) I'm using the following Groovy code with the dexjava.jar library:

import com.sparsity.dex.gdb.*;

/** Ad hoc class to delete all data for certain node types. */
public class AdHocDelete {
    public static void main(args) {
        assert new File(args[0]).exists() && new File(args[1]).exists();
        def types = [ "type1", "type2", "type3" ];
        def dex, db, sess, graph;
        try {
            println "Load dex properties from ${args[1]}";
            DexProperties.load(args[1]);
            DexConfig cfg = new DexConfig();
            dex = new Dex(cfg);
            println "Opening dex db at ${args[0]}...";
            db = dex.open(args[0], false);
            println "Dex db opened.";
            sess = db.newSession();
            graph = sess.getGraph();
            types.each { type ->
                int typeId = graph.findType(type);
                if (typeId != Type.InvalidType) {
                    // Select every object of this type and drop them all in one call.
                    def objs = graph.select(typeId);
                    println "Deleting $type nodes...";
                    graph.drop(objs);
                    objs.close();
                    println "Deleted all $type nodes.";
                }
                else {
                    println "$type is an invalid type.";
                }
            }
        }
        catch (ex) {
            println "Something went wrong.";
            ex.printStackTrace();
        }
        finally {
            // Null-safe closes, in case setup failed before these were assigned.
            sess?.close();
            db?.close();
            dex?.close();
        }
        println "Done."
    }
}

When I run this code, I see the following output:

Load dex properties from [C:\...etc]
Opening dex db at [C:\...etc]
Dex db opened.
Deleting type1 nodes...

I have now been waiting for more than an hour for this first set of nodes to be removed! I am impatient, so I did not wait for the operation to complete before writing this ;). Is there something I can do that is more efficient?
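In the meantime, here is the fallback I'm considering: dropping node by node with progress logging, just so the throughput is visible while the delete runs. This is only a sketch based on my reading of the dexjava API (Objects.iterator() and the single-OID Graph.drop(long)); I haven't verified it against this dataset.

    // Rough sketch (assumes import com.sparsity.dex.gdb.*): drop nodes
    // one at a time, logging progress so the throughput is visible even
    // when the overall delete is slow.
    def dropWithProgress(graph, int typeId, String typeName) {
        // Materialise the OIDs first, so we never drop while iterating
        // the live Objects set.
        def oids = [];
        def objs = graph.select(typeId);
        def iter = objs.iterator();
        while (iter.hasNext()) oids << iter.next();
        iter.close();
        objs.close();

        long start = System.currentTimeMillis();
        oids.eachWithIndex { oid, i ->
            graph.drop(oid);   // Graph.drop(long) removes a single object
            if ((i + 1) % 1000 == 0) {
                def secs = Math.max((System.currentTimeMillis() - start) / 1000.0, 0.001);
                println "$typeName: dropped ${i + 1} nodes so far (~${(long) ((i + 1) / secs)}/s)";
            }
        }
        println "$typeName: dropped ${oids.size()} nodes in total.";
    }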

Thanks
Ross


c3po.ac

May 22, 2013, 10:02:27 AM
to dex...@googlegroups.com
Hi,

Dex is optimized for analytical operations, and is therefore not as fast for updates or deletes.
A delete operation may also need to remove any related information (for example, a node's incident edges and any indexed attribute values), and that can be slow.
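To illustrate that extra work, here is a rough sketch that drops a node's incident edges explicitly before dropping the node itself. Dropping the node alone already does this internally; the sketch just spells the cost out. It assumes import com.sparsity.dex.gdb.*, and the edgeTypeId parameter is a placeholder for one of your own edge types.

    // Rough sketch: the hidden cost of dropping a node is that every
    // incident edge must go too (graph.drop(nodeOid) already does this
    // internally; here it is made explicit).
    void dropNodeAndEdges(Graph graph, long nodeOid, int edgeTypeId) {
        def edgeOids = [];
        def edges = graph.explode(nodeOid, edgeTypeId, EdgesDirection.Any);
        def iter = edges.iterator();
        while (iter.hasNext()) edgeOids << iter.next();   // collect first...
        iter.close();
        edges.close();
        edgeOids.each { graph.drop(it) };   // ...then drop; each removal may touch indexes
        graph.drop(nodeOid);                // finally drop the node itself
    }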

We believe it is working correctly, but if you find any problem, please contact us.

Best regards,


Ross

May 22, 2013, 10:14:00 AM
to dex...@googlegroups.com
Our main use of Dex is analytical/exploration/query based, so it is interesting to see your point that Dex is optimised for analytical operations. I hadn't really considered this until now; I rarely have to update more than a couple of thousand nodes at a time. Anyway, it has finished at last! I will have to learn to be more patient next time. :)

Thanks

Mark Nuzz

Apr 14, 2014, 6:39:30 PM
to spar...@googlegroups.com, dex...@googlegroups.com
Apologies for bringing up an old post, but this is something that concerns me. 60,000 nodes is really not a huge amount of data. Assuming it took one hour, we're looking at roughly 16 nodes per second. That doesn't seem workable to me.

Should I expect the DB to be this slow for deleting data? 

c3po.ac

Apr 15, 2014, 4:27:20 AM
to spar...@googlegroups.com, dex...@googlegroups.com
Hello Mark,

The data storage of Sparksee is like a vertical partitioning of the data, which, although great for many things, may not be as fast for certain operations. The time a delete takes depends on your actual data: the number and type of your attributes, how they are indexed, and whether the recovery functionality is enabled. For instance, compare these two cases:
  • Case 1: The nodes you are removing have 50 attributes, most of them Indexed attributes of String data type, and recovery is enabled.
  • Case 2: The nodes you are removing have only a couple of numeric Basic attributes and recovery is not even enabled.
The delete times would be completely different. It is therefore difficult for us to tell how the delete operation would perform for you without knowing more about your data, sorry about that! (The sketch below shows one quick way to check which case you are closer to.)
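A rough sketch of that check, assuming import com.sparsity.dex.gdb.* and that Graph.findAttributes / Graph.getAttribute and the AttributeList iterator behave as described in the reference documentation:

    // Rough sketch: print each attribute's data type and kind (Basic /
    // Indexed / Unique) for a node type, to estimate how expensive its
    // deletes will be. Indexed String attributes plus recovery make
    // drops far more costly than a couple of Basic numeric ones.
    void printAttributeProfile(Graph graph, String typeName) {
        int typeId = graph.findType(typeName);
        def attrs = graph.findAttributes(typeId);
        def iter = attrs.iterator();
        while (iter.hasNext()) {
            def a = graph.getAttribute(iter.next());
            println "$typeName.${a.getName()}: ${a.getDataType()} (${a.getKind()})";
        }
    }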

The best thing is to try it with your own data. If you need a license for a bigger dataset, please contact Dàmaris. Also, do not hesitate to contact us directly with more information about your data, and we will try to give a more tailored answer.

Thanks,

