ID pool assignment errors


David Turnbull

Dec 18, 2012, 1:11:23 AM
to aureliu...@googlegroups.com
Hi,

We're seeing new errors in the last ~30 hours with titan-0.2.0-20121217.011832.

Exception in thread "main" java.lang.IllegalArgumentException: -1
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.nextBlock(StandardIDPool.java:69)
at com.thinkaurelius.titan.graphdb.database.idassigner.StandardIDPool.nextID(StandardIDPool.java:115)
at com.thinkaurelius.titan.graphdb.database.idassigner.VertexIDAssigner.assignID(VertexIDAssigner.java:266)
at com.thinkaurelius.titan.graphdb.database.idassigner.VertexIDAssigner.assignID(VertexIDAssigner.java:153)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.assignID(StandardTitanGraph.java:620)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.registerNewEntity(AbstractTitanTx.java:121)
at com.thinkaurelius.titan.graphdb.vertices.factory.StandardVertexFactories$1.createNew(StandardVertexFactories.java:23)
at com.thinkaurelius.titan.graphdb.transaction.AbstractTitanTx.addVertex(AbstractTitanTx.java:133)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsTransaction.addVertex(TitanBlueprintsTransaction.java:47)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.addVertex(TitanBlueprintsGraph.java:100)

It seemed similar to the problem outlined in https://groups.google.com/forum/#!msg/aureliusgraphs/2Alt-XmfBRQ/h63j5CW4YXgJ, so I added 'configuration.setProperty("storage.block-size", 100000)' to the Configuration passed to TitanFactory.open(Configuration). I also tried 1000000.
Partition 2 picks up that block size but partition 1 does not, though I don't know if that's relevant.
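
For reference, the configuration is built roughly like this (a sketch; the storage.backend and storage.hostname values below are placeholders rather than our real settings):

import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

Configuration configuration = new BaseConfiguration();
configuration.setProperty("storage.backend", "cassandra");   // placeholder backend
configuration.setProperty("storage.hostname", "127.0.0.1");  // placeholder host
configuration.setProperty("storage.block-size", 100000);     // also tried 1000000
TitanGraph graph = TitanFactory.open(configuration);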

DEBUG - Acquired ID block [120700001,120800001) on partition 2 (my rid is 0a010051313130333040706c2d6f66666963652d30312e6c6f63616c)
DEBUG - Acquired ID block [55480001,55490001) on partition 1 (my rid is 0a010051313130333040706c2d6f66666963652d30312e6c6f63616c)
...
...

WARN - Temporary storage exception while acquiring id block - retrying in 300 ms: com.thinkaurelius.titan.diskstorage.TemporaryStorageException: Wrote claim for id block [56640001, 56650001) in 955 ms => too slow, threshold is: 300

Could those temporary storage exceptions allow an ID to be acquired from an empty ID buffer?
Is there another way to fix this problem?

Thanks,
David

Matthias Broecheler

Dec 18, 2012, 1:19:54 AM
to aureliu...@googlegroups.com
Hey David,

Is that on a database that was created with a previous version of Titan? As explained in a previous email, Titan 0.2 requires an upgrade to be used against an existing database created with an earlier Titan version, due to changes in ID assignment.
Please run Titan 0.2 ONLY INDEPENDENTLY for now. We will provide an upgrade script shortly.

Thanks,
Matthias


--
Matthias Broecheler, PhD
http://www.matthiasb.com
E-Mail: m...@matthiasb.com

David Turnbull

Dec 18, 2012, 2:01:07 AM
to aureliu...@googlegroups.com
Yep, I made sure of that :) I've been dropping the Cassandra keyspace each run.


Matthias Broecheler

Dec 18, 2012, 2:26:43 PM
to aureliu...@googlegroups.com

That is not good :-)
Did you make any changes to the config files? It looks like your ID acquisitions might be timing out, causing Titan to fail. Is this always happening at the same time?


Zack Maril

Dec 18, 2012, 2:50:29 PM
to aureliu...@googlegroups.com
David,
It feels like I've had similar problems before. What I eventually learned is that Cassandra also stores information in a cache somewhere else. If you delete the keyspace manually (rm /data/cassandra or what have you), then you need to hunt down that cache and clear it as well. Stale info can stick around and gum up the works sometimes.
Good luck!
-Zack

David Turnbull

Dec 18, 2012, 6:55:18 PM
to aureliu...@googlegroups.com
No Cassandra or Titan config changes.
The errors happen about 10 minutes into the map/reduce job, but each mapper has processed a different number of items by the time the exception is thrown.

I'm trying Cassandra on my laptop with an SSD now, but that didn't seem to change things.
When I increased storage.idauthority-wait-time to 10000, the same exception was thrown almost instantly, just on the initialisation of our Titan types.

I also tried going back to 0.2.0-20120916.015549-1 but no help there.


David Turnbull

Dec 18, 2012, 6:56:52 PM
to aureliu...@googlegroups.com
I found that dropping the keyspace, then nuking the data dir while Cassandra is shut down, cleans out all the nasty hidden stuff :)
This specific problem persists across different Cassandra servers and keyspace names, so I have to rule that out.


David Turnbull

Dec 18, 2012, 7:45:54 PM
to aureliu...@googlegroups.com
Matthias,

Gerald Kaszuba suggested off-list reconnecting to Titan periodically to preempt the ID buffer running empty.
That worked, though with an obvious time cost :)
I hope this gives you some helpful info!
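
Roughly what the workaround looks like (a sketch; the interval, the MyItem type and the processItem method are placeholder names, and 'configuration' is the same Commons Configuration object as before):

TitanGraph graph = TitanFactory.open(configuration);
int processed = 0;
for (MyItem item : items) {                       // MyItem/items: placeholders for our job data
    processItem(graph, item);                     // placeholder for the per-item mutations
    if (++processed % 10000 == 0) {               // interval is arbitrary
        graph.shutdown();                         // close the graph and its ID pool state
        graph = TitanFactory.open(configuration); // reconnect and acquire fresh ID blocks
    }
}
graph.shutdown();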

David Turnbull

Dec 18, 2012, 10:22:03 PM
to aureliu...@googlegroups.com
Update:
This is _not_ related to the most recent titan snapshot, as I went back to the September one and the problem was there, too.

It seems to be more about the number of mutations performed per transaction.
With 100 items per transaction (100 getVertex calls, 100 createVertex calls, 100 addEdge calls and 1300 setProperties calls), the problem occurs.
Flushing after processing each item seems to have fixed the problem.
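
In code terms, the change is roughly this (a sketch assuming the Blueprints TransactionalGraph API that Titan builds on; MyItem and processItem are placeholder names):

import com.tinkerpop.blueprints.TransactionalGraph;

for (MyItem item : batch) {          // MyItem/batch: placeholders for our job data
    processItem(graph, item);        // ~1 getVertex, 1 createVertex, 1 addEdge, ~13 property sets per item
    // flush after every item instead of once per 100-item batch
    graph.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
}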

Matthias Broecheler

Dec 23, 2012, 11:27:47 AM
to aureliu...@googlegroups.com
Ha, I think I finally figured out what is going on.
The ID Pool manager automatically allocates the next block of ids when the current one is about to run out. That way, there is no lag when going from an old to a new id pool - the new one is already waiting. However, "about to run out" is defined as "there are only 100 ids left". In your case, you probably blow through 100 ids in no time and so the id block allocation thread is still running when the ids run out. Now, the ID Pool manager waits for that thread, but only for 2 seconds. In your case, it seems that due to the high load on the cluster, some id allocations fail and hence the error.

Two things:
1) I just committed a change to master that triggers the new block allocation when only 20% of the ids are left. So, if you set your id block size sufficiently large, you should be fine.
2) Increase the lock wait time for the id allocation to improve its chance of success.
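
In the configuration style used earlier in this thread, that would look roughly like the following (values are illustrative; the wait time in 2) is taken to be the storage.idauthority-wait-time setting mentioned above):

configuration.setProperty("storage.block-size", 1000000);         // 1) large ID blocks
configuration.setProperty("storage.idauthority-wait-time", 1000); // 2) longer wait for ID block claims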

Hope this fixes it.
Thanks for reporting,
Matthias