Orient prone to data loss over time


Jamie

Mar 5, 2013, 1:09:19 AM
to orient-...@googlegroups.com
Hi Luca, et al.

We have Orient embedded in a test product that has been distributed out in the field to a few customers. What we have found is that Orient DB is highly prone to data corruption.

Caused by: com.orientechnologies.orient.core.exception.ORecordNotFoundException: The record with id '#0:2' not found
at com.orientechnologies.orient.core.record.ORecordAbstract.reload(ORecordAbstract.java:266)
at com.orientechnologies.orient.core.record.impl.ODocument.reload(ODocument.java:367)
at com.orientechnologies.orient.core.type.ODocumentWrapper.reload(ODocumentWrapper.java:82)
at com.orientechnologies.orient.core.type.ODocumentWrapperNoClass.reload(ODocumentWrapperNoClass.java:67)
at com.orientechnologies.orient.core.index.OIndexManagerAbstract.load(OIndexManagerAbstract.java:100)
at com.orientechnologies.orient.core.index.OIndexManagerAbstract.load(OIndexManagerAbstract.java:52)
at com.orientechnologies.orient.core.metadata.OMetadata$2.call(OMetadata.java:121)
at com.orientechnologies.orient.core.metadata.OMetadata$2.call(OMetadata.java:112)
at com.orientechnologies.common.concur.resource.OSharedContainerImpl.getResource(OSharedContainerImpl.java:53)
... 32 common frames omitted
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #0:2 (cluster: internal)
at com.orientechnologies.orient.core.db.raw.ODatabaseRaw.read(ODatabaseRaw.java:239)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.executeReadRecord(ODatabaseRecordAbstract.java:586)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.reload(ODatabaseRecordAbstract.java:249)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.reload(ODatabaseRecordAbstract.java:72)
at com.orientechnologies.orient.core.record.ORecordAbstract.reload(ORecordAbstract.java:259)
... 40 common frames omitted
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Error on reading record from file 'default.0.oda', position 3541, size 550.28Mb: the record size is bigger then the file itself (6.40Kb). Probably the record is dirty due to a previous crash. It is strongly suggested to restore the database or export and reimport this one.
at com.orientechnologies.orient.core.storage.impl.local.ODataLocal.getRecord(ODataLocal.java:231)
at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.readRecord(OStorageLocal.java:1636)
at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.readRecord(OStorageLocal.java:976)
at com.orientechnologies.orient.core.db.raw.ODatabaseRaw.read(ODatabaseRaw.java:233)
... 44 common frames omitted
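The innermost cause says that the record header at position 3541 of 'default.0.oda' claims a size of about 550 MB, while the whole file is only 6.40 KB, so the header bytes are most likely garbage left by an interrupted write (the message itself suggests "the record is dirty due to a previous crash"). The kind of sanity check that produces such a message can be sketched as follows. This is a minimal illustration in plain Java NIO, assuming a hypothetical record layout of a 4-byte length followed by the payload; it is not OrientDB's actual ODataLocal format or code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RecordHeaderCheck {

    /** Reads exactly buf.remaining() bytes starting at the given file offset. */
    static void readFully(FileChannel ch, ByteBuffer buf, long offset) throws IOException {
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) {
                throw new IOException("unexpected end of file at " + pos);
            }
            pos += n;
        }
    }

    /** Reads a record stored as [4-byte length][payload] (hypothetical layout). */
    static byte[] readRecord(FileChannel ch, long position) throws IOException {
        long fileSize = ch.size();
        if (position + Integer.BYTES > fileSize) {
            throw new IOException("record header at " + position + " is past end of file");
        }
        ByteBuffer header = ByteBuffer.allocate(Integer.BYTES);
        readFully(ch, header, position);
        int length = header.getInt(0);
        // Validate the length before trusting it: a header left half-written by
        // a crash can claim hundreds of megabytes in a file of a few kilobytes.
        if (length < 0 || position + Integer.BYTES + length > fileSize) {
            throw new IOException("record at " + position + " claims " + length
                    + " bytes but the file is only " + fileSize + " bytes: probably dirty");
        }
        ByteBuffer payload = ByteBuffer.allocate(length);
        readFully(ch, payload, position + Integer.BYTES);
        return payload.array();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".dat");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A healthy record at offset 0: length 5, payload "hello".
            byte[] hello = "hello".getBytes(StandardCharsets.US_ASCII);
            ByteBuffer good = ByteBuffer.allocate(Integer.BYTES + hello.length);
            good.putInt(hello.length).put(hello).flip();
            while (good.hasRemaining()) {
                ch.write(good, good.position());
            }
            // A forged dirty header at offset 9: ~550 MB claimed in a 13-byte file.
            ByteBuffer bad = ByteBuffer.allocate(Integer.BYTES);
            bad.putInt(577_000_000).flip();
            while (bad.hasRemaining()) {
                ch.write(bad, 9 + bad.position());
            }
            System.out.println(new String(readRecord(ch, 0), StandardCharsets.US_ASCII));
            try {
                readRecord(ch, 9);
            } catch (IOException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running main prints the healthy record and then a rejection for the forged header. The point of validating the length first is that trusting a garbage length would make the reader try to allocate and read hundreds of megabytes.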

This happens after some days of running the application. I am not sure how to debug this. Are you sure that Orient is ready for prime time? For us, the database corrupts data after a period. Data loss is obviously unacceptable.

Regards

Jamie


Jamie

Mar 5, 2013, 1:27:27 AM
to orient-...@googlegroups.com
My earlier comment is a little unfair to Luca et al. It is merely critical and doesn't give you the opportunity to evaluate where the problem lies.

Since the software is running in an outside environment, it is tough for me to determine the source of the problem.

To try to resolve the corruption, I've set:

OGlobalConfiguration.TX_LOG_SYNCH.setValue(true);
OGlobalConfiguration.TX_COMMIT_SYNCH.setValue(true);

Is there anything else I can do?
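Conceptually, a "synch" setting like these trades speed for safety: after each log entry or commit, the storage asks the OS to push the written bytes to the device instead of leaving them in the page cache. A minimal sketch of that idea in plain Java NIO follows; the SynchedLog class and its synchEachEntry flag are hypothetical and only mirror the spirit of TX_LOG_SYNCH, not OrientDB's implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only log with an optional "synch every entry" switch. */
public class SynchedLog implements AutoCloseable {
    private final FileChannel channel;
    private final boolean synchEachEntry;

    public SynchedLog(Path file, boolean synchEachEntry) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.synchEachEntry = synchEachEntry;
    }

    /** Appends one line; with synching on, returns only after force() completes. */
    public void append(String entry) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((entry + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        if (synchEachEntry) {
            // force(true) also persists metadata such as the new file length,
            // which matters for an append-only file. This is the slow part.
            channel.force(true);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("txlog", ".log");
        try {
            try (SynchedLog log = new SynchedLog(file, true)) {
                log.append("BEGIN tx1");
                log.append("UPDATE #5:1");
                log.append("COMMIT tx1");
            }
            // prints [BEGIN tx1, UPDATE #5:1, COMMIT tx1]
            System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the flag off, append() returns as soon as the OS has accepted the bytes, which is fast but leaves a window in which a power loss can drop entries; with it on, every append waits for the device, which is why settings of this kind slow writes down.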

I have about six or seven different customers with Orient DB corruption problems. Fortunately, we only use Orient for indexing, so we can simply rebuild the database. However, we want to store valuable data in it, and these data corruption issues are preventing us from doing so.

Regards

Jamie

Luca Garulli

Mar 5, 2013, 2:02:41 AM
to orient-database
Hi Jamie,
what release are you using? Do you kill the OrientDB process often? Have you activated automatic backups? Are you running in embedded mode, or is the problem in the Server?

Lvc@

mproch

Mar 5, 2013, 4:42:58 AM
to orient-...@googlegroups.com
If you kill OrientDB, the result can be what you described.

There is one more setting that can be used, but it makes OrientDB much slower:
nonTX.recordUpdate.synch=true

I think that until some sort of journaling/WAL is implemented (I guess Andrey will be working on this soon),
there won't be any guarantee that data won't be corrupted (for comparison, see: http://docs.mongodb.org/manual/administration/journaling/)
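The idea behind journaling/WAL: every change is first appended to a log, with enough information to check that each entry was written completely, and only then applied to the data files. After a crash, recovery replays the complete entries and discards a torn tail instead of trusting a half-written record. Below is a toy sketch in Java, kept in memory to stay self-contained; the [length][CRC32][payload] entry format and the ToyWal class are assumptions for illustration, not the design OrientDB adopted. A real WAL would also sync the log file to disk (for example with FileChannel.force) before touching the data files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Toy write-ahead log held in memory; the entry format is hypothetical. */
public class ToyWal {

    /** Encodes one change as [int length][long crc32][payload "key=value"]. */
    static byte[] encode(String key, String value) {
        byte[] payload = (key + "=" + value).getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    /** Replays complete, checksum-valid entries; stops at the first torn or corrupt one. */
    static Map<String, String> recover(byte[] log) {
        Map<String, String> state = new HashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(log);
        while (buf.remaining() >= 12) {
            int length = buf.getInt();
            long expected = buf.getLong();
            if (length < 0 || length > buf.remaining()) {
                break; // header promises more bytes than exist: the write was cut off
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if (crc.getValue() != expected) {
                break; // payload garbled: ignore it and everything after it
            }
            // Keys must not contain '='; values may.
            String[] kv = new String(payload, StandardCharsets.UTF_8).split("=", 2);
            state.put(kv[0], kv[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        byte[] first = encode("#9:1", "alice");
        byte[] second = encode("#9:2", "bob");
        byte[] third = encode("#9:1", "carol");
        log.write(first, 0, first.length);
        log.write(second, 0, second.length);
        // Simulate a kill -9 halfway through appending the third entry.
        log.write(third, 0, third.length / 2);
        System.out.println(recover(log.toByteArray()));
    }
}
```

In main, recovery keeps the first two entries and ignores the partial third one, so #9:1 stays "alice" instead of being overwritten with garbage.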

thanks,
maciek

Artem Orobets

Mar 5, 2013, 5:56:24 AM
to orient-...@googlegroups.com
Hi Jamie,

We started to work on write ahead log. It should solve all problems with durability. 

--
Best regards,
Artem Orobets

Sylvain Spinelli

Mar 5, 2013, 6:06:18 AM
to orient-...@googlegroups.com
Hi,

If all write operations (creates, updates, deletes) are done in
transactions, with:
OGlobalConfiguration.TX_LOG_SYNCH.setValue(true);
OGlobalConfiguration.TX_COMMIT_SYNCH.setValue(true);

can you confirm that the db should never be corrupted, even with a random
"kill -9"?

So the "write ahead log" project is only useful in non-transactional
mode, right?


Sylvain

Andrey Lomakin

Mar 5, 2013, 12:40:19 PM
to orient-database
Hi Sylvain,
We plan to support not only kill but power failure too.
And yes, initially we plan to support only non-TX mode. WAL is part of the ARIES implementation of transactions in highly concurrent environments with different isolation levels, but implementing that approach is not on our roadmap now.


--
With best regards,
Andrey Lomakin


Harald Wellmann

Mar 5, 2013, 1:12:07 PM
to orient-...@googlegroups.com
Does that mean the answer to Sylvain's question

> Can you confirm that the db should never be corrupted
> even with random "kill -9" ?

is No...?

I take the question to refer to the current 1.3.0 release and not to
future versions.

Best regards,
Harald



Luca Garulli

Mar 5, 2013, 7:29:42 PM
to orient-database
Hi,
every DBMS can be corrupted by a kill -9 or a system crash, because all DBMSs use several layers of caching before the data is physically written to disk: software, OS and I/O controller.

This is why backups are always needed. Using a log makes the chance of recovering the state much higher, and this is what we're supporting. Users can choose to use it, or to disable it if they have a more reliable software/hardware configuration.
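The caching layers listed above map roughly onto what a Java program can and cannot control. A minimal sketch with plain JDK I/O, assuming a hypothetical durableWrite helper: flush() moves bytes from the application buffer to the OS, and FileDescriptor.sync() asks the OS to write them to the device. It also illustrates the application-crash versus system-crash distinction: data that has reached the OS survives a process crash, while only a sync protects against an OS crash or power loss, and even then only if the drive or controller cache does not drop it.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheLayers {

    /** Hypothetical helper: writes data and pushes it through every layer Java controls. */
    static void durableWrite(Path file, String data) throws IOException {
        try (FileOutputStream fileOut = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fileOut)) {
            out.write(data.getBytes(StandardCharsets.UTF_8));
            // Software layer: the bytes are still in BufferedOutputStream's array,
            // so a crash of the JVM at this point loses them.
            out.flush();
            // OS layer: the bytes are now in the kernel page cache. They survive
            // an application crash (even kill -9), but not an OS crash or power loss.
            fileOut.getFD().sync();
            // Device layer: sync() has asked the OS to write the data to the device.
            // A volatile write cache in the disk or I/O controller can still lose it
            // unless that cache is battery-backed or write-through.
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("durable", ".dat");
        try {
            durableWrite(file, "committed record");
            System.out.println(Files.readString(file, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The last layer is the one no application code can fully control, which is the reason given above for still keeping backups.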

Lvc@



Andrey Lomakin

Mar 6, 2013, 2:06:16 AM
to orient-database
Hi,
That is a good description of durability in different DBs and their trade-offs.



Andrey Lomakin

Mar 6, 2013, 2:06:31 AM
to orient-database

Pablo Guerrero

Mar 6, 2013, 3:16:11 AM
to orient-...@googlegroups.com
Hi Luca,

I understand that nothing can always protect you against a system crash, but it's less clear to me in the case of an application crash (which is much more likely). Wouldn't a transactional log always protect you against an application crash if the OS is still running?

It's not clear to me whether this transactional log is already implemented or will be implemented in a future release.
Here, https://github.com/nuvolabase/orientdb/wiki/Transactions, it talks about a LOG, but previous messages stated that it's a work in progress.

Thanks,
Pablo



Luca Garulli

Mar 6, 2013, 4:57:51 AM
to orient-database
Hi Pablo,
the transaction log has been in OrientDB for years. But we're thinking about also using a log for non-tx cases, to preserve data even without ACID properties.

Lvc@

Pablo Guerrero

Mar 6, 2013, 6:34:57 AM
to orient-...@googlegroups.com
Thanks, that will be really useful. Many times I don't need transactions, but I don't want to lose data either.

Jamie

Mar 6, 2013, 6:54:53 AM
to orient-...@googlegroups.com
Luca

Thanks for your reply. Please see my answers below.


On Tuesday, March 5, 2013 9:02:41 AM UTC+2, Lvc@ wrote:
> Hi Jamie,
> what release are you using?

We are using Orient 1.3.0.

> Do you kill the OrientDB process so often?

We never intentionally kill Orient. Orient is running in embedded mode. Our server shuts down cleanly.

It's hard to say when the corruption happens, since it seems to occur during the normal course of business.

My gut tells me that there is a bug in Orient DB that causes corruption over time.


> Have you activated automatic backups?

How do you activate automatic backups in embedded mode? Which API do we use?

> Are you running in embedded mode or the problem is in the Server?

We are running embedded. The Orient DB is shut down.
 
Thanks 

Jamie

Luca Garulli

Mar 6, 2013, 7:36:28 AM
to orient-database
What kind of corruption? Can you post the message here?

Lvc@



mat taylor

Mar 7, 2013, 10:39:39 AM
to orient-...@googlegroups.com
Can you please comment on when you expect write ahead logging to be available?

Thanks,
Mat

Andrey Lomakin

Mar 8, 2013, 1:04:39 AM
to orient-database
Hi,
We expect to implement it within the next two months,
so by the beginning of May.




Siriux

Mar 21, 2013, 3:15:28 AM
to orient-...@googlegroups.com
Hi Andrey,

Is there any issue open to track this?

Cheers,
Pablo

Andrey Lomakin

Mar 21, 2013, 2:31:45 PM
to orient-database

Pablo Guerrero

Mar 21, 2013, 2:48:53 PM
to orient-database
Thanks

Valentin Popov

Oct 4, 2013, 10:57:11 AM
to orient-...@googlegroups.com
Version 1.5.1, embedded, with these settings enabled:

OGlobalConfiguration.NON_TX_RECORD_UPDATE_SYNCH.setValue(true); //Executes a synch against the file-system at every record operation. This slows down records updates but guarantee reliability on unreliable drives
OGlobalConfiguration.TX_LOG_SYNCH.setValue(true); //Executes a synch against the file-system for each log entry. This slows down transactions but guarantee transaction reliability on non-reliable drives

Could you give me some clue about the following error?



2013-10-04 10:47:21.870 ERROR - Clear tx log entries failed
java.nio.channels.ClosedChannelException: null
   at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
   at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
   at com.orientechnologies.orient.core.storage.fs.OFileClassic.shrink(OFileClassic.java:48)
   at com.orientechnologies.orient.core.storage.impl.local.OSingleFileSegment.truncate(OSingleFileSegment.java:79)
   at com.orientechnologies.orient.core.storage.impl.local.OTxSegment.clearLogEntries(OTxSegment.java:182)
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocalTxExecuter.clearLogEntries(OStorageLocalTxExecuter.java:197)
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.commit(OStorageLocal.java:1307)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic$2.call(OTransactionOptimistic.java:131)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic$2.call(OTransactionOptimistic.java:127)
   at com.orientechnologies.orient.core.storage.OStorageAbstract.callInLock(OStorageAbstract.java:180)
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.callInLock(OStorageLocal.java:1103)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:127)
   at com.orientechnologies.orient.core.db.record.ODatabaseRecordTx.commit(ODatabaseRecordTx.java:114)
   at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:458)
   at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.commit(OrientTransactionalGraph.java:62)
   at com.stimulus.archiva.ew.b(MailArchiva:157)
   at com.stimulus.archiva.cs.a(MailArchiva:226)
   at com.stimulus.archiva.cs.run(MailArchiva:194)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
2013-10-04 10:47:21.870 ERROR - failed to write tree:Error during transaction commit.
com.orientechnologies.orient.core.exception.OStorageException: Error during transaction commit.
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.commit(OStorageLocal.java:1304)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic$2.call(OTransactionOptimistic.java:131)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic$2.call(OTransactionOptimistic.java:127)
   at com.orientechnologies.orient.core.storage.OStorageAbstract.callInLock(OStorageAbstract.java:180)
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocal.callInLock(OStorageLocal.java:1103)
   at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:127)
   at com.orientechnologies.orient.core.db.record.ODatabaseRecordTx.commit(ODatabaseRecordTx.java:114)
   at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:458)
   at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.commit(OrientTransactionalGraph.java:62)
   at com.stimulus.archiva.ew.b(MailArchiva:157)
   at com.stimulus.archiva.cs.a(MailArchiva:226)
   at com.stimulus.archiva.cs.run(MailArchiva:194)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException: null
   at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
   at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
   at com.orientechnologies.orient.core.storage.fs.OFileClassic.shrink(OFileClassic.java:48)
   at com.orientechnologies.orient.core.storage.impl.local.OSingleFileSegment.truncate(OSingleFileSegment.java:79)
   at com.orientechnologies.orient.core.storage.impl.local.OTxSegment.clearLogEntries(OTxSegment.java:182)
   at com.orientechnologies.orient.core.storage.impl.local.OStorageLocalTxExecuter.clearLogEntries(OStorageLocalTxExecuter.java:197)
   at com.orientechnologies.orien
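One known JDK behaviour produces exactly this ClosedChannelException from FileChannelImpl.truncate: FileChannel is an interruptible channel, so if any thread is interrupted while doing I/O on it (for example a scheduled task cancelled with Future.cancel(true), or an executor stopped with shutdownNow()), the JDK closes the channel for every thread that shares it, and later calls such as truncate() fail in ensureOpen(). Since the failing commit above runs inside a ScheduledThreadPoolExecutor, this is a plausible candidate, but nothing in this thread confirms it; a database closed concurrently from another thread would produce the same trace. The sketch below reproduces the effect with plain JDK classes; InterruptClosesChannel and demo are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptClosesChannel {

    /**
     * Writes one byte while the current thread is flagged as interrupted,
     * then tries to truncate, and reports what happened at each step.
     */
    static String demo(Path file) throws IOException {
        StringBuilder report = new StringBuilder();
        FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE);
        try {
            // Stands in for Future.cancel(true) or shutdownNow() hitting this thread.
            Thread.currentThread().interrupt();
            try {
                ch.write(ByteBuffer.wrap(new byte[] {1}));
            } catch (ClosedByInterruptException e) {
                report.append("write: ClosedByInterruptException; ");
            }
            report.append("open=").append(ch.isOpen()).append("; ");
            try {
                ch.truncate(0); // the call that fails inside OFileClassic.shrink in the trace
            } catch (ClosedChannelException e) {
                report.append("truncate: ClosedChannelException");
            }
        } finally {
            Thread.interrupted(); // clear the flag so the caller's thread is not left interrupted
            ch.close();           // no-op here: the interrupt already closed the channel
        }
        return report.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".tx");
        try {
            System.out.println(demo(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

demo returns "write: ClosedByInterruptException; open=false; truncate: ClosedChannelException". If interrupts turn out to be the cause, one option is to cancel such tasks without interrupting them (Future.cancel(false), or ExecutorService.shutdown() instead of shutdownNow()) so the storage files are never closed underneath the database.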

On Tuesday, March 5, 2013, 10:09:19 UTC+4, Jamie wrote: