Exception during journal recovery


Winspear

Oct 22, 2013, 1:11:12 PM
to jour...@googlegroups.com
I am using Journal.IO in my server. The server crashed overnight, and now I see the exception below when opening the journal; I can't seem to recover the records in it.

Caused by: java.lang.NegativeArraySizeException
at journal.io.api.DataFileAccessor.readLocationData(DataFileAccessor.java:354)
at journal.io.api.DataFileAccessor.readNextLocationDetails(DataFileAccessor.java:180)
at journal.io.api.Journal.goToNextLocation(Journal.java:766)
at journal.io.api.Journal.recoveryCheck(Journal.java:864)
at journal.io.api.Journal.open(Journal.java:204)
at journal.io.api.JournalBuilder.open(JournalBuilder.java:212)

I attached a couple of screenshots. The first one, taken in the debugger at DataFileAccessor.java:180, shows the "next" location to be 61:0:0 with file position 88547011. When I traversed the journal file (see the second attachment), the contents at that position are a bunch of '00' bytes, and there are some more records at the bottom of the file.
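For anyone who wants to reproduce the check, this is a minimal sketch of peeking at that offset. The file name "db-61.log" is just a placeholder based on the file id 61 in the location; adjust it to whatever data file your journal directory actually contains.

import java.io.RandomAccessFile;

public class HexPeek {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("db-61.log", "r")) {
            file.seek(88547011L); // file position reported in the debugger
            byte[] buffer = new byte[64];
            int read = file.read(buffer);
            for (int i = 0; i < read; i++) {
                System.out.printf("%02x ", buffer[i]); // prints all '00' in my case
            }
            System.out.println();
        }
    }
}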

Below is my code that uses the journal:

public EmailJournal(String journalDir, int maxFileLength, int delay, MetricRegistry registry) throws IOException {
    Preconditions.checkArgument(!Strings.isNullOrEmpty(journalDir));
    Preconditions.checkArgument(maxFileLength > 0);
    Preconditions.checkArgument(delay > 0);
    Preconditions.checkArgument(registry != null);

    journal = JournalBuilder.of(new File(journalDir))
            .setPhysicalSync(true)
            .setMaxFileLength(maxFileLength)
            .setMaxWriteBatchSize(maxFileLength)
            .open();
    this.readTimer = registry.timer(name(EmailJournal.class, "readTimer"));
    this.writeTimer = registry.timer(name(EmailJournal.class, "writeTimer"));
    this.compactionTimer = registry.timer(name(EmailJournal.class, "compactionTimer"));
    this.delay = delay;
}

public EmailJournal(String journalDir, MetricRegistry registry) throws IOException {
    this(journalDir, DEFAULT_FILE_LENGTH, DEFAULT_DELAY, registry);
}

// Persists the stream synchronously and returns a generated id for later lookup.
@Override
public String write(InputStream stream) {
    Preconditions.checkArgument(stream != null);
    try (Timer.Context ignored = writeTimer.time()) {
        byte[] data = ByteStreams.toByteArray(stream);
        Location location = journal.write(data, Journal.WriteType.SYNC);
        String id = UUID.randomUUID().toString();
        idToLocationMap.put(id, location);
        return id;
    } catch (Throwable t) {
        throw Throwables.propagate(t);
    }
}

@Override
public void delete(String key) {
    Preconditions.checkArgument(key != null);
    try {
        Location location = idToLocationMap.get(key);
        Preconditions.checkState(location != null);
        journal.delete(location);
        idToLocationMap.remove(key);
    } catch (IOException e) {
        throw Throwables.propagate(e);
    }
}

@Override
public InputStream read(String key) {
    Preconditions.checkArgument(key != null);
    try (Timer.Context ignored = readTimer.time()) {
        Location location = idToLocationMap.get(key);
        byte[] data = journal.read(location, Journal.ReadType.SYNC);
        return new ByteArrayInputStream(data);
    } catch (IOException e) {
        throw Throwables.propagate(e);
    }
}

// Replays all live records on startup, assigning a fresh id to each recovered location.
@Override
public Map<String, InputStream> redo() {
    Map<String, InputStream> redoEmails = new HashMap<>();
    try {
        for (Location location : journal.redo()) {
            String id = UUID.randomUUID().toString();
            idToLocationMap.put(id, location);
            InputStream is = read(id);
            redoEmails.put(id, is);
        }
    } catch (Throwable t) {
        throw Throwables.propagate(t);
    }
    return redoEmails;
}

@PreDestroy
protected void close() throws IOException {
    logger.log(Level.INFO, "Closing journal.");
    journal.compact();
    journal.close();
}

@Override
public void start() throws Exception {
    journalCompactor.scheduleWithFixedDelay(new JournalCompactTask(), delay, delay, TimeUnit.SECONDS);
}

@Override
public void stop() throws Exception {
    journalCompactor.shutdownNow();
}

// Periodic background compaction, scheduled by start().
private class JournalCompactTask implements Runnable {

    @Override
    public void run() {
        try (Timer.Context ignored = compactionTimer.time()) {
            journal.compact();
            logger.log(Level.FINE, "Done compacting journal");
        } catch (IOException e) {
            logger.log(Level.WARNING, "Error compacting journal.", e);
        }
    }
}


Appreciate any help!

Screen Shot 2013-10-22 at 12.54.41 PM.png
Screen Shot 2013-10-22 at 12.48.36 PM.png

m p

Oct 23, 2013, 3:13:32 AM
to jour...@googlegroups.com
Hi, 

I think that you have the same problem that we had:

- this is our solution to the problem (I think it has been improved a bit since then) - not very clean, but it works - the idea is to delete the last entry (batch) if it wasn't flushed.
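In sketch form it looks roughly like the code below. This is purely illustrative: the real record layout lives in Journal.IO's internals, so the 4-byte length prefix and all the names here are assumptions, not the actual format.

// Illustrative sketch only: assumes each record is a 4-byte length prefix
// followed by the payload, which is NOT necessarily Journal.IO's real layout.
import java.io.IOException;
import java.io.RandomAccessFile;

public class TailTruncator {

    private static final int HEADER_SIZE = 4; // assumed length-prefix size

    // Scan record-by-record from the start of the data file; at the first
    // header whose length is invalid (zero-filled or pointing past EOF),
    // truncate the file there so only complete, flushed records remain.
    public static void truncateAfterLastGoodRecord(RandomAccessFile file) throws IOException {
        long pos = 0;
        long fileLength = file.length();
        while (pos + HEADER_SIZE <= fileLength) {
            file.seek(pos);
            int recordLength = file.readInt();
            if (recordLength <= 0 || pos + HEADER_SIZE + recordLength > fileLength) {
                file.getChannel().truncate(pos); // drop the unflushed tail
                break;
            }
            pos += HEADER_SIZE + recordLength;
        }
    }
}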

thanks,
maciek

Sergio Bossa

Oct 23, 2013, 4:02:20 AM
to jour...@googlegroups.com
Hi,

the two issues do look the same indeed: I'm going to have a look at the pull request.

Thanks,

Sergio B.



--
Sergio Bossa
http://www.linkedin.com/in/sergiob

Winspear

Oct 23, 2013, 10:22:48 AM
to jour...@googlegroups.com

I am not sure it's the same problem: this exception happens in the middle of the journal file, meaning there are more records after the record where the exception occurs, and the exception thrown is java.lang.NegativeArraySizeException.

When creating the journal I set physicalSync to true, and every write uses Journal.WriteType.SYNC to make sure the record has been written to disk, as we need recovery guarantees if our server crashes. So this makes me wonder why a record in the middle of the journal could be corrupt or missing.

Thanks 

Sergio Bossa

Oct 23, 2013, 2:50:22 PM
to jour...@googlegroups.com
On Wed, Oct 23, 2013 at 3:22 PM, Winspear <prathapa...@gmail.com> wrote:

> I am not sure it's the same problem: this exception happens in the middle
> of the journal file,

Yes, it doesn't happen during recovery, but could be *caused* by a bad recovery.

I'm working on a fix (based on the idea above from the other user), so that during recovery the corrupted part of the data file will be deleted: you'll probably lose records, but you'll keep consistent data and you won't run into runtime errors.

In order to test the fix, I suggest you back up the corrupted data files and try them with the patched version.
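Something like this would do as a quick smoke test against the patched build; the "journal-backup" directory is just an example path, so point it at a copy of your corrupted journal.

import java.io.File;
import journal.io.api.Journal;
import journal.io.api.JournalBuilder;
import journal.io.api.Location;

public class RecoveryCheck {
    public static void main(String[] args) throws Exception {
        // Open a COPY of the corrupted journal, never the original files.
        Journal journal = JournalBuilder.of(new File("journal-backup")).open();
        int recovered = 0;
        for (Location location : journal.redo()) {
            journal.read(location, Journal.ReadType.SYNC); // force a full read of each record
            recovered++;
        }
        System.out.println("Recovered " + recovered + " records");
        journal.close();
    }
}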

Sergio Bossa

Oct 24, 2013, 10:52:06 AM
to jour...@googlegroups.com
Hi,

the fix has been pushed to the issue48 branch: could you give it a try and verify it can recover your logs (possibly with some data loss, which is, again, the price to pay for such an odd situation)?

Thanks,

Sergio B.

Prathap Adusumilli

Oct 24, 2013, 9:20:23 PM
to jour...@googlegroups.com

Sure, I will try it tomorrow when I am back at work. Thanks for the quick patch.