Hi,
I made (and tested) a small change so that BerkeleyLM is smarter about gzipped files: instead of relying on the file extension, it checks for the gzip magic number at the start of the file. This fixes problems where compressed files don't have the .gz extension.
Rather than a diff, here's the new openIn() function for ~/code/joshua/lib/t/berkeleylm-1.1.5/src/edu/berkeley/nlp/lm/io/IOUtils.java (you also need an "import java.util.zip.ZipException" at the top).
public static BufferedReader openIn(final File path) throws IOException {
    InputStream is = getBufferedInputStream(path);
    try {
        // The GZIPInputStream constructor reads the header and checks the gzip magic number
        is = new GZIPInputStream(is);
    } catch (ZipException e) {
        // Not gzip. The failed check consumed header bytes, so close the stream and reopen the file
        is.close();
        is = getBufferedInputStream(path);
    }
    return new BufferedReader(getReader(is));
}
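
If reopening the file is ever a problem (say, the input is not a regular file), the same check could probably be done by peeking at the first two bytes with mark/reset and comparing them to the gzip magic number (0x1f 0x8b). This is only a standalone sketch, not what I changed in IOUtils: the class GzipSniff and its methods are made up for illustration and are not part of BerkeleyLM.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class, for illustration only (not part of BerkeleyLM).
public class GzipSniff {

    // Wrap the stream in GZIPInputStream only if it starts with the gzip magic number.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);              // remember the current position
        int b1 = in.read();      // -1 if the stream is empty
        int b2 = in.read();
        in.reset();              // rewind, so no bytes are lost either way
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Demo helper: gzip a string in memory.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: read the first line of data that may or may not be gzipped.
    static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                maybeGunzip(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Compressed and uncompressed input go through the same call.
        System.out.println(firstLine(gzip("hello\n")));
        System.out.println(firstLine("hello\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The peeking version also copes with an empty input, since both reads just return -1 and the stream is handed back unwrapped.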
This change is already in the copy of BerkeleyLM that ships with Joshua.
matt