gzip checking


Matt Post

Feb 20, 2015, 10:44:35 PM
to berkeleyl...@googlegroups.com
Hi,

I made (and tested) a small change so that BerkeleyLM is smarter about detecting gzipped files. Instead of relying on the file extension, it checks the gzip magic bytes at the start of the file. This fixes problems with compressed files that don't have a .gz extension.

Rather than a diff, here's the new openIn() function for ~/code/joshua/lib/t/berkeleylm-1.1.5/src/edu/berkeley/nlp/lm/io/IOUtils.java (you also need to add "import java.util.zip.ZipException;" at the top).

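  public static BufferedReader openIn(final File path) throws IOException {
    InputStream raw = getBufferedInputStream(path);
    InputStream is;
    try {
      // The GZIPInputStream constructor reads the gzip header and throws
      // ZipException if the file doesn't start with the gzip magic bytes.
      is = new GZIPInputStream(raw);
    } catch (ZipException e) {
      // Not gzipped. Checking the header consumed bytes from the stream,
      // so close it and reopen the file as plain text.
      raw.close();
      is = getBufferedInputStream(path);
    }
    return new BufferedReader(getReader(is));
  }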
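A possible refinement, sketched as a standalone class rather than a patch to IOUtils: wrap the file in a BufferedInputStream, peek at the first two bytes with mark()/reset(), and only add a GZIPInputStream when they are the gzip magic bytes 0x1f 0x8b. This opens the file once, and an empty or one-byte plain-text file comes back as a readable (empty or short) reader instead of GZIPInputStream throwing EOFException, which openIn() above doesn't catch. The class name MaybeGzip is hypothetical, and decoding as UTF-8 is an assumption; BerkeleyLM's own getReader() may pick the charset differently.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of BerkeleyLM or Joshua.
public class MaybeGzip {
  // gzip files start with the two magic bytes 0x1f 0x8b (RFC 1952).
  private static final int GZIP_MAGIC_1 = 0x1f;
  private static final int GZIP_MAGIC_2 = 0x8b;

  /** Peeks at the first two bytes, then rewinds so nothing is consumed. */
  static boolean hasGzipMagic(BufferedInputStream in) throws IOException {
    in.mark(2);
    int b1 = in.read(); // -1 at end of file, otherwise 0..255
    int b2 = in.read();
    in.reset();
    return b1 == GZIP_MAGIC_1 && b2 == GZIP_MAGIC_2;
  }

  /** Opens a file as text, decompressing only if it has the gzip magic bytes. */
  public static BufferedReader openIn(File path) throws IOException {
    BufferedInputStream raw = new BufferedInputStream(new FileInputStream(path));
    try {
      InputStream is = hasGzipMagic(raw) ? new GZIPInputStream(raw) : raw;
      // Assumption: the text is UTF-8.
      return new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
    } catch (IOException e) {
      // e.g. magic bytes present but the rest of the gzip header is bad:
      // close the file before propagating the error.
      raw.close();
      throw e;
    }
  }
}
```

mark(2) is enough here because only two bytes are read before reset(), so the peek never has to buffer more than the header check needs.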


This change is already in Joshua's copy of BerkeleyLM.

matt