[groovy-user] Download zip file and process

rakesh mailgroups

Dec 3, 2012, 10:11:27 AM
to us...@groovy.codehaus.org
Hi All,

I'm struggling to get all the pieces to work together:

I would like to download a zip file over http, and process it in memory, NOT via a file.

In more detail: I am trying to download a GeoIP database that is made available as a zipped CSV file: http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip

I then need to process each line and insert into my db.

I'm finding it quite difficult with all the ZipFile/ZipInputStream options as well as not wanting to store it on the file system first.

Any help appreciated.

Rakesh

Tim Yates

Dec 3, 2012, 10:27:44 AM
to us...@groovy.codehaus.org
This grabs a zip file, and processes each entry without saving the zip file to disk first.

Obviously in this example, it just unpacks the zip to /tmp.  You would want to do something else with the stream zs once you have got the first entry.

Hope it helps!

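import java.util.zip.*

"http://some.url/to/afile.zip".toURL().withInputStream { s ->
  new ZipInputStream( s ).with { zs ->
    def entry
    while( ( entry = zs.nextEntry ) != null ) {
      println "Processing $entry.name"

      def local = new File( "/tmp/$entry.name" )

      // Do stuff with the entry
      if( entry.isDirectory() ) {
        local.mkdirs()
      }
      else {
        // Make sure the parent directory exists before writing the file
        local.parentFile.mkdirs()
        // Copies the bytes of the current entry from the zip stream
        local << zs
      }

      zs.closeEntry()
    }
  }
}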
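
For the use case in the question (a zipped CSV that should be processed line by line rather than unpacked), the same loop can hand each entry to a reader instead of a File. This is only a minimal sketch: `eachZipLine` is a hypothetical helper name, the entries are assumed to be UTF-8 text, and the sample zip is built in memory so the code runs without a network connection. For a real download you would pass the stream from `toURL().withInputStream { ... }` instead.

```groovy
import java.util.zip.*

// Hypothetical helper: calls handler(entryName, line) for every line of every
// file entry in a zip stream, without writing anything to disk.
def eachZipLine(InputStream input, Closure handler) {
    def zs = new ZipInputStream(input)
    try {
        def entry
        while ((entry = zs.nextEntry) != null) {
            if (!entry.directory) {
                // Do not close this reader: closing it would close zs as well.
                def reader = new BufferedReader(new InputStreamReader(zs, 'UTF-8'))
                String line
                while ((line = reader.readLine()) != null) {
                    handler(entry.name, line)
                }
            }
            zs.closeEntry()
        }
    } finally {
        zs.close()
    }
}

// Build a tiny zip in memory (made-up sample data, not the GeoIP file).
def zipped = new ByteArrayOutputStream()
def zos = new ZipOutputStream(zipped)
zos.putNextEntry(new ZipEntry('sample.csv'))
zos.write('a,b\nc,d\n'.getBytes('UTF-8'))
zos.closeEntry()
zos.close()

def rows = []
eachZipLine(new ByteArrayInputStream(zipped.toByteArray())) { name, line ->
    rows << line.split(',').toList()   // here you would insert into the db instead
}
println rows
```

A new reader is created per entry because ZipInputStream reports end-of-stream at the end of each entry, so a reader that has hit that point cannot be reused for the next one.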

Andre Steingress

Dec 3, 2012, 10:32:30 AM
to Groovy-User Groovy-User
Hi,

The first part is to fetch the file content with URL#bytes (assuming you are accessing an unprotected HTTP URL).

If you need an example of how to unzip a file in Groovy, you can have a look at Tim Yates' Groovy Common Extensions project on GitHub:


The code there creates java.io.File instances, but this could be easily replaced with a map/a single instance of ByteArrayOutputStream.

Cheers, André
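
Replacing the java.io.File instances with a map could look roughly like the sketch below. `unzipToMap` is a hypothetical helper written for illustration, not code from the Groovy Common Extensions project, and the two-entry zip is made-up sample data built in memory. It keeps every entry in memory at once, so it only suits archives that comfortably fit in the heap.

```groovy
import java.util.zip.*

// Hypothetical helper: returns a map of entry name -> entry bytes
// for every file entry in the given zip content.
Map<String, byte[]> unzipToMap(byte[] zipBytes) {
    def result = [:]
    new ZipInputStream(new ByteArrayInputStream(zipBytes)).withStream { zs ->
        def entry
        byte[] buf = new byte[4096]
        while ((entry = zs.nextEntry) != null) {
            if (!entry.directory) {
                def out = new ByteArrayOutputStream()
                int len
                // read() returns -1 once the current entry is exhausted
                while ((len = zs.read(buf)) > 0) {
                    out.write(buf, 0, len)
                }
                result[entry.name] = out.toByteArray()
            }
            zs.closeEntry()
        }
    }
    result
}

// Build a two-entry zip in memory so the sketch runs on its own.
def zipped = new ByteArrayOutputStream()
new ZipOutputStream(zipped).withStream { zos ->
    zos.putNextEntry(new ZipEntry('one.txt'))
    zos.write('first'.getBytes('UTF-8'))
    zos.closeEntry()
    zos.putNextEntry(new ZipEntry('two.txt'))
    zos.write('second'.getBytes('UTF-8'))
    zos.closeEntry()
}

def files = unzipToMap(zipped.toByteArray())
files.each { name, data ->
    println "$name: ${new String(data, 'UTF-8')}"
}
```

With a real download, `unzipToMap(someUrl.toURL().bytes)` would be the equivalent call, where `someUrl` stands for whatever address you are fetching.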

Andre Steingress

Dec 3, 2012, 10:34:25 AM
to Groovy-User Groovy-User
I should have known that Tim Yates himself would answer this question ;-) Sorry for dbl posting the answer...

Cheers, André

rakesh mailgroups

Dec 5, 2012, 4:53:47 AM
to us...@groovy.codehaus.org
Hi,

thanks but the issue seems to be the code that does NOT use a file. Any example code for that?

Thanks

Rakesh

Tim Yates

Dec 5, 2012, 5:33:47 AM
to us...@groovy.codehaus.org
Not sure what you mean...  My example streams from a URLConnection

rakesh mailgroups

Dec 5, 2012, 5:53:33 AM
to us...@groovy.codehaus.org
"The code there creates java.io.File instances, but this could be easily replaced with a map/a single instance of ByteArrayOutputStream."

That bit!

Tim Yates

Dec 5, 2012, 5:59:05 AM
to us...@groovy.codehaus.org
So what do you want it to do?

Andre Steingress

Dec 5, 2012, 6:21:52 AM
to Groovy-User Groovy-User
You should end up with something like this (assuming the ZIP file has only a single entry):

import java.util.zip.ZipInputStream

final bytes = // ... the zip file's byte[]

final zipInput = new ZipInputStream(new ByteArrayInputStream(bytes))
final output = new ByteArrayOutputStream()

zipInput.withStream {
    // Position the stream at the first (and only) entry
    def entry = zipInput.nextEntry

    int len = 0
    byte[] buffer = new byte[4096]
    // read() returns -1 at the end of the current entry
    while ((len = zipInput.read(buffer)) > 0) {
        output.write(buffer, 0, len)
    }
}

I am currently travelling and don't have access to my dev environment, but this example should give you an idea of how to solve this issue.

Cheers, André

Tim Yates

Dec 5, 2012, 6:42:35 AM
to us...@groovy.codehaus.org
OK, for this example I am going to grab the zip for v0.3 of my Common Groovy Extension source on GitHub.

I will then go through each entry looking for 'ObjectExtensionMethods.groovy'[1], and when I find this I will put it into a StringWriter which I will convert to a String at the end of the processing.

Hope it helps.

import java.util.zip.*

// zipURL: a String holding the URL of the zip to download

String fileContent = zipURL.toURL().withInputStream { s ->
  new ZipInputStream( s ).with { zs ->
    new StringWriter().with { sw ->
      def entry
      while( ( entry = zs.nextEntry ) != null ) {
        if( entry.name.endsWith( 'ObjectExtensionMethods.groovy' ) ) {
          // Copy the matching entry into the writer, then stop looking
          sw << zs
          break
        }
        zs.closeEntry()
      }
      sw.toString()
    }
  }
}

println fileContent

Tim

Tim Yates

Dec 5, 2012, 6:51:02 AM
to us...@groovy.codehaus.org
Or, as Andre says, you can copy it from the stream to a byte buffer:

    zipInput.eachByte( 4096 ) { buffer, len ->
      output.write( buffer, 0, len )
    }

This is a shortcut for writing your own buffered read loop :-)

Tim
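
Putting Andre's setup and the eachByte shortcut together gives something like the sketch below. It assumes a zip with a single entry, and the one-entry zip built in memory is a made-up stand-in for the downloaded bytes. As far as I know, eachByte closes the stream once it has finished reading, which is fine for the single-entry case but means it is not a good fit inside a loop over several entries.

```groovy
import java.util.zip.*

// Build a one-entry zip in memory (stand-in for the downloaded bytes).
def zipped = new ByteArrayOutputStream()
new ZipOutputStream(zipped).withStream { zos ->
    zos.putNextEntry(new ZipEntry('data.csv'))
    zos.write('x,y\n1,2\n'.getBytes('UTF-8'))
    zos.closeEntry()
}
byte[] bytes = zipped.toByteArray()

def zipInput = new ZipInputStream(new ByteArrayInputStream(bytes))
def output = new ByteArrayOutputStream()

// Position the stream at the first (and only) entry
def entry = zipInput.nextEntry
println "Reading ${entry.name}"

// Copy that entry's bytes in chunks of up to 4096 bytes
zipInput.eachByte(4096) { buffer, len ->
    output.write(buffer, 0, len)
}

String text = output.toString('UTF-8')
println text
```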