Invalid gzip files


Michael Meyers

Sep 19, 2014, 10:08:25 PM
to spray...@googlegroups.com
Hi,

I'm using spray-routing 1.3.1 for a REST service that returns a compressed response, using the compressResponse() directive.  Sometimes the zipped file is valid, but other times I get an error like this when I try to decompress it with 7-Zip: "CRC failed file is broken".  I don't see any errors in my log files, and I'm not sure how to go about finding the issue.  Has anyone else had this problem?  Where should I look to help figure it out?
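For reference, one way to check a file programmatically (rather than via 7-Zip) is to run the bytes through the JDK's `java.util.zip.GZIPInputStream`, which validates the CRC32 in the gzip trailer and throws a `ZipException` if the stream is broken. A minimal round-trip sketch:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipCheck {
  // Compress bytes with the JDK's gzip implementation.
  def gzip(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bos)
    out.write(data)
    out.close() // flushes the deflater and writes the CRC32 trailer
    bos.toByteArray
  }

  // Decompress; reading to EOF forces GZIPInputStream to verify the
  // trailer CRC32, throwing java.util.zip.ZipException on a mismatch.
  def gunzip(data: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(data))
    val bos = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) { bos.write(buf, 0, n); n = in.read(buf) }
    bos.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val original = ("x" * 10000).getBytes("UTF-8")
    val roundTripped = gunzip(gzip(original))
    println(roundTripped.sameElements(original)) // true for a valid stream
  }
}
```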

Johannes Rudolph

Sep 20, 2014, 5:07:05 AM
to spray...@googlegroups.com
Hi Michael,

in the best case you could provide us code that reproduces the
problem. Otherwise, what might also help is providing an example of
an original file together with its gzip version that fails to
decompress. What error does gunzip report?

Johannes



--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Michael Meyers

Sep 20, 2014, 7:27:01 AM
to spray...@googlegroups.com
gunzip says:
gzip: file.json.gz: invalid compressed data--crc error

gzip: file.json.gz: invalid compressed data--length error


The unzipped file is readable, but characters are all in the wrong places, like this: "fie"ld: "abc"fed.  Unfortunately I can't send you the file as it contains business-sensitive data.  It could be related to the size of the file: the uncompressed file is 13 GB and the compressed file is about 160 MB.  I tried with a smaller data set and the zip file was ok, but I know it has worked in the past with the full file as well, so I'm not sure it's size-related.

Here's the route I'm using: (any syntax errors are a result of obfuscation)
val routes = sealRoute {
  authenticate(BasicAuth(realm = "service", createUser = extractUser _)) { user =>
    path("get_data" / Segment) { date =>
      compressResponse() {
        parameters('DataType) { dataType =>
          // per-request actor that streams the response back
          actorRefFactory.actorOf(Props(new Actor(date, dataType)))
        }
      }
    }
  }
}
Michael Meyers

Sep 20, 2014, 7:46:49 AM
to spray...@googlegroups.com
There are actually 2 actors that get created from the route.  One reads from the database and transforms the rows into JSON strings.  It then sends the output to a second actor that is responsible for sending things to the client.  We pull from the database at the rate the client is reading the stream, so we don't blow out memory if a client doesn't read the stream fast enough.

Here is the code for the buffer actor:


import akka.actor._
import com.typesafe.config.ConfigFactory
import com.typesafe.scalalogging.slf4j.LazyLogging
import spray.can.Http
import spray.http.MediaTypes._
import spray.http._

import scala.collection.mutable

class Buffer(databaseReader: ActorRef, responder: ActorRef) extends Actor with LazyLogging {

  responder ! ChunkedResponseStart(HttpResponse(entity = HttpEntity(`application/json`, " [ "))).withAck(AckReceived)

  var ready = false
  var inputStreamComplete = false
  val buf = new mutable.Queue[MessageChunk]()

  val config = ConfigFactory.load()
  val maxBufferSize = config.getInt("max-buffer-size")

  for (x <- 1 to maxBufferSize) {
    databaseReader ! RequestBatch
  }

  def dequeueAndSend() = {
    if (ready && buf.length > 0) {
      responder ! buf.dequeue().withAck(AckReceived)
      ready = false
    }
  }

  private def sendFinal(): Unit = {
    responder ! MessageChunk("]").withAck(AckReceivedFinal)
  }

  def receive = {
    case AckReceived =>
      ready = true
      if (!inputStreamComplete && buf.length < maxBufferSize) {
        databaseReader ! RequestBatch
      }
      if (buf.length > 0) {
        dequeueAndSend()
      } else if (inputStreamComplete && buf.length == 0) {
        sendFinal()
      }

    case AckReceivedFinal =>
      logger.debug("Request Complete")
      responder ! ChunkedMessageEnd

    case msg: String =>
      databaseReader ! RequestBatch
      buf.enqueue(MessageChunk(msg))
      dequeueAndSend()

    case ev: Http.ConnectionClosed =>
      logger.warn("Stopping response streaming due to {}", ev)
      context.stop(self)

    case StreamComplete =>
      logger.debug("Stream complete")
      inputStreamComplete = true
      if (buf.length == 0) {
        sendFinal()
      } else {
        dequeueAndSend()
      }

    case _ =>
      logger.error("Unrecognized message")
  }
}

Mathias Doenitz

Sep 20, 2014, 3:18:14 PM
to spray...@googlegroups.com
Michael,

you appear to be creating a per-request actor.
In that case you need to make sure that you stop this actor *in all cases*, even in the presence of errors.
You might already be doing that, just pointing it out.

Concerning the compression error:
It’s really hard for us to reproduce the problem if you do not provide some kind of SSCCE (http://sscce.org/).
If you can provide one we’d be happy to look into the issue.

Cheers,
Mathias

---
mat...@spray.io
http://spray.io

Michael Meyers

Sep 20, 2014, 8:16:34 PM
to spray...@googlegroups.com
I'll see if I can find a way to reproduce it.  That's what I'm working on now.  I was just hoping someone would have an idea of what could cause it.

Michael Meyers

Sep 21, 2014, 12:41:59 AM
to spray...@googlegroups.com
I still can't pinpoint the exact problem well enough to create a small program that replicates the issue, but it seems to be related to the size of the message chunk.  When I reduce the chunk size, it works.
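Concretely, "reducing it" means splitting each payload string into smaller pieces before wrapping them in MessageChunks. A standalone sketch of just the splitting part (the 64 KiB cap is an arbitrary guess, not a documented spray constant, and `grouped` counts characters, which matches bytes only for ASCII JSON):

```scala
object ChunkSplitter {
  // Hypothetical cap on chunk size; tune experimentally.
  val maxChunkChars = 64 * 1024

  // Split an oversized payload into pieces of at most maxChunkChars
  // characters each; the last piece holds the remainder.
  def split(payload: String): List[String] =
    payload.grouped(maxChunkChars).toList

  def main(args: Array[String]): Unit = {
    val pieces = split("x" * 150000)
    println(pieces.map(_.length)) // List(65536, 65536, 18928)
  }
}
```

Each resulting piece would then be wrapped in its own MessageChunk instead of sending one large chunk.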

Johannes Rudolph

Sep 29, 2014, 6:45:06 AM
to spray...@googlegroups.com
Hi Michael,

we are currently moving the encoding code to akka-http and we'd like
to fix bugs if possible. It's not completely certain that this is a
spray bug, but it seems likely, because corrupting the CRC can only
happen between the encoder and the network, so it would be cool if we
could track it down.

So, following up to your previous messages:

On Sun, Sep 21, 2014 at 6:41 AM, Michael Meyers <mrmey...@gmail.com> wrote:
> I still can't pinpoint the exact problem to be able to create a small
> program that replicates the issue, but it seems to be related to the size of
> the message chunk. When I reduced it down I see it working.

What were the sizes of the message chunks that failed, and what were
the sizes of the ones that succeeded? Could you reproduce the issue
deterministically at some point?

>>> > The unzipped file is readable, but characters are all in the wrong
>>> > places like this: "fie"ld: "abc"fed. Unfortunately I can't send you the
>>> > file as it contains business sensitive data. It could be related to the
>>> > size of the file. The uncompressed file is 13 GB. and the compressed file
>>> > is about 160 mb. I tried with a smaller data set and the zip file was ok,
>>> > but I know it has worked in the past with the full file as well, so I'm not
>>> > sure it's size related.

I wonder how exactly the data is garbled. Is there a pattern in how
big the rearranged blocks of data are? Is data missing, or just
reordered?
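If you can diff the original export against the corrupted decompressed output, the offset of the first divergence (and whether the tails realign later) would tell us a lot. A throwaway helper for that, written here just for illustration:

```scala
object FirstDivergence {
  // Returns the index of the first byte where the two arrays differ,
  // or -1 if they are identical up to the shorter array's length
  // (i.e. one is a prefix of the other, or they are equal).
  def firstDiff(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n && a(i) == b(i)) i += 1
    if (i == n) -1 else i
  }

  def main(args: Array[String]): Unit = {
    val original  = "abcdefgh".getBytes("UTF-8")
    val corrupted = "abcdXfgh".getBytes("UTF-8")
    println(firstDiff(original, corrupted)) // 4
  }
}
```

Running it over the real files (read with java.nio.file.Files.readAllBytes, memory permitting) would show whether the corruption starts at a chunk-sized boundary.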