Hi Steve,
Could you say a little more about how you're writing the file? I don't
think Kite makes the file visible until it is closed, specifically to
avoid exposing partially written files. If you're not using Kite, then
it's hard for me to comment on what might be causing this.
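For context on why hiding in-progress files helps: a common pattern for this (a generic sketch, not necessarily Kite's exact mechanism; `atomic_write` is an invented name) is to write to a hidden temp file in the same directory and rename it into place only when the write completes, so readers never observe a partial file:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path so readers never see a partial file."""
    d = os.path.dirname(path) or "."
    # Dot-prefixed temp file in the same directory: hidden from
    # readers that list the directory, and on the same filesystem
    # so the rename below is atomic (POSIX; HDFS has a similar
    # rename primitive).
    fd, tmp = tempfile.mkstemp(prefix=".", dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.rename(tmp, path)  # file becomes visible all at once
    except BaseException:
        os.remove(tmp)  # clean up; no partial file is ever published
        raise
```

With a scheme like this, a reader either sees no file at all or the complete, closed file.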
More inline...
On 04/28/2015 12:59 PM, Steve wrote:
> I have been testing the ability to execute a query using Impala or Hive
> against a table in which the data is stored in Avro, and the Avro file
> is being appended to during the query by a single thread.
>
> During one of my tests, the Hive query failed due to the following
> exception:
>
> Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException:
> Invalid sync!
>
> I repeated the Hive query several times during the append operation and
> also after it was completed. I received the same exception every time.
Looks like the data is getting corrupted, but reading a file while it
is being written shouldn't be the cause. Getting this exception while
the file is still being written isn't surprising, but getting it after
the writer has closed the file means the file itself is corrupt.
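To make that failure mode concrete, here is a toy sketch (plain Python with invented names and a simplified block header, not the real Avro reader/writer, which uses zigzag varints) of how a container format delimits blocks with a sync marker and why a mismatched marker surfaces as "Invalid sync!":

```python
import io

SYNC_LEN = 16  # Avro container files use a 16-byte sync marker

def write_file(blocks, sync):
    """Toy Avro-like layout: magic, header sync, then
    (length, payload, sync) for each block."""
    buf = io.BytesIO()
    buf.write(b"Obj\x01")  # magic bytes, as in the Avro spec
    buf.write(sync)        # sync marker recorded in the header
    for payload in blocks:
        buf.write(len(payload).to_bytes(4, "big"))
        buf.write(payload)
        buf.write(sync)    # every block must end with the same marker
    return buf.getvalue()

def read_file(data):
    """Read blocks back, raising on a bad marker the way the real
    reader raises 'Invalid sync!'."""
    buf = io.BytesIO(data)
    assert buf.read(4) == b"Obj\x01"
    sync = buf.read(SYNC_LEN)
    blocks = []
    while True:
        header = buf.read(4)
        if not header:
            break
        size = int.from_bytes(header, "big")
        blocks.append(buf.read(size))
        if buf.read(SYNC_LEN) != sync:
            raise IOError("Invalid sync!")
    return blocks

# A clean write/read round-trips fine. (Real writers pick a random
# marker; a fixed one is used here for determinism.)
sync = bytes(range(16))
data = write_file([b"rec1", b"rec2"], sync)
assert read_file(data) == [b"rec1", b"rec2"]

# But a writer that appends with the wrong marker, or rewrites bytes
# at a block boundary, corrupts the file permanently:
bad = data[:-SYNC_LEN] + b"\xff" * SYNC_LEN
```

Reading `bad` raises the error every time, no matter when you read it, which matches what you're seeing: once a boundary is wrong on disk, the file stays broken after the writer closes it.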
> I also ran the same query using Impala and received the same exception.
Yeah, more confirmation that it is corrupted.
> I can see how this is possible while the avro file is being appended to,
> but it seems that it should not happen after the avro file is closed.
>
> How is it possible that the sync marker can become invalid in this
> scenario?
I think reading while appending to the file might just be a red herring.
I don't see how that would corrupt it.
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.