Jets3t 0.9.4 throws error while reading an empty file

Rajat Jain

Oct 7, 2015, 11:37:22 PM
to jets3t...@googlegroups.com
I created a Hive table backed by an empty file:

➜   s3cmd ls s3://<my_bucket>/tables/empty_table/
2015-10-08 03:29         0   s3://<my_bucket>/tables/empty_table/empty_file

using

create external table empty_table (a string) location 's3://<my_bucket>/tables/empty_table/'

Then I ran:

select count(*) from empty_table;

With jets3t 0.9.4 this throws the exception below; it doesn't happen with 0.8.1a.

Any ideas why?

2015-10-08 09:02:13,018 WARN [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: Received IOException while reading 'tables/empty_table/empty_file', attempting to reopen.
2015-10-08 09:02:13,020 ERROR [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Caught and rethrowing java.io.IOException: java.io.IOException: While processing file s3n://<my_bucket>/tables/empty_table/empty_file. Attempted read on closed stream.
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:368)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:128)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:45)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:124)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
Caused by: java.io.IOException: While processing file s3n://<my_bucket>/tables/empty_table/empty_file. Attempted read on closed stream.
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.handleExceptionWhenReadNext(HiveContextAwareRecordReader.java:382)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:364)
	... 15 more
Caused by: java.io.IOException: Attempted read on closed stream.
	at org.apache.http.conn.EofSensorInputStream.isReadAllowed(EofSensorInputStream.java:110)
	at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:136)
	at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
	at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:203)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at java.io.DataInputStream.read(DataInputStream.java:100)
	at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
	... 15 more

2015-10-08 09:02:13,020 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing... 

James Murty

Oct 10, 2015, 8:23:22 AM
to jets3t...@googlegroups.com
Hi Rajat,

I'm afraid I'm not sure what is going on here.

I have just tested various ways of reading data from an empty S3 object with JetS3t, and the input stream methods work as expected. The only way I can trigger the error "Attempted read on closed stream." is by closing the stream and then reading from it again, which is exactly what should happen.
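For what it's worth, the distinction is easy to demonstrate with a plain java.io stream (a sketch, not JetS3t itself, but the same contract applies to the stream JetS3t hands back): reading an empty stream is fine and simply returns -1 at EOF, while reading after close() throws the IOException seen in the trace.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClosedStreamDemo {
    public static void main(String[] args) throws IOException {
        // A zero-byte file stands in for the empty S3 object.
        File empty = File.createTempFile("empty", ".txt");
        empty.deleteOnExit();

        InputStream in = new FileInputStream(empty);
        // Reading an empty stream is legal: read() returns -1 at EOF.
        System.out.println("read on empty stream: " + in.read());
        in.close();

        // Reading after close is an error, matching the exception in the trace.
        try {
            in.read();
        } catch (IOException e) {
            System.out.println("read after close: IOException: " + e.getMessage());
        }
    }
}
```

So the exception points to a caller reading from a stream it (or something else) has already closed, rather than to the empty object itself.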

I'm not sure why the older version of JetS3t works in this scenario. A lot has changed since 0.8.1, including updated versions of HttpClient with changes in how response streams are handled. Perhaps the old version of JetS3t, or HttpClient, was more forgiving of input stream misuse than it should have been, and now an error in the Hive code is triggering the exception.

You will need to follow this up with the Hive folks.

Regards,
James


