file browser: upload does not work


hennin...@googlemail.com

Aug 16, 2012, 7:41:06 AM
to hue-...@cloudera.org
Hi,

I am using the Hue version that ships with CDH4. If I try to upload a file, a progress indicator runs from 0 to 100 percent. After that, the upload window disappears without any error message, but the file does not exist in the selected upload folder. It does exist in HDFS under /tmp/hue-uploads/tmp.10.20.104.27.137394205741618389.

The temp file belongs to user 'hdfs'. The target folder belongs to my user_id. Maybe I did not fulfill all the requirements to enable the upload?

Regards
Henning

Romain Rigaux

Aug 16, 2012, 12:14:24 PM
to hennin...@googlemail.com, hue-...@cloudera.org
Hi,

I know there is https://issues.cloudera.org/browse/HUE-705 (no error message displayed when the upload fails).

It is still probably a permissions problem.
Do you have the Hue logs? (particularly the server ones, which will show calls like 'http_client  DEBUG    GET http://localhost:14000/webhdfs/v1/.....')
You can also get more information in the HDFS logs:
/var/log/hadoop-httpfs/httpfs.log
/var/log/hadoop-hdfs/*.log
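
For example, something along these lines should surface the relevant calls (the Hue log directory is an assumption based on a default CDH package install; the HDFS paths are the ones above):

grep -i webhdfs /var/log/hue/*.log
tail -n 200 /var/log/hadoop-httpfs/httpfs.log
grep -iE 'error|exception' /var/log/hadoop-hdfs/*.log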

Romain

hennin...@googlemail.com

Aug 17, 2012, 7:31:26 AM
to hue-...@cloudera.org, hennin...@googlemail.com
The only relevant log entries are in /var/log/hadoop-hdfs/hadoop-cmf-hdfs1-DATANODE-hostname.log.out. How can I solve this?

2012-08-17 13:01:38,623 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[10.20.108.24:50010], original=[10.20.108.24:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-08-17 13:01:38,623 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hue-uploads/tmp.10.20.104.27.5069608749590602606
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[10.20.108.24:50010], original=[10.20.108.24:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
2012-08-17 13:01:38,623 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) via hue (auth:SIMPLE) cause:java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[10.20.108.24:50010], original=[10.20.108.24:50010])
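
For reference, the setting named in the exception can be inspected with the standard HDFS CLI (a sketch; the second, related 'enable' key is an assumption about the usual property pair, not something taken from this log):

hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable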

Gerardo Vázquez Rodríguez

Aug 17, 2012, 12:59:13 PM
to hennin...@googlemail.com, hue-...@cloudera.org
Hi,

Check your HDFS replication parameter; it seems that you have fewer nodes available to write to than your replication value.
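
A quick way to compare the two with the standard HDFS CLI (exact output formatting may vary by version):

hdfs getconf -confKey dfs.replication    # the configured replication factor
hdfs dfsadmin -report                    # reports how many datanodes are live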

Kind Regards

Romain Rigaux

Aug 17, 2012, 9:23:29 PM
to Gerardo Vázquez Rodríguez, hennin...@googlemail.com, hue-...@cloudera.org

Indeed, if you are using pseudo-distributed mode you should have only one replica:

e.g.

/etc/hadoop/conf/hdfs-site.xml

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

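One caveat (my understanding of how dfs.replication behaves, not something Hue-specific): the setting only applies to files written after the change; files already in HDFS keep their replication factor unless you change it explicitly, e.g.:

hadoop fs -setrep -R 1 /tmp/hue-uploads
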
And by the way https://issues.cloudera.org/browse/HUE-705 was fixed today,

Romain

hennin...@googlemail.com

Sep 21, 2012, 11:26:46 AM
to hue-...@cloudera.org, Gerardo Vázquez Rodríguez, hennin...@googlemail.com
Hi,

sorry for the late answer.


On Saturday, August 18, 2012, at 03:23:29 UTC+2, Romain Rigaux wrote:

Indeed, if you are using pseudo-distributed mode you should have only one replica:


I am using a test installation with just one node. I am not using "standalone mode", so apparently I am using "pseudo distributed mode", right?
OK, I can confirm that my problem is solved by setting the replication factor to one, but I don't understand the reason.

With the replication factor set to 3 I am able to "file-browser-upload" a file of 13KB.
With the replication factor set to 3 I am NOT able to "file-browser-upload" a file of 4.5MB (the error is still the same as the one I posted before).
With the replication factor set to 1 I am able to "file-browser-upload" a file of 4.5MB.
With the replication factor set to 3 I am able to "hadoop fs -copyFromLocal" a file of 4.5MB.

If uploading is possible on the command line, it should also be possible using the file browser, right?

A small question apart from that: is the only difference between a cluster in "pseudo distributed mode" and a "real cluster" that the former uses only one node? So does a "pseudo distributed cluster" automatically turn into a "real cluster" if I just add an additional node running only a datanode and tasktracker? Or are there further configuration differences?

Best regards
Henning

Romain Rigaux

Sep 24, 2012, 5:20:44 PM
to hennin...@googlemail.com, hue-...@cloudera.org, Gerardo Vázquez Rodríguez
Hi Henning,

I think you might be hitting an upload bug that was fixed last week and will be included in this week's release:
https://github.com/cloudera/hue/commit/2b44f592c9e564716b87547bea9a15454be1b7c9

In your setup, uploading even a 30MB file with replication 1 should fail through the file browser (but should pass when done on the command line).

About the "pseudo distributed cluster" means that all the Hadoop processes are on the same machine: https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode, so yes you are in "pseudo distributed mode".

Romain