hue web show data loss


木子岂

Aug 11, 2015, 6:33:41 AM
to hue-user
Hello,

When I run "select * from bbdw.rpt_overseas_category_sales_pay_v3", only 55 records are shown, but the table rpt_overseas_category_sales_pay_v3 actually has 97 records.
I suspect that null values in rpt_overseas_category_sales_pay_v3 are causing the data loss.

Thanks

Romain Rigaux

Aug 11, 2015, 12:13:01 PM
to 木子岂, hue-user
What does this show?

select count(*) from bbdw.rpt_overseas_category_sales_pay_v3

To unsubscribe from this group and stop receiving emails from it, send an email to hue-user+u...@cloudera.org.

木子岂

Aug 11, 2015, 11:13:18 PM
to Romain Rigaux, hue-user, hue-user+unsubscribe
Hello,

When I execute "select count(*) from rpt_overseas_category_sales_pay_v3", the result is 97 records, which is correct. Only when I run "select * from rpt_overseas_category_sales_pay_v3" in the Hue web UI does it show 55 records. When I execute the same "select * from rpt_overseas_category_sales_pay_v3" directly against HiveServer2, the result is correct.

Tatsuo Kawasaki

Aug 11, 2015, 11:21:43 PM
to 木子岂, Romain Rigaux, hue-user
Hi,

Which version of Hue (or CDH) are you using? 3.8?
I ran into a similar (but different) issue before. It was caused by
https://issues.cloudera.org/browse/HUE-2472

Thanks,
--
Tatsuo Kawasaki
tat...@cloudera.com

木子岂

Aug 11, 2015, 11:23:55 PM
to Tatsuo Kawasaki, Romain Rigaux, hue-user
Hue 3.7 from CDH 5.4.0

Romain Rigaux

Aug 11, 2015, 11:27:14 PM
to 木子岂, Tatsuo Kawasaki, hue-user
Ha, if you have some null values, the results were sometimes truncated. This was fixed in CDH 5.4.1+. Could you upgrade to CDH 5.4.4?

木子岂

Aug 11, 2015, 11:31:54 PM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
If I only upgrade Hue from 3.7.0 to 3.8.0, without upgrading CDH from 5.4.0 to 5.4.1+, is that OK?
In addition, Impala shows the correct result while Hue 3.7.0 shows the wrong one, so I am puzzled by the difference.

木子岂

Aug 12, 2015, 1:03:00 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Hi, if I just apply the patch to Hue 3.7.0, is that OK? If I upgrade CDH 5.4.0 to 5.4.1+ on short notice, the whole business will be affected.

Romain Rigaux

Aug 12, 2015, 1:15:39 AM
to 木子岂, Tatsuo Kawasaki, hue-user
Just switching the Thrift Version to 5 in hue.ini will fix it:

https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini#L863

If you are using CM, enter in the Hue safety valve:
[beeswax]
thrift_version=5

http://gethue.com/how-to-configure-hue-in-your-hadoop-cluster/
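
To double-check that the snippet pasted into the safety valve really sets the option, something like the following could be used. It is only a minimal sketch: the helper name `beeswax_thrift_version` is made up for illustration, and Python's `configparser` is used as a stand-in reader for the short snippet only. A complete hue.ini uses nested `[[...]]` sections that `configparser` does not model, so this is not meant for the whole file.

```python
import configparser


def beeswax_thrift_version(ini_text):
    """Return the [beeswax] thrift_version as an int, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # has_option() is False when either the section or the key is missing.
    if not parser.has_option("beeswax", "thrift_version"):
        return None
    return parser.getint("beeswax", "thrift_version")


# The snippet from the message above, as it would be pasted into the safety valve.
SAFETY_VALVE = """
[beeswax]
thrift_version=5
"""

print(beeswax_thrift_version(SAFETY_VALVE))  # prints 5
```

A return value of None would mean the key never made it into the [beeswax] section, for example because it was pasted under a different section header.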

木子岂

Aug 12, 2015, 3:29:54 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
I don't know why HiveServer2 exits so frequently. When it does, Hive tasks cannot be submitted because they cannot connect to HiveServer2. I want to guard against this by adding another HiveServer2, but I don't know how to configure it in hue.ini, because it only defines a single host, as follows:

[beeswax]

  # Host where HiveServer2 is running.
  hive_server_host=hiveserver.ent.com

I want to know whether I can add another "hive_server_host=hiveserver.entxxxxxxxx.com" under [beeswax]. Thank you!

木子岂

Aug 12, 2015, 3:52:24 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Hello,
When I add the following hue.ini config to the Hue safety valve:
[beeswax]
thrift_version=5
I can no longer connect to HiveServer2 from the Hue web UI. Why?

Romain Rigaux

Aug 12, 2015, 3:41:49 PM
to 木子岂, Tatsuo Kawasaki, hue-user
Hue does not support multiple HiveServer2 instances yet.

What is the error with the Thrift version?

木子岂

Aug 13, 2015, 2:51:27 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Today I found that the partitions of table A listed in Impala (show partitions table_A) are not consistent with those listed in Hive (show partitions table_A). Impala shares the Hive metastore with Hive, so why do they differ?

What's more, I see frequent errors in the Hive metastore log, such as:
 
2015-08-13 12:51:25,134 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: NoSuchObjectException(message:There is no database named cloudera_manager_metastore_canary_test_db_hive_hivemetastore_4e36124f3c0fe4b2c4997ba3fbde6a4a)
        at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:549)
        at org.apache.hadoop.hive.metastore.ObjectStore.getJDODatabase(ObjectStore.java:593)
        at org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:583)
        at org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:575)
        at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2389)
        at org.apache.hadoop.hive.metastore.ObjectStore.getDatabaseInternal(ObjectStore.java:575)
        at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:559)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 
Could this exception be the reason HiveServer2 exits so frequently?

Thanks!

Romain Rigaux

Aug 19, 2015, 2:38:39 PM
to 木子岂, Tatsuo Kawasaki, hue-user
Did you do an 'invalidate metadata' or click on the little refresh icon on top of the Assist panel in the Impala editor?

If the issue persists, I would recommend asking on the Impala list: groups.google.com/a/cloudera.org/group/impala-user

About the canary error, this is used by CM and should be ignorable (the error should show up in CM as a warning).
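
For the 'invalidate metadata' suggestion, the statement can be issued for the whole catalog or for a single table. Here is a small sketch that only builds the statement text. The helper name `invalidate_metadata_sql` and its conservative identifier check are assumptions made for illustration, and the resulting string still has to be run in the Impala editor or impala-shell.

```python
import re

# Conservative identifier check (an assumption of this sketch, not Impala's own rule).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalidate_metadata_sql(table=None, database=None):
    """Build an INVALIDATE METADATA statement, optionally scoped to one table."""
    if table is None:
        if database is not None:
            raise ValueError("a database needs a table name")
        # No table given: the statement applies to all tables.
        return "INVALIDATE METADATA"
    parts = [database, table] if database else [table]
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError("unexpected identifier: %r" % part)
    return "INVALIDATE METADATA " + ".".join(parts)


print(invalidate_metadata_sql("table_A"))  # prints INVALIDATE METADATA table_A
```

Scoping the statement to one table is usually cheaper than invalidating everything, since the unscoped form makes Impala reload metadata for all tables.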

木子岂

Aug 24, 2015, 10:47:51 PM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Hi,
After I fixed it by switching the Thrift version to 5, the Impala app shows an error in the Hue web UI:
Impala Editor: There is no Impalad to send requests to.
Could this be related to the Impala version, or to something else? Thank you!

Romain Rigaux

Aug 24, 2015, 11:36:30 PM
to 木子岂, Tatsuo Kawasaki, hue-user
Are you sure it is related to the change? Impala should also support Thrift v5. What error do you see on the /logs page of Hue after opening the Impala app?

木子岂

Aug 24, 2015, 11:48:13 PM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Impala can't load the Hive metastore data, so the Impala app stays in the loading state when I open it in Hue.

木子岂

Aug 25, 2015, 12:13:42 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Ha, I now know why Impala couldn't load the Hive metastore data.

I would like to know whether you have looked into Hadoop's scheduler policies, Fair Scheduler or Capacity Scheduler.
I ran into a problem: when a big MapReduce job starts in the Hadoop cluster, it gradually takes all available resources, so other small MapReduce jobs can't run. I want to change the scheduler policy. Any help would be appreciated. Thank you!
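
To make the starvation problem concrete, here is a toy illustration of the idea behind fair sharing: each job gets an equal slice of the cluster, and whatever a small job does not need is handed back to the others. This is a simplified single-resource model written only for illustration. The function `fair_shares` and the numbers are hypothetical, and it is not how the YARN Fair Scheduler is actually implemented.

```python
def fair_shares(capacity, demands):
    """Max-min fair split of `capacity` among jobs with the given demands."""
    shares = {job: 0.0 for job in demands}
    remaining = float(capacity)
    unsatisfied = {job for job, demand in demands.items() if demand > 0}
    while unsatisfied and remaining > 1e-9:
        # Offer every still-hungry job an equal slice of what is left.
        equal = remaining / len(unsatisfied)
        satisfied_now = set()
        for job in sorted(unsatisfied):
            grant = min(equal, demands[job] - shares[job])
            shares[job] += grant
            remaining -= grant
            if shares[job] >= demands[job] - 1e-9:
                satisfied_now.add(job)
        if not satisfied_now:
            # Nobody was fully served, so the capacity is now used up.
            break
        unsatisfied -= satisfied_now
    return shares


# One big job asking for far more than the cluster has, plus two small jobs.
print(fair_shares(100, {"big": 500, "small_a": 10, "small_b": 20}))
```

In this model the two small jobs get everything they asked for and the big job gets the rest, instead of the big job taking the whole cluster first.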



Romain Rigaux

Aug 25, 2015, 12:24:05 AM
to 木子岂, Tatsuo Kawasaki, hue-user

木子岂

Aug 25, 2015, 12:58:37 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Right, Impala can't see the latest Hive metastore data in the Hue web UI after I add a partition to a partitioned table, but Hive sees it right away. Why?

木子岂

Sep 22, 2015, 2:21:14 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Hello,

I have successfully set up Hive authorization in the Hive client, following the official Apache Hive website, and added a super admin, but this change is not reflected in the Hue web UI (I have restarted the Hue server in the CDH Manager). Why?
I looked at the issue about this problem (https://issues.cloudera.org/browse/HUE-1373); the bug was fixed in Hue 3.5.0. My Hue version is 3.7.0, so any help would be appreciated. Thank you.

Romain Rigaux

Sep 22, 2015, 10:34:04 AM
to 木子岂, Tatsuo Kawasaki, hue-user
What do you mean by super admin, and what are you expecting?

If you log in as the hive user in Hue and Hive impersonation is ON, it will be the same as being the hive user on the command line / beeline.

Some info about impersonation: http://gethue.com/hadoop-tutorial-hive-query-editor-with-hiveserver2-and/

木子岂

Sep 23, 2015, 7:45:29 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
My CDH Manager version is 5.4.0.

Precondition:
The hive-site.xml config is as follows:
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
  <property>
    <name>hive.security.authorization.task.factory</name>
    <value>org.apache.hadoop.hive.ql.parse.authorization.HiveAuthorizationTaskFactoryImpl</value>
  </property>
  <property>
    <name>hive.semantic.analyzer.hook</name>
    <!-- custom hook that I defined for the super admin -->
    <value>com.bigdata.hive.AuthorityControlHook</value>
  </property>
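
When comparing what the Hive CLI and HiveServer2 each see, it can help to dump the name/value pairs from the hive-site.xml each one actually loads. Below is a minimal sketch using only the Python standard library. The helper name `hive_site_properties` and the two-property sample are made up for illustration, and the text is assumed to be a complete file with a `<configuration>` root (a bare safety-valve snippet would need to be wrapped in one first).

```python
import xml.etree.ElementTree as ET


def hive_site_properties(xml_text):
    """Return a dict of property name -> value from hive-site.xml text."""
    # fromstring() raises ET.ParseError if the XML is not well-formed.
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical two-property sample, wrapped in the usual <configuration> root.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

for key, val in sorted(hive_site_properties(SAMPLE).items()):
    print(key, "=", val)
```

A malformed file, such as one with an unclosed `<property>` element, makes `ET.fromstring` raise `ParseError`, which is a quick way to catch copy-and-paste mistakes before restarting any service.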

Now, when I run the Hive CLI, I get the following:
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/hive-common-1.1.0-cdh5.4.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive (default)> show current roles;
OK
public
Time taken: 2.119 seconds, Fetched: 1 row(s)
hive (default)> create role tmp;
beibei:640
FAILED: SemanticException hdfs can't use ADMIN options, except admin.
hive (default)> 
This is what I expect.

But when I run the same thing in Beeline, I get the following:
beeline> !connect jdbc:hive2://localhost:10000 org.apache.hive.jdbc.HiveDriver
scan complete in 5ms
Connecting to jdbc:hive2://localhost:10000
Enter password for jdbc:hive2://localhost:10000: 
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show current roles;
Error: Error while compiling statement: FAILED: SemanticException The current builtin authorization in Hive is incomplete and disabled. (state=42000,code=40000)
0: jdbc:hive2://localhost:10000> 

Why?

Searching on Google, I found a post that says:
If you have already configured the Sentry Service, make sure that it is associated with the Hive role that you are trying to use.
Look in Hive -> Configuration -> Service-Wide -> Sentry Service.

So I wonder whether the Sentry Service is simply not installed in my CDH Manager. This is what the CDH Manager shows:

[screenshot missing]

Any help would be appreciated. Thank you!

spaz...@gmail.com

Sep 23, 2015, 6:02:48 PM
to Hue-Users, rom...@cloudera.com, tat...@cloudera.com, 3520...@qq.com
Are you using CM's "HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml" to add the configuration overrides to hive-site.xml?

If so, can you try instead adding those configurations to: Hive > Configuration > Gateway > Advanced > Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml

Then save and restart both Hive and Hue. This should allow Hue to pick up the hive-site.xml changes.

木子岂

Sep 25, 2015, 3:39:50 AM
to Romain Rigaux, Tatsuo Kawasaki, hue-user
Hello, Romain,

Recently I found that Hue can successfully execute a Hive SQL statement that Beeline cannot, because of a certain character in it. For example:

Beeline:
0: jdbc:hive2://localhost:10000> select locate("Romain",";Romain Rigaux<Rom...@cloudera.com>",3);
Error: Error while compiling statement: FAILED: ParseException line 1:24 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in select expression (state=42000,code=40000)

But

Hue:
When I execute the same Hive SQL, it succeeds:
select locate("Romain",";Romain Rigaux<Rom...@cloudera.com>",3);

Hue is a client for submitting Hive tasks, just like Beeline, yet Beeline behaves differently. Any help would be appreciated. Thank you!

Romain Rigaux

Sep 25, 2015, 2:12:26 PM
to 木子岂, Tatsuo Kawasaki, hue-user
I believe you will need to escape the ; with \ in beeline.

In Hue it works because we ignore the ; between quotes while parsing and sending it to HiveServer2.

You can escape it in both cases.
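
To illustrate the difference, here is a rough sketch of splitting a script on ; while skipping any ; that sits inside quotes. It is not Hue's actual parser, only a stand-in for the behaviour described above. The function name `split_statements` is made up, and the query is a shortened version of the one from the question.

```python
def split_statements(script):
    """Split on ';' but ignore any ';' inside single or double quotes."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, or None
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(script):
                # Keep a backslash-escaped character as-is, even a quote.
                current.append(script[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


QUERY = 'select locate("Romain",";Romain Rigaux",3); select 1'

print(split_statements(QUERY))  # two statements, the quoted ';' is kept
print(QUERY.split(";"))         # naive split: three fragments
```

With the naive split, the first fragment is `select locate("Romain","`, which is 24 characters long. That lines up with the "line 1:24" position in the Beeline ParseException and suggests the statement was cut at the ; inside the string.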


Romain Rigaux

Feb 16, 2016, 12:13:18 AM
to 木子岂, Tatsuo Kawasaki, hue-user
It "kind of works", but Spark SQL is pretty far behind Hive and Impala: https://issues.cloudera.org/browse/HUE-2985

On Sat, Feb 13, 2016 at 7:58 PM, 木子岂 <3520...@qq.com> wrote:
Hello, Romain,

My CDH Manager version is 5.4.0, and I have installed the Hue server. Recently I started replacing Hive SQL with Spark SQL in production, and I would like to know whether Hue supports Spark SQL. I would appreciate any advice.

Romain Rigaux

May 6, 2016, 12:44:51 PM
to 木子岂, Tatsuo Kawasaki, hue-user
Which CDH version?

Do you have thousands of databases or tables?



On Thu, May 5, 2016 at 11:36 PM, 木子岂 <3520...@qq.com> wrote:
Hello, Romain,

When I execute SQL in Hue through HiveServer2, Hue suddenly returns a 504 Gateway Timeout (this started happening recently). I checked the Hue, HiveServer2, and Hive Metastore logs, and the exception log is as follows:

2016-05-06 12:04:15,769 ERROR hive.log: Converting exception to MetaException
2016-05-06 12:04:15,770 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:service (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)
................................................................................
Caused by: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out)

After I restarted the HiveServer2 server, Hue worked normally again. Why?






