Blog: Using Hue to interact with Apache Kylin in your cluster or on AWS


Romain Rigaux

Dec 13, 2017, 9:30:32 AM
to Hue-Users
Initially published on http://gethue.com/using-hue-to-interact-with-apache-kylin/


This is a blog post from the community by Joanna He and Yongjie Zhao.

What is Apache Kylin

Apache Kylin is a leading open-source online analytical processing (OLAP) engine built for interactive analytics on Big Data. It provides an ANSI SQL interface and multi-dimensional OLAP for massive datasets. It supports consuming data in batch and streaming modes and offers sub-second query latency on petabyte-scale datasets. It integrates with BI tools via an ODBC driver, a JDBC driver, and a REST API.

What is Hue

Hue is an easy-to-use SQL editor that lets you query Hadoop-based services through a web-based interface. Hue makes accessing big data on Hadoop easier for analysts, since SQL is the language they are most familiar with.

In this post, we will demonstrate how you can connect Hue to Apache Kylin and get quick insights from huge volumes of data in seconds.

Build Docker Image of Hue with Apache Kylin

Prepare the Hue image

Use Docker to pull the latest Hue image.

docker pull gethue/hue:latest

Prepare the Kylin JDBC driver

Download the Apache Kylin installer package

wget -c http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.2.0/apache-kylin-2.2.0-bin-hbase1x.tar.gz

Extract the package

tar -zxvf apache-kylin-2.2.0-bin-hbase1x.tar.gz

Copy the Kylin JDBC driver to the current directory

cp apache-kylin-2.2.0-bin/lib/kylin-jdbc-2.2.0.jar .
hue$ ls
apache-kylin-2.2.0-bin  apache-kylin-2.2.0-bin-hbase1x.tar.gz  kylin-jdbc-2.2.0.jar

Copy the Hue config file to the host machine

Copy the file out of a temporary container:

docker run -it -d --name hue_tmp gethue/hue /bin/bash
docker cp hue_tmp:/hue/desktop/conf/pseudo-distributed.ini .
docker stop hue_tmp; docker rm hue_tmp

Now you should have the pseudo-distributed.ini in your current directory.

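Configure pseudo-distributed.ini with the Kylin connection

vim pseudo-distributed.ini

Add the Kylin settings below to the file: dbproxy_extra_classpath goes in the [notebook] section, and the [[[kylin]]] block goes under [[interpreters]].
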
dbproxy_extra_classpath=/hue/kylin-jdbc-2.2.0.jar
 
[[[kylin]]]
      name=kylin JDBC
      interface=jdbc
      options='{"url": "jdbc:kylin://<your_host>:<port>/<project_name>","driver": "org.apache.kylin.jdbc.Driver", "user": "<username>", "password": "<password>"}'

For example, add the following configuration section to the file:

dbproxy_extra_classpath=/hue/kylin-jdbc-2.2.0.jar
 
  # One entry for each type of snippet.
  [[interpreters]]
    # Define the name and how to connect and execute the language.
    [[[kylin]]]
      name=kylin JDBC
      interface=jdbc
      options='{"url": "jdbc:kylin://localhost:7070/learn_kylin","driver": "org.apache.kylin.jdbc.Driver", "user": "ADMIN", "password": "KYLIN"}'
 
 
    [[[hive]]]
      # The name of the snippet.
      name=Hive
      # The backend connection to use to communicate with the server.
      interface=hiveserver2
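
The quoting in the options line is easy to get wrong when editing by hand: the JSON must stay inside single quotes and every key and value needs double quotes. As a minimal sketch, the line can be generated from its five parameters. The function name kylin_options is hypothetical and not part of Hue or Kylin, and the sketch assumes none of the values contain a double quote.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hue or Kylin): prints the "options" line
# for the [[[kylin]]] interpreter section from its five arguments.
# Assumption: no argument contains a double quote or a backslash.
kylin_options() {
  host=$1 port=$2 project=$3 user=$4 password=$5
  printf "options='{\"url\": \"jdbc:kylin://%s:%s/%s\",\"driver\": \"org.apache.kylin.jdbc.Driver\", \"user\": \"%s\", \"password\": \"%s\"}'\n" \
    "$host" "$port" "$project" "$user" "$password"
}

# Example: the values used in the sample configuration above.
kylin_options localhost 7070 learn_kylin ADMIN KYLIN
```

The printed line can be pasted under [[[kylin]]] in pseudo-distributed.ini.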

Edit the Dockerfile

touch Dockerfile
vim Dockerfile

Paste the script below into the Dockerfile:

FROM gethue/hue:latest
 
COPY ./kylin-jdbc-2.2.0.jar /hue/kylin-jdbc-2.2.0.jar
COPY ./pseudo-distributed.ini /hue/desktop/conf/pseudo-distributed.ini
 
EXPOSE 8888

This Dockerfile copies the Kylin JDBC jar and pseudo-distributed.ini into the Hue image and exposes port 8888 of the container.
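
A Docker build stops with an error when a COPY source is missing from the build context, so it can help to check the directory first. The sketch below is one possible pre-build check; check_build_context is a hypothetical name, and the file list simply mirrors the Dockerfile above.

```shell
#!/bin/sh
# Hypothetical pre-build check: confirm that the Dockerfile and every file it
# copies exist in the given build-context directory (default: current directory).
check_build_context() {
  dir=${1:-.}
  missing=0
  for f in Dockerfile kylin-jdbc-2.2.0.jar pseudo-distributed.ini; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}

# Usage: run the check from the directory that holds the Dockerfile.
#   check_build_context . && echo "build context looks complete"
```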

Build and start the Docker container

docker build -t hue-demo -f Dockerfile .
docker run -itd -p 8888:8888 --name hue hue-demo

Hue is now up and running at localhost:8888

 

You can now query Kylin from Hue.
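
If the editor does not load or queries fail, a quick reachability check of both services can narrow things down. The following is a sketch using curl; the function names are hypothetical, and the Kylin URL assumes the sample host and port from the configuration above with the web application served under /kylin.

```shell
#!/bin/sh
# Print the HTTP status code returned by a URL ("000" if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" || true
}

# Report whether a service answered at all. Any status other than 000 means
# the port is reachable (a redirect to a login page also counts).
check_service() {
  name=$1 url=$2
  status=$(http_status "$url")
  if [ -n "$status" ] && [ "$status" != "000" ]; then
    echo "$name reachable at $url (HTTP $status)"
  else
    echo "$name NOT reachable at $url"
    return 1
  fi
}

# Usage (the URLs are assumptions based on the sample values in this post):
#   check_service Hue   http://localhost:8888
#   check_service Kylin http://localhost:7070/kylin
```

Note that the JDBC URL in pseudo-distributed.ini is resolved from inside the Hue container, so a Kylin host that is reachable from your machine may still need a different address there.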

 

 

Deploy Hue with Apache Kylin on AWS

This section describes how to deploy Hue with Apache Kylin on AWS EMR.

Install Apache Kylin on AWS EMR

You may refer to this document to install Apache Kylin on AWS EMR.

Install Hue with Apache Kylin configured on AWS EMR

Once Apache Kylin is installed on AWS EMR, you can deploy Hue on AWS EMR with Kylin configured using our bootstrap file.

  1. Copy the download.sh file from the GitHub repository referenced at the end of this post to an S3 bucket;
  2. In configurations.json, replace the Apache Kylin host, port, project, and credentials with your own, then run the AWS CLI command that follows it to create an EMR cluster. Make sure you escape the options setting as in the configurations.json below.

[
  {
    "Classification": "hue-ini",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "notebook",
        "Properties": {
          "dbproxy_extra_classpath": "/opt/kylin_jdbc/kylin-jdbc-2.2.0.jar"
        },
        "Configurations": [
          {
            "Classification": "interpreters",
            "Properties": {},
            "Configurations": [
              {
                "Classification": "kylin",
                "Properties": {
                  "name": "kylin JDBC",
                  "interface": "jdbc",
                  "options": "{\"url\": \"jdbc:kylin://<host>:<port>/<project>\", \"driver\": \"org.apache.kylin.jdbc.Driver\", \"user\": \"<username>\", \"password\": \"<password>\"}"
                },
                "Configurations": []
              }
            ]
          }
        ]
      }
    ]
  }
]

Command to create the cluster:

aws emr create-cluster --name "HUE Cluster" --release-label emr-5.10.0 \
--ec2-attributes KeyName=<keypair_name>,InstanceProfile=EMR_EC2_DefaultRole,SubnetId=<subnet_id> \
--service-role EMR_DefaultRole \
--applications Name=Hive Name=Hue Name=Pig \
--emrfs Consistent=true,RetryCount=5,RetryPeriod=30 \
--instance-count 1 --instance-type m3.xlarge \
--configurations file://configurations.json \
--bootstrap-action Path="s3://<your_bucket>/download.sh"
  3. After the cluster reaches the “Waiting” status, open http://<master_node_dns>:8888 in a web browser; Hue should be ready on the cluster.
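
The escaping mentioned in step 2 is needed because the options value is itself a JSON document stored as a string inside configurations.json. As a rough sketch, the escaped form can be produced with sed; json_escape is a hypothetical helper that only handles backslashes and double quotes, which is all this value contains.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a JSON document can be embedded as a
# string value inside another JSON file. Reads stdin, writes stdout.
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Placeholders are kept as in configurations.json; substitute your own values.
options='{"url": "jdbc:kylin://<host>:<port>/<project>", "driver": "org.apache.kylin.jdbc.Driver", "user": "<username>", "password": "<password>"}'

# Print the property line in the form used inside configurations.json.
printf '"options": "%s"\n' "$(printf '%s' "$options" | json_escape)"
```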

 

Conclusion

We have demonstrated how you can easily configure Hue to query Apache Kylin. Hue is a great open source SQL editor for interactive analytics on Kylin. Both Hue and Apache Kylin can be deployed either on premises or in the cloud, so you can use this combination anywhere.

The whole document and related files are in this repo for your reference:

https://github.com/Kyligence/emr-hue-kylin


