I have a small doubt regarding joining two tables in Spark which are in Hive.

daya

Oct 26, 2015, 5:46:44 PM10/26/15
to Hadoop Users Group (HUG) Chennai
Hi guys,

Can anyone suggest how to join two large tables (which are in Hive) in Spark SQL? If you have any code in Scala or Java, can you please share it?

Actually, I have written the code in Eclipse, but I was getting an error like this: "The type scala.reflect.api.TypeTags$TypeTag cannot be resolved. It is indirectly referenced from required .class files".
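(In case it helps others hitting the same message: in Eclipse this error usually means the scala-reflect jar is not on the compile classpath. A hedged Maven sketch, assuming Spark 1.5 built for Scala 2.10 as of this post's date; the versions are assumptions, so match them to your cluster:)

```xml
<!-- Hypothetical pom.xml fragment: versions must match your Spark build. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.5.1</version>
  </dependency>
  <!-- Pulls in TypeTags; adding it explicitly resolves the Eclipse error. -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.10.4</version>
  </dependency>
</dependencies>
```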

The code is below (the error is shown near the package declaration):

package joins;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class Spark {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        SparkContext sc = new SparkContext(conf);
        HiveContext sqlContext = new HiveContext(sc);

        sqlContext.sql("use myown");

        // Register and cache both Hive tables as Spark tables.
        DataFrame table_01 = sqlContext.sql("SELECT * FROM customer");
        table_01.saveAsTable("spark_table_01");
        sqlContext.cacheTable("spark_table_01");

        DataFrame table_02 = sqlContext.sql("SELECT * FROM account");
        table_02.saveAsTable("spark_table_02");
        sqlContext.cacheTable("spark_table_02");

        // Join the cached copies. (Joining the raw customer/account tables
        // here, as before, would mean the cached tables were never used.)
        DataFrame table_join = sqlContext.sql(
            "SELECT a.* FROM spark_table_01 a JOIN spark_table_02 b ON a.ssn = b.ssn");
        table_join.insertInto("customeraccount");

        sqlContext.uncacheTable("spark_table_01");
        sqlContext.uncacheTable("spark_table_02");
    }
}
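Aside from the Spark specifics, the SQL above is just an inner join on ssn. A minimal plain-Java sketch of the hash-join idea behind it (build a map on one side, probe with the other; all row data and names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    public static void main(String[] args) {
        // Hypothetical rows: customer(ssn -> name) and account(ssn, balance).
        Map<String, String> customers = new HashMap<>();
        customers.put("111", "alice");
        customers.put("222", "bob");

        List<String[]> accounts = Arrays.asList(
            new String[]{"111", "500"},
            new String[]{"333", "900"});

        // Build a hash map on one side, then probe it with each row of the
        // other side; conceptually this is what a broadcast hash join does.
        List<String> joined = new ArrayList<>();
        for (String[] acct : accounts) {
            String name = customers.get(acct[0]);
            if (name != null) {                 // inner join: keep matches only
                joined.add(acct[0] + "," + name + "," + acct[1]);
            }
        }
        System.out.println(joined);             // [111,alice,500]
    }
}
```

Spark decides at runtime whether to broadcast one side or shuffle both, but the inner-join semantics are the same as this loop.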
