Cannot show table with pyspark


Ahmed Gater

Oct 10, 2016, 3:56:35 PM
to Hue-Users
Hi Everyone, 
I'm new to Hue and I'm using the PySpark interpreter to do some analysis on a file. I want to show the contents of an RDD as a table using %table (I saw this in a demo showing Hue with PySpark). When I try to do the same thing, I get the following error:
  File "<stdin>", line 1
    from __future__ import print_functionimport timeimport datetimeprocess_logs = sc.textFile('/user/cloudera/purchase_process/PurchaseProcess.csv')def toTS(x): return time.mktime(datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())def build_cases_summary(case_id_events_mapping): sorted_events=sorted(list(case_id_events_mapping[1]), key=lambda case_logs: toTS(case_logs[3])) return (case_id_events_mapping[0],sorted_events[0][3],sorted_events[-1][4],len(sorted_events),(toTS(sorted_events[-1][4])-toTS(sorted_events[0][3]))/(60*60)) process_logs = process_logs.map(lambda line: line.split(";")).groupBy(lambda row: row[0])cases_raw_summary = process_logs.map(build_cases_summary) cases_summary = cases_raw_summary.collect() nb_events = reduce(lambda x,y: x+y,map(lambda entry : entry[3],cases_summary))
    ^
SyntaxError: invalid syntax

The error I get occurs when I add %table cases_summary at the end of my script.


My script is:
#################################################"
from __future__ import print_function
import time
import datetime
process_logs = sc.textFile('/user/cloudera/purchase_process/PurchaseProcess.csv')

def toTS(x):
    return time.mktime(datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())

def build_cases_summary(case_id_events_mapping):
    sorted_events=sorted(list(case_id_events_mapping[1]), key=lambda case_logs: toTS(case_logs[3]))
    return (case_id_events_mapping[0],sorted_events[0][3],sorted_events[-1][4],len(sorted_events),(toTS(sorted_events[-1][4])-toTS(sorted_events[0][3]))/(60*60))
    
process_logs = process_logs.map(lambda line: line.split(";")).groupBy(lambda row: row[0])
cases_raw_summary = process_logs.map(build_cases_summary) 
cases_summary = cases_raw_summary.collect() 

nb_events = reduce(lambda x,y: x+y,map(lambda entry : entry[3],cases_summary))


%table cases_summary
#########################################################"
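For reference, here is a minimal sketch of what each row of cases_summary looks like, run on three made-up lines without Spark. The sample data and the column layout (case id in column 0, start timestamp in column 3, end timestamp in column 4) are assumptions for illustration only, and a plain dict stands in for the RDD groupBy:

```python
import datetime
import time
from collections import defaultdict


def toTS(x):
    # Parse a timestamp string into seconds since the epoch (local time).
    return time.mktime(
        datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())


def build_cases_summary(case_id_events_mapping):
    # Input: (case_id, iterable of rows). Sort the rows by start time.
    sorted_events = sorted(list(case_id_events_mapping[1]),
                           key=lambda case_logs: toTS(case_logs[3]))
    # Output: (case id, first start, last end, event count, duration in hours).
    return (case_id_events_mapping[0],
            sorted_events[0][3],
            sorted_events[-1][4],
            len(sorted_events),
            (toTS(sorted_events[-1][4]) - toTS(sorted_events[0][3])) / (60 * 60))


# Hypothetical sample lines; the real CSV contents are not shown in this post.
lines = [
    "c1;approve;bob;2016/01/05 11:00:00.000;2016/01/05 12:30:00.000",
    "c1;create;alice;2016/01/05 10:00:00.000;2016/01/05 10:30:00.000",
    "c2;create;alice;2016/01/06 09:00:00.000;2016/01/06 09:15:00.000",
]

# Stand-in for process_logs.map(split).groupBy(row[0]) on the RDD.
grouped = defaultdict(list)
for line in lines:
    row = line.split(";")
    grouped[row[0]].append(row)

cases_summary = [build_cases_summary(item) for item in sorted(grouped.items())]
nb_events = sum(entry[3] for entry in cases_summary)

for entry in cases_summary:
    print(entry)
print(nb_events)
```

So cases_summary is a plain Python list of 5-field tuples, and that list is what I am trying to pass to %table.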

I appreciate your help.

Miguel Moraleja

Oct 11, 2016, 4:19:35 AM
to Hue-Users
Hi

How do you create a pyspark notebook in Hue? I am running Hue 3.10 and pyspark is not available in Notebooks > Create snippets, only Hive, Pig, Text, Markdown, MySQL, SQLite, PostgreSQL and Oracle.

Thanks

Romain Rigaux

Oct 21, 2016, 2:25:14 AM
to Miguel Moraleja, Hue-Users
