Cannot show table with pyspark

Ahmed Gater

Oct 10, 2016, 3:56:35 PM

to Hue-Users

Hi Everyone,

I'm new to Hue and I'm using the PySpark interpreter to analyze a file. I want to display the contents of an RDD as a table using %table (I saw this in a demo of Hue with PySpark). When I try to do the same thing, I get the following error:

File "<stdin>", line 1 from __future__ import print_functionimport timeimport datetimeprocess_logs = sc.textFile('/user/cloudera/purchase_process/PurchaseProcess.csv')def toTS(x): return time.mktime(datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())def build_cases_summary(case_id_events_mapping): sorted_events=sorted(list(case_id_events_mapping[1]), key=lambda case_logs: toTS(case_logs[3])) return (case_id_events_mapping[0],sorted_events[0][3],sorted_events[-1][4],len(sorted_events),(toTS(sorted_events[-1][4])-toTS(sorted_events[0][3]))/(60*60)) process_logs = process_logs.map(lambda line: line.split(";")).groupBy(lambda row: row[0])cases_raw_summary = process_logs.map(build_cases_summary) cases_summary = cases_raw_summary.collect() nb_events = reduce(lambda x,y: x+y,map(lambda entry : entry[3],cases_summary)) ^ SyntaxError: invalid syntax

The error I get occurs when I add %table cases_summary at the end of my script.

My script is:

#################################################

from __future__ import print_function
import time
import datetime

process_logs = sc.textFile('/user/cloudera/purchase_process/PurchaseProcess.csv')

def toTS(x):
    return time.mktime(datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())

def build_cases_summary(case_id_events_mapping):
    sorted_events=sorted(list(case_id_events_mapping[1]), key=lambda case_logs: toTS(case_logs[3]))
    return (case_id_events_mapping[0],sorted_events[0][3],sorted_events[-1][4],len(sorted_events),(toTS(sorted_events[-1][4])-toTS(sorted_events[0][3]))/(60*60))

process_logs = process_logs.map(lambda line: line.split(";")).groupBy(lambda row: row[0])
cases_raw_summary = process_logs.map(build_cases_summary)
cases_summary = cases_raw_summary.collect()
nb_events = reduce(lambda x,y: x+y,map(lambda entry : entry[3],cases_summary))

%table cases_summary

#########################################################

I appreciate your help
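
To check that the summary logic in the script above is sound independently of Hue and of the %table line, it can be run as plain Python on a few rows. This is only a sketch: the sample lines are hypothetical, the column layout (case id first, start timestamp at index 3, end timestamp at index 4) is inferred from the indexing in the script, and itertools.groupby stands in for the RDD groupBy. It does not reproduce or diagnose the %table error.

```python
from __future__ import print_function
import time
import datetime
from functools import reduce  # builtin in Python 2, must be imported in Python 3
from itertools import groupby

def toTS(x):
    # Same conversion as in the script: timestamp string -> seconds since epoch
    return time.mktime(datetime.datetime.strptime(x, "%Y/%m/%d %H:%M:%S.%f").timetuple())

def build_cases_summary(case_id_events_mapping):
    # (case id, first start, end of last event, number of events, duration in hours)
    sorted_events = sorted(list(case_id_events_mapping[1]),
                           key=lambda case_logs: toTS(case_logs[3]))
    return (case_id_events_mapping[0],
            sorted_events[0][3],
            sorted_events[-1][4],
            len(sorted_events),
            (toTS(sorted_events[-1][4]) - toTS(sorted_events[0][3])) / (60 * 60))

# Hypothetical sample data; the real PurchaseProcess.csv layout is an assumption.
sample_lines = [
    "c1;approve;bob;2016/01/12 10:00:00.000;2016/01/12 12:00:00.000",
    "c1;create;alice;2016/01/12 09:00:00.000;2016/01/12 09:30:00.000",
    "c2;create;alice;2016/01/13 08:00:00.000;2016/01/13 08:30:00.000",
]

# Local stand-in for map(split) followed by groupBy(row[0]) on the RDD
rows = sorted((line.split(";") for line in sample_lines), key=lambda row: row[0])
grouped = [(case_id, list(events)) for case_id, events in groupby(rows, key=lambda row: row[0])]

cases_summary = [build_cases_summary(group) for group in grouped]
nb_events = reduce(lambda x, y: x + y, map(lambda entry: entry[3], cases_summary))

for row in cases_summary:
    print(row)
print("events:", nb_events)
```

Each element of cases_summary is a flat tuple of five fields, one tuple per case.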

Miguel Moraleja

Oct 11, 2016, 4:19:35 AM

to Hue-Users

Hi

How do you create a pyspark notebook in Hue? I am running Hue 3.10 and pyspark is not available in Notebooks > Create snippets, only Hive, Pig, Text, Markdown, MySQL, SQLite, PostgreSQL and Oracle.

Thanks

Romain Rigaux

Oct 21, 2016, 2:25:14 AM

to Miguel Moraleja, Hue-Users

http://gethue.com/spark/
