Pending feature list


mridul

Jan 15, 2010, 11:54:53 AM
to Piglet DSL
Hi,

Is there a pending feature list or wishlist somewhere?
Here are a few things off the top of my head that have always bugged me
and a few colleagues who have used Pig, and which I think Piglet might
handle relatively 'nicely':


mapred related.
a) We need to identify the mapred boundaries and add PARALLEL there -
inserting it automatically would be cool.
b) What value to specify for PARALLEL - even something simple inferred
from input sizes might be useful (with the ability for the user to
override it).
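A minimal sketch of the size-based guess in b), in Python. It assumes one reducer per fixed number of input bytes, clamped to a maximum, with an explicit user value always taking precedence. The function names, the 1 GB-per-reducer figure and the cap of 999 are illustrative assumptions, not Piglet code. Only the emitted PARALLEL clause is real Pig Latin.

```python
import math

# Hypothetical defaults, chosen only for illustration.
BYTES_PER_REDUCER = 1_000_000_000
MAX_REDUCERS = 999


def estimate_parallel(input_bytes, user_override=None,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Guess a PARALLEL value from the total input size.

    An explicit user value always wins; otherwise use one reducer per
    `bytes_per_reducer` of input, clamped to [1, max_reducers].
    """
    if user_override is not None:
        return user_override
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def add_parallel(statement, n):
    """Append a PARALLEL clause to a reduce-side statement (GROUP, JOIN, ...)."""
    return f"{statement.rstrip().rstrip(';')} PARALLEL {n};"


if __name__ == "__main__":
    n = estimate_parallel(3_500_000_000)  # 3.5 GB of input -> 4 reducers
    print(add_parallel("b = GROUP a BY user;", n))
```

This only covers sizing. Finding where the mapred boundaries fall (a) would still need Pig's plan.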

autogeneration of UDFs.
c) There is a lot of boilerplate code that needs to be written in
Pig, due to current implementation constraints, potential (or
existing) design issues, and fragility in the Pig parser. Providing
ways (even stop-gap ones) to extend and improve this aspect would be
wonderful.
Examples include: nested FILTER, more general nesting in FOREACH, and
generation/insertion of tuples/bags (either static or from HDFS).
d) Better UDF management, including wrappers that allow UDFs written
in Ruby to be executed in Pig (JRuby-based?).

inferring script semantics and optimizing.
e) The low-hanging fruit here might be the join type (regular,
fragment-replicate, skewed, etc.), but I am sure others can think of
something better too.

better workflow management.
f) Currently Pig schedules all jobs corresponding to a script together
when multi-query optimization is enabled. But at times, a suffix of
the plan depends on a prefix of the script - for example, generating a
side file in the first 10 lines and using it in the next 20.
This might be more 'clearly' representable in Piglet, so we could
better schedule the script/jobs.

g) The same applies to cleanup and job setup (when to remove
intermediate files, and when to set up resources for consumption and
then free them after use).
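Here is a rough sketch of the dependency-aware scheduling and cleanup described under 'better workflow management', in Python (3.9+ for graphlib). It assumes a script can be reduced to jobs that each declare the files they read and write. The job and file names are hypothetical. This is an illustration only, not a description of how Pig or Piglet plans jobs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical script model: each job lists the files it reads and writes.
JOBS = {
    "make_side_file": {"reads": ["raw_logs"], "writes": ["side_file"]},
    "count_users": {"reads": ["raw_logs"], "writes": ["user_counts"]},
    "join_with_side": {"reads": ["events", "side_file"], "writes": ["joined"]},
    "summarize": {"reads": ["joined"], "writes": ["report"]},
}
FINAL_OUTPUTS = {"user_counts", "report"}


def plan(jobs, final_outputs):
    """Group jobs into stages and decide when intermediate files can go.

    A job only depends on the jobs that write the files it reads, so
    independent jobs share a stage. An intermediate file is deleted once
    the stage containing its last reader has finished.
    """
    producer = {f: name for name, job in jobs.items() for f in job["writes"]}
    deps = {name: {producer[f] for f in job["reads"] if f in producer}
            for name, job in jobs.items()}

    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)

    stage_of = {name: i for i, stage in enumerate(stages) for name in stage}
    cleanup = {}
    for f, writer in producer.items():
        if f in final_outputs:
            continue
        readers = [stage_of[n] for n, job in jobs.items() if f in job["reads"]]
        last_use = max(readers, default=stage_of[writer])
        cleanup.setdefault(last_use, []).append(f)
    return stages, cleanup


if __name__ == "__main__":
    stages, cleanup = plan(JOBS, FINAL_OUTPUTS)
    for i, stage in enumerate(stages):
        print(f"stage {i}: run {stage}, then delete {cleanup.get(i, [])}")
```

Here the side file is produced in the first stage, consumed in the second, and removed right after. The unrelated count_users job runs alongside make_side_file instead of waiting.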

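The join-type idea under 'inferring script semantics and optimizing' could start as a simple rule over input statistics. Below is a minimal Python sketch. It assumes estimated sizes and a per-relation skew figure are available. The JoinInput fields, the 256 MB memory budget and the skew limit are hypothetical. Only the emitted USING 'replicated' and USING 'skewed' clauses are real Pig Latin. Merge joins, which need sorted inputs, are left out.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    alias: str        # Pig relation alias
    size_bytes: int   # estimated input size
    skew: float       # hypothetical stat: largest key group / average key group


def join_statement(out, left, right, key,
                   memory_budget_bytes=256 * 1024**2, skew_limit=10.0):
    """Emit a Pig JOIN, picking replicated, skewed or regular from input stats."""
    big, small = (left, right) if left.size_bytes >= right.size_bytes else (right, left)
    if small.size_bytes <= memory_budget_bytes:
        # Fragment-replicate join: relations after the first are held in
        # memory, so the large relation is listed first.
        return (f"{out} = JOIN {big.alias} BY {key}, "
                f"{small.alias} BY {key} USING 'replicated';")
    if max(left.skew, right.skew) > skew_limit:
        return (f"{out} = JOIN {left.alias} BY {key}, "
                f"{right.alias} BY {key} USING 'skewed';")
    return f"{out} = JOIN {left.alias} BY {key}, {right.alias} BY {key};"


if __name__ == "__main__":
    clicks = JoinInput("clicks", 40 * 1024**3, 3.0)
    users = JoinInput("users", 50 * 1024**2, 1.0)
    print(join_statement("j", users, clicks, "user_id"))
```
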

I am not sure how much of this needs to be in Piglet and how much
should go into Pig. Considering how complex Pig already is, I am
assuming Piglet might be a better fit. Please do note that I am not
good at Ruby, so I might be making wrong design/implementation
assumptions above!


Thanks,
Mridul

Theo Hultberg

Jan 15, 2010, 1:38:49 PM
to pigle...@googlegroups.com
hi,

there is no pending feature list, but you can use the issue tracker on
the GitHub site to add suggestions:

http://github.com/iconara/piglet/issues

I plan to add support for %default and %declare, and some more UDF
support. I'll also try to get foreach with inner bags (or is it outer?
I forget) working.

T#
