kite, flume and non-avro sources

Tim Williams

May 22, 2015, 8:38:16 AM
to cdk...@cloudera.org
Thanks to Ryan I can now go from a flume avro source to a kite [hive]
datasink. I'm wondering if there's an idiomatic kite way to go from
other flume sources? For example, with an HTTPSource using the
JSONHandler, are folks typically using morphlines to convert the events
to Avro for the sink? Or is there a preferred approach? Or is the kite
framework unopinionated about it?
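
For readers following along, here is a minimal sketch of the payload shape that Flume's HTTPSource with the JSONHandler accepts: a JSON array of events, each with string-valued "headers" and a string "body". The helper name and the record fields (id, message) are hypothetical, chosen only for illustration.

```python
import json


def build_flume_json_payload(records, headers=None):
    """Wrap each record as a JSONHandler-style event.

    Each event is {"headers": {...}, "body": "<string>"}; the record
    itself is serialized to a JSON string and carried in the body.
    """
    events = []
    for record in records:
        events.append({
            # Header values are plain strings in this sketch.
            "headers": dict(headers or {}),
            # sort_keys keeps the serialized body deterministic.
            "body": json.dumps(record, sort_keys=True),
        })
    return json.dumps(events)


# Hypothetical record; a real one would follow your own schema.
payload = build_flume_json_payload(
    [{"id": 1, "message": "hello"}],
    headers={"source": "demo"},
)
```

The body arrives at the sink as an opaque string, which is why something (a morphline, for instance) still has to turn it into an Avro record before the dataset sink can write it.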

Thanks,
--tim

Tim Williams

May 22, 2015, 10:01:18 AM
to cdk...@cloudera.org
Well now, that's embarrassing... I totally missed this example[1]
which precisely answers my question. One thing to consider is that it
might be nice if the morphlineFile could be in dfs?

Thanks,
--tim

[1] - https://github.com/kite-sdk/kite-examples/tree/master/json

Tim Williams

May 22, 2015, 11:38:41 AM
to cdk...@cloudera.org
On Fri, May 22, 2015 at 2:01 PM, Tim Williams <willi...@gmail.com> wrote:
> On Fri, May 22, 2015 at 12:38 PM, Tim Williams <willi...@gmail.com> wrote:
>> Thanks to Ryan I can now go from a flume avro source to a kite [hive]
>> datasink. I'm wondering if there's an idiomatic kite way to go from
>> other flume sources? For example, with an HTTPSource using the
>> JSONHandler, are folks typically using morphlines to convert the events
>> to Avro for the sink? Or is there a preferred approach? Or is the kite
>> framework unopinionated about it?
>
> Well now, that's embarrassing... I totally missed this example[1]
> which precisely answers my question. One thing to consider is that it
> might be nice if the morphlineFile could be in dfs?

Come to think of it, the same goes for the schemaFile in the morphline itself...

Thanks,
--tim

Ryan Blue

May 22, 2015, 12:17:19 PM
to Tim Williams, cdk...@cloudera.org, Jarek Jarcec Cecho
I'm glad it's working, Tim. Thanks for letting us know.

Another thing we've been working on is a way to go directly from CSV or
JSON to Avro in the dataset sink. You wouldn't have the same flexibility
you do with morphlines, but it would be a bit simpler to get running if
you just need to convert the format using a well-defined Avro schema.
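
As a rough illustration of what "convert the format using a well-defined Avro schema" means, here is a stdlib-only sketch that maps a JSON object onto the fields of an Avro-style record schema. It is not the FLUME-2646 implementation; the schema, field names, and function names are hypothetical, and only primitive types and simple unions are handled.

```python
import json

# Hypothetical Avro record schema, written as a plain dict.
SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "message", "type": "string"},
        {"name": "score", "type": ["null", "double"], "default": None},
    ],
}

# Python types accepted for each Avro primitive in this sketch.
_PYTHON_TYPES = {
    "string": str,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "boolean": bool,
}


def check_value(value, avro_type):
    """Return True if value fits the Avro type (primitives and unions only)."""
    if isinstance(avro_type, list):  # union: any non-null branch may match
        if value is None:
            return "null" in avro_type
        return any(check_value(value, t) for t in avro_type if t != "null")
    if avro_type == "null":
        return value is None
    expected = _PYTHON_TYPES.get(avro_type)
    if expected is None:
        raise NotImplementedError(f"type not handled in this sketch: {avro_type!r}")
    # bool is a subclass of int in Python, so reject it for numeric types.
    if isinstance(value, bool) and avro_type != "boolean":
        return False
    return isinstance(value, expected)


def json_to_record(json_text, schema):
    """Build a dict with exactly the schema's fields from a JSON object."""
    data = json.loads(json_text)
    record = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in data:
            value = data[name]
        elif "default" in field:
            value = field["default"]
        else:
            raise ValueError(f"missing required field: {name}")
        if not check_value(value, field["type"]):
            raise ValueError(f"field {name!r} does not match type {field['type']!r}")
        record[name] = value
    return record


record = json_to_record('{"id": 7, "message": "hi", "extra": true}', SCHEMA)
```

Fields absent from the schema are dropped and missing fields fall back to their defaults, which is the kind of fixed, schema-driven mapping that trades away the flexibility of a morphline.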

The Flume issue is FLUME-2646 [1]. I just bumped the issue and we'll
work on getting it into the next point release.

rb

[1]: https://issues.apache.org/jira/browse/FLUME-2646
--
Ryan Blue
Software Engineer
Cloudera, Inc.