Hello, Matt.
Thanks for your response.
I agree that Kylo fits very well for data lake implementation, and I emphasized previously that data viz and analytics on prepared datasets are not features that have to be implemented directly in Kylo. But for completeness of vision, it's worth mentioning somewhere (in the docs or in tutorial videos?) how to do querying, reporting, and visualization on datasets prepared with Kylo via third-party tools. In short: how to consume and use data prepared with Kylo without significant IT involvement. Without such a description or guide, the data lake concept is not complete.
Zeppelin may require some training for business users to work with it, but it seems like a good solution.
Business users feel comfortable using BI tools. Although many of them support Hadoop integration, a direct connection to a Hadoop data source may be complex for users, and some use cases cannot be handled that way (e.g., working with a dataset that cannot fit in the user's PC memory).
In my opinion, a good option is to use a SQL-on-Hadoop tool that can provide an API to BI tools, thereby encapsulating Hadoop at least to some degree. We are currently experimenting with Apache Drill: it supports ANSI SQL, provides ODBC/JDBC connectivity for third parties, and allows interactive analytics on Hadoop data using distributed query execution. Also, the ability to use ANSI SQL instead of HiveQL seems very valuable from a self-service perspective.
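To make the consumption path concrete, here is a minimal JDBC sketch of querying Kylo-prepared data through Drill. The drillbit host/port and the Hive schema/table names ("customers", "orders_valid") are assumptions for illustration; Kylo feeds typically land in Hive tables, but your naming will differ, and in a cluster you would usually connect via ZooKeeper (jdbc:drill:zk=...) instead of a single drillbit:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillQueryExample {
        public static void main(String[] args) throws Exception {
            // Direct connection to a single drillbit (placeholder host/port;
            // 31010 is Drill's default user port).
            String url = "jdbc:drill:drillbit=localhost:31010";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                // Hypothetical Hive table produced by a Kylo feed;
                // the schema/table names are assumptions.
                ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) AS total " +
                    "FROM hive.`customers`.`orders_valid` " +
                    "GROUP BY customer_id " +
                    "LIMIT 10");
                while (rs.next()) {
                    System.out.println(
                        rs.getString("customer_id") + " -> " + rs.getDouble("total"));
                }
            }
        }
    }

A BI tool would use the same ODBC/JDBC endpoint, so the user only sees an ANSI SQL interface, not Hadoop itself.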
One interesting question arises here: how to set up Kylo, Drill (or another SQL-on-Hadoop tool that provides an API), and a BI tool so that the user gets seamless analytics on Kylo-prepared data through the BI tool. Ideally, the user doesn't need to know anything about Drill, and especially not how to configure it; IT involvement should also be minimized. We will work and experiment in this direction and will share interesting findings as soon as we have them.
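As a starting point, the Drill side of that setup is mostly a one-time registration of a Hive storage plugin pointing at the metastore that Kylo writes to. A rough sketch of such a plugin config (the metastore URI is a placeholder, and security settings will depend on your cluster):

    {
      "type": "hive",
      "enabled": true,
      "configProps": {
        "hive.metastore.uris": "thrift://localhost:9083",
        "hive.metastore.sasl.enabled": "false"
      }
    }

Once this is registered (e.g., via the Storage tab of the Drill web UI), Kylo-prepared Hive tables become visible under the hive schema, and the only thing left to hand to the business user is the ODBC/JDBC connection string for their BI tool.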