Last week it was announced that Pentaho officially became a Hitachi Data Systems company. A while back, when the news about the intended acquisition came out, I blogged about it and guaranteed that we wouldn't be abandoning our Open Source heritage.
Well, 5 days after the finalization of the deal...
Pentaho 5.4 (EE and CE) is available
You can get it from the usual places:
And what's new here? Lemme do a very quick overview:
New PDI look and feel (CE/EE)
Our beloved PDI (I still call it Kettle) now has an amazing face to
match its amazing capabilities. It behaves great, and now it also looks
great. And it's much more than icons: we added SVG support to it. Just
try it!
Analyzer API Extensibility (EE)
Extension points give developers greater control over layout, styling,
and interactivity when embedding Pentaho Analyzer. This is the result of
work that started in 5.3 and was just finished.
DI REST API Documentation (EE)
Well, all our documentation is open and available, but the Data
Integration Server is an EE component. Documentation is always
fundamental, and we're paying more and more attention to it with every
release.
BA Server Utilities (CE/EE)
A plugin that enables PDI to communicate with the Pentaho Platform: it
gets the list of endpoints the platform makes available and, in a very
simple way, executes actions on the platform.
When published to the server, PDI transformations that use this step can
run locally, bypassing the specified authentication and running with the
permissions of the current user.
Ctools support for Require.js in CDF/CDE (CE/EE)
This is so huge I'll dedicate a new blog post to it later. We did a
major refactor of the Ctools, restructuring CDF/CDE as an AMD-based
framework by using Require.js.
Named Hadoop Clusters (CE/EE)
Every Hadoop step has its own UI to specify the NameNode and port,
JobTracker and port, etc. We simplified the UI and now we have a single
dialog where users can enter all the required information about the
cluster. That info is saved as a named configuration and is selectable
within each Hadoop step that needs it.
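To make the idea concrete, here's a minimal sketch of what a named configuration buys you. This is not PDI's actual code: the class names, field names and host names below are all made up for illustration. The point is only that each step stores a cluster name and resolves the connection details through a shared registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedCluster:
    """Connection details entered once (all field names are hypothetical)."""
    name: str
    namenode_host: str
    namenode_port: int
    jobtracker_host: str
    jobtracker_port: int


class ClusterRegistry:
    """Stores cluster configurations by name so steps can share them."""

    def __init__(self):
        self._clusters = {}

    def save(self, cluster):
        # Saving under an existing name replaces the old configuration.
        self._clusters[cluster.name] = cluster

    def get(self, name):
        return self._clusters[name]


@dataclass
class HadoopStep:
    """A step keeps only the cluster name, not its own host/port copies."""
    label: str
    cluster_name: str

    def namenode_url(self, registry):
        c = registry.get(self.cluster_name)
        return f"hdfs://{c.namenode_host}:{c.namenode_port}"


registry = ClusterRegistry()
registry.save(NamedCluster("dev", "nn.example.com", 8020, "jt.example.com", 8021))

# Two illustrative steps that both point at the same named cluster.
copy_step = HadoopStep("copy step", "dev")
output_step = HadoopStep("output step", "dev")
```

If the cluster moves, you'd save a new `NamedCluster` under the same name and every step that references it picks up the change, instead of editing each step one by one.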
AWS EMR Shim Support (CE/EE)
We now have full support for a shim that runs on Amazon EMR, including S3 support! ETL at scale on this highly used platform!
SAP HANA for PDI (EE)
SAP HANA is an in-memory columnar database that provides support for
both high-speed transactions and complex analytical queries. It is the
foundation for all next-generation SAP Analytics & ERP applications.
Spark Integration (CE/EE)
Spark is an in-memory processing engine that can be clustered/scaled
using Hadoop. Its original use case was for data scientists to issue
statements that would load sets of data into memory, and then issue a
series of queries against that data to investigate it. Spark can run on
top of Hadoop using HDFS as its distributed file system. It is widely
believed that Spark will replace MapReduce as the Hadoop distributed
computing engine.
To improve the orchestration ability, we created a PDI Job Step to
execute compiled Java code that Spark developers create. Through Labs
we're working on full shim support, but this is a great start!
SDR / Modeling improvements (EE)
We're adding more and more capabilities into what - in my opinion - will
be the proper approach to modeling in our stack: start bringing
business information at the source of the data!
In 5.4 we have:
- Metastore – enables the reuse of metadata for sharing between transformations.
- Support for Shared Dimensions in auto modeler – allows more complex
schemas and the ability to reuse dimensional tables like time. Allows
reuse across multiple SDR implementations and can even help build dimension
tables for traditional DWHs.
- Star Schema support in auto modeler – expands support from a single table (today) to star schemas.
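For anyone less familiar with the jump from a single table to a star schema, here's a small sketch of the general idea. It says nothing about how the auto modeler is implemented: the function name, column names and sample rows are invented for illustration. It splits flat rows into one dimension table per descriptive column, plus a fact table that holds surrogate keys and measures.

```python
def to_star_schema(rows, dimension_columns, measure_columns):
    """Split flat rows into dimension tables plus one fact table."""
    # One lookup per dimension: value -> surrogate key.
    dimensions = {col: {} for col in dimension_columns}
    facts = []
    for row in rows:
        fact = {}
        for col in dimension_columns:
            keys = dimensions[col]
            # Assign the next surrogate key the first time a value is seen.
            fact[col + "_id"] = keys.setdefault(row[col], len(keys) + 1)
        for col in measure_columns:
            fact[col] = row[col]
        facts.append(fact)
    return dimensions, facts


rows = [
    {"date": "2015-06-01", "product": "widget", "amount": 10},
    {"date": "2015-06-01", "product": "gadget", "amount": 5},
    {"date": "2015-06-02", "product": "widget", "amount": 7},
]

dimensions, facts = to_star_schema(rows, ["date", "product"], ["amount"])
```

A dimension built this way (the `date` one, for example) is the kind of table that could be shared across several models rather than rebuilt for each.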
Localization for Pentaho User Console (CE/EE)
Professionally translated, tested and supported language packs developed
by a contracted localization company for French, German and Japanese.
It's a great extension to the work done by the community (which still
exists) around the community language packs.
Mongo 3.0 (CE/EE)
Mongo launched a major version and some of the old APIs were deprecated. Updated!
Google Analytics step in PDI (CE/EE)
Google changed the authentication for Google Analytics, and the old step
broke. Fixed (even though it requires some work; a blog post on this
later).
---------------------------------------
An amazing release! The best so far! Better than this one, only the next one ;)
Cheers!
-pedro