ETL using FBP
In a recent conversation with Felipe Valdés, an experienced programmer who has been an FBP enthusiast for a number of years (he is quoted in my book), we talked about future directions for FBP, and Felipe suggested, "write a couple basic ETL components and become a ETL DSL, market as ETL, beat the competitors with better usability for the editor."
Now, I had actually forgotten about ETL, but in my book - more specifically the 2nd edition - there are quite a few references to papers about ETL - many of them in connection with checkpointing, as ETL apps are often quite long-running.
The transform (the T in ETL) component/subnet can either be prepackaged, using some kind of rules manager (I have some experience doing this in Java), or just be a hand-written FBP component/subnet... The front and back ends would be components interfacing with various database managers (e.g. via JDBC), or straight files... or combinations of the above!
And of course these ETL-like networks can also interface to anything that "talks data", including file managers, sockets, messaging, etc., etc.
An FBP-based ETL system looks like an interesting project, so here goes:
I have already got a simple JavaFBP/JDBC demo working, so it's (in my view) just a matter of growing it organically... Most of the time was taken installing MySQL!
However... my JDBC program has a lot of stuff hard-wired, so turning it into a reusable module will take some design effort... Of course people could hand-code the Extract and Load components for every database in their application, using standard templates, but I would like to do better than that!
Another issue that jumped out at me was that the demo (which I took and modified from the Internet) uses double for price... which is a NO-NO! However, since we would be working with existing databases, we just have to know the currency - see below...
The relevant part of the demo reads a row as follows (in Java):
ResultSet rset = stmt.executeQuery("... select statement...");
Book book = new Book();
book.title = rset.getString("title");
book.author = rset.getString("author");
book.price = rset.getDouble("price");
book.qty = rset.getInt("qty");
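To illustrate why double is a poor fit for prices, here is a minimal sketch (the class and method names are hypothetical and are not part of the demo) that adds the same price ten times, once in binary floating point and once with BigDecimal:

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of the JDBC demo.
final class PriceArithmeticDemo {

    // Adds the same price `count` times using binary floating point.
    static double sumAsDouble(double price, int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += price;
        }
        return total;
    }

    // Adds the same price `count` times using exact decimal arithmetic.
    static BigDecimal sumAsBigDecimal(String price, int count) {
        BigDecimal unit = new BigDecimal(price);
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < count; i++) {
            total = total.add(unit);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumAsDouble(0.10, 10) == 1.0);   // prints false
        System.out.println(sumAsBigDecimal("0.10", 10));    // prints 1.00
    }
}
```

The double total drifts away from 1.0 because 0.10 has no exact binary representation, while the BigDecimal total stays exact.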
We did quite a bit of work on business data types when I was still at IBM, some of which is described in https://jpaulm.github.io/busdtyps.html
For any locale-independent system, you need to include the currency in a monetary value or price, so we used the internationally accepted 3-character (ISO 4217) currency codes, plus the value as a String, e.g. "CAD21.65". Internally this was converted to a currency code and the numeric value as a BigDecimal (much more appropriate for currency amounts than "double"!)... Note: if you are not familiar with BigDecimal, it is described at https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
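The conversion described above might be sketched as follows. This is only an illustration, not the code we used at IBM: the class name MonetaryValue and its methods are hypothetical, and it assumes the string is always a 3-character code immediately followed by the amount. Currency and BigDecimal are the standard JDK classes.

```java
import java.math.BigDecimal;
import java.util.Currency;

// Hypothetical value class: not part of JavaFBP or of the business data types work.
final class MonetaryValue {
    final Currency currency;   // ISO 4217 currency, e.g. CAD
    final BigDecimal amount;   // exact decimal value, e.g. 21.65

    MonetaryValue(Currency currency, BigDecimal amount) {
        this.currency = currency;
        this.amount = amount;
    }

    // Parses strings such as "CAD21.65": a 3-character code followed by the amount.
    static MonetaryValue parse(String text) {
        if (text == null || text.length() < 4) {
            throw new IllegalArgumentException("Expected <code><amount>, got: " + text);
        }
        // Both calls throw an IllegalArgumentException (or subclass) on bad input.
        Currency currency = Currency.getInstance(text.substring(0, 3));
        BigDecimal amount = new BigDecimal(text.substring(3));
        return new MonetaryValue(currency, amount);
    }

    // Attaches a known currency to a double read from an existing database column.
    static MonetaryValue fromDouble(String currencyCode, double value) {
        // BigDecimal.valueOf goes through Double.toString, so 21.65 stays 21.65.
        return new MonetaryValue(Currency.getInstance(currencyCode), BigDecimal.valueOf(value));
    }

    @Override
    public String toString() {
        return currency.getCurrencyCode() + amount.toPlainString();
    }
}
```

The fromDouble method covers the case of an existing database that stores the price as a double, where the currency has to be supplied from outside.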
That's it for now - I think the Google group would be a good place to start a discussion - unless a GitHub project or other tool would be better...?
Who is up for some FBP-based design and implementation work?! I am going to put a JDBC program up on GitHub... which will be a good example of how NOT to build ETL components (unless there really is no better way)! See https://github.com/jpaulm/fbp-etl/tree/master/src/main/java/com/jpaulmorrison/dbtest - you can see that a lot of things are hard-wired, so it will be quite a challenge to "componentize" it! MySQL databases are somewhat self-describing (although not completely), and Java has reflection, so we should be able to do some componentization - we might also be able to generate E and L components from something like JSON...
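As a rough sketch of what that componentization might build on, the following combines JDBC metadata with reflection. ResultSetMetaData and java.lang.reflect.Field are standard Java, but RowMapper, rowToMap and populate are hypothetical names (nothing here is taken from fbp-etl), and the field types of Book are assumed from the getters used in the demo.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the names are illustrative only.
final class RowMapper {

    // Copies the current row of any ResultSet into a column-label -> value map,
    // using the JDBC metadata instead of hard-wired column names.
    static Map<String, Object> rowToMap(ResultSet rset) throws SQLException {
        ResultSetMetaData meta = rset.getMetaData();
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {   // JDBC columns are 1-based
            row.put(meta.getColumnLabel(i), rset.getObject(i));
        }
        return row;
    }

    // Creates an instance of `type` and fills each field whose name matches a column.
    // Assumes a no-argument constructor and values already of a compatible type.
    static <T> T populate(Class<T> type, Map<String, Object> row) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (row.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(target, row.get(field.getName()));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + type.getName(), e);
        }
    }
}

// Same shape as the Book class used in the demo (field types assumed).
final class Book {
    String title;
    String author;
    double price;
    int qty;
}
```

With something like this, an Extract component could call RowMapper.populate(Book.class, RowMapper.rowToMap(rset)) for each row, with the class supplied as a parameter rather than coded into the component. Type conversion (for instance a DECIMAL column arriving as a BigDecimal) is not handled here and would need more design.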
Oh, I almost forgot: we have a quite decent FBP-oriented design tool (https://github.com/jpaulm/drawfbp), which has been evolving over a number of years, so drawing and exchanging design diagrams will not be a problem...
PS I see where Nicholas Chen raised this idea back in 2011 (9 years ago!) - wonder if anything came of that effort?