Hi community,
We are trying to create a unified BI environment for the company, to make it much easier for developers to create and test ETLs. Basically the idea is to have a Docker environment (docker-compose) with a reduced replica of what we have in production, so we can be confident that everything that works in this Docker setup will also run in production.
Inside Docker we have set up PDI 6.1 running Carte, plus some database images. At first everything looked good: we were able to run simple ETLs on the containers, access the databases, etc., but when we tried a real one everything fell apart.
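For reference, our docker-compose is roughly like the following sketch (image names, ports and the database service are placeholders, not our exact setup):

    version: "2"
    services:
      carte:
        image: our-pdi-carte:6.1        # hypothetical image with PDI 6.1 running Carte
        ports:
          - "8181:8181"                 # placeholder port for the Carte web service
      db:
        image: postgres:9.5             # placeholder for one of our reduced-replica databases
        environment:
          POSTGRES_PASSWORD: example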
The first problem we found was that Carte wasn't able to load the configuration for the ETL, which was referenced as a file, even when using the "Send resources to this server" option in Spoon. After some research we saw that this option packages referenced jobs and transformations but not other files (SQL, config files, etc.), so it wasn't an option for us.
The next thing we tried was to create a file repository in Spoon, so we could share a "common storage" between our machines and the container. Everything looked good at first, but since we already had a directory hierarchy to classify our ETLs, we hit the problem that Spoon automagically duplicated the job / transformation into the repository's root directory even when the job was already inside the repository; we don't know why. As we wanted to keep our directory structure, we moved on to the next attempt.
After these attempts, the next idea was to mount an external volume into the Docker container with all our sources, and find a way to map the path on our local machine to the one used inside the container.
At first we tried to mount it at the same path, with a 1:1 mapping (both at /home/XXXX/ETLs, for example), but jobs were still failing because Carte was trying to find dependent files in /. We didn't know why that was happening, but after some more research we found we were using Internal.Job.Filename.Directory everywhere; that variable is local to each Spoon / Carte environment, and we didn't see another built-in option to share it.
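For the 1:1 mapping attempt, the carte service simply mounted the same host directory at the same absolute path inside the container, something along these lines (the path is just the example above):

    carte:
      volumes:
        - /home/XXXX/ETLs:/home/XXXX/ETLs   # same absolute path on the host and in the container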
Finally, the solution we found, and which seems to be working fine, was to create a variable inside kettle.properties containing the path where all our ETLs live. Inside Docker it points to the shared volume, and on each user's computer it points to their own ETL path. After that we had to modify all the ETLs to use our new ETL_PATH variable.
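Concretely, the change looks roughly like this (paths are illustrative): each environment defines the variable in its own kettle.properties, and the jobs / transformations reference files through it instead of Internal.Job.Filename.Directory.

    # kettle.properties inside the Docker container (~/.kettle/kettle.properties)
    ETL_PATH=/opt/etls            # points to the shared volume

    # kettle.properties on a developer's machine
    ETL_PATH=/home/XXXX/ETLs

    # and file references inside jobs / transformations become, for example:
    #   ${ETL_PATH}/some_job/some_transformation.ktr
    #   ${ETL_PATH}/some_job/config/settings.sql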
I suppose there is an easier / less intrusive way to make this work, as modifying all the jobs doesn't seem to be the way to go, but we couldn't find it. We would really appreciate it if you could point us in the right direction.
Thanks a lot,