Hi all,
I'm running into an issue that I think is launcher related. I'm using the Presto that ships with EMR 4.7.0, just released on Jun 2nd. I created a 3-node cluster (1 Presto coordinator on the EMR master node and 2 Presto workers on EMR core nodes) by selecting the EMR Presto-sandbox, Hive, HCatalog, and Hadoop components in the AWS EMR console. The problem seems to be with deploying new connector configurations. When I create a new PostgreSQL connector properties file in the /etc/presto/conf/catalog directory on all nodes in the cluster and then run "sudo restart presto-server" on all nodes, I get "no connector class factory" errors when I try to run queries in the presto-cli on the coordinator node.
By inspecting the logs on all the nodes, I think I've narrowed this down to a problem with either the Presto properties files as created by EMR or the launcher itself. In the launcher.log on the coordinator node, I can see that it loads the classes in the PostgreSQL connector jars and in the PostgreSQL JDBC driver. On the worker nodes, however, I can find the launcher loading the Presto connector jar classes, but I can't find any trace of it loading the JDBC driver classes. The postgresql.properties file deployed on all nodes is exactly the same. The Presto config.properties is exactly what EMR deployed at cluster creation time. EMR has deployed all the stock Presto connector plugins, including the PostgreSQL connector, to all nodes in the cluster, and I've verified that their directory contents and permissions are identical.
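For anyone who wants to repeat the "identical directory contents and permissions" check, here is a minimal sketch of one way to do it, assuming Python is available on each node. It hashes the relative paths, permission bits, and file bytes under a directory into one digest, so the same digest on every node means the trees match. The function name fingerprint is made up for illustration, and this is not necessarily how the check above was done.

```python
import hashlib
import os
import stat


def fingerprint(root):
    """Return a SHA-256 hex digest covering every directory and file under root:
    relative path, permission bits, and (for files) the file contents."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so digests are comparable
        # Record the directory itself (relative path + permission bits).
        dir_mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        digest.update(os.path.relpath(dirpath, root).encode("utf-8") + b"\0")
        digest.update(format(dir_mode, "o").encode("ascii") + b"\0")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            file_mode = stat.S_IMODE(os.stat(path).st_mode)
            digest.update(os.path.relpath(path, root).encode("utf-8") + b"\0")
            digest.update(format(file_mode, "o").encode("ascii") + b"\0")
            # Stream the file in 1 MiB chunks so large jars are not read at once.
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

Calling fingerprint() on the PostgreSQL plugin directory of the coordinator and of each worker, then comparing the printed digests, would confirm or rule out a difference in the deployed jars. Ownership (user and group) is not part of the digest, so that would still need a separate look.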
It's worth noting that EMR deploys Presto into a more Hadoop-like directory structure, one that is more consistent across the many Hadoop ecosystem components that can coexist on the same set of nodes in the EMR cluster.
My questions are:
1. How does the launcher determine which connectors to load?
2. Is it explicitly configurable?
3. Or is it implicitly specified by the plugins installed in the plugins directory?
4. How does the presto-server determine which connectors to initialize and which catalogs to load?
5. Is it explicitly configurable? I know the config.properties "datasources=<data_source_list>" has been deprecated, but do any other properties affect catalog loading?
6. Is this also implicitly specified by the catalog properties files in the catalog directory?
7. Are there any command line parameters that can be added when starting the server to get debug output from the launcher, the server daemon itself, or both that will show more detailed log information to help diagnose this issue further?
After trying to work through this with just the EMR and Presto docs, it seems like the Presto docs could use quite a bit more detail on how coordinator and worker process startup and initialization work, and on how the property files affect that behavior.
Thanks for any help anyone can offer!