Diagnosing launcher on EMR Presto?

Kurt Larson

Jun 12, 2016, 10:46:26 AM
to Presto
Hi all,

I'm running into an issue that I think is launcher related. I'm using Presto on EMR 4.7.0, which was just released on Jun 2nd. I created a 3-node cluster (1 Presto coordinator on the EMR master node and 2 Presto workers on EMR core nodes) by selecting the EMR Presto-sandbox, Hive, HCatalog, and Hadoop components in the AWS EMR console. The problem I think I'm having is with deploying new connector configurations. When I create a new PostgreSQL connector configuration properties file in the /etc/presto/conf/catalog directory on all nodes in the cluster and then run "sudo restart presto-server" on all nodes, I get "No factory for connector" errors when trying to run queries in the presto-cli on the coordinator node.

By inspecting the logs on all the nodes, I think I've narrowed this issue down to a problem with either the Presto properties files as created by EMR or the launcher itself. In the coordinator node's launcher.log I can see that it loads the classes in the PostgreSQL connector jars and the PostgreSQL JDBC driver. However, on the worker nodes, I can find the launcher loading the Presto connector jar classes, but I can't find any trace of it loading the JDBC driver classes. The postgresql.properties file deployed on all nodes is exactly the same. The Presto config.properties is exactly what EMR deployed at cluster creation time. EMR has deployed all the stock Presto connector plugins, including the PostgreSQL connector, to all nodes in the cluster, and I've verified that the directory contents and permissions are identical.

It's worth noting that EMR deploys Presto into a more Hadoop-like directory structure, one that can coexist with, and is more consistent across, the many Hadoop ecosystem components installed on the same set of nodes in the EMR cluster.

My questions are:

1. How does the launcher determine which connectors to load? 
2. Is it explicitly configurable?
3. Implicitly specified by the plugins installed in the plugins directory?
4. How does the presto-server determine which connectors to initialize and which catalogs to load?
5. Is it explicitly configurable?  I know the config.properties "datasources=<data_source_list>" has been deprecated, but do any other properties affect catalog loading?
6. Is this also implicitly specified by the catalog properties files in the catalog directory?
7. Are there any command line parameters that can be added when starting the server to get debug output from the launcher, the server daemon itself, or both that will show more detailed log information to help diagnose this issue further?

After trying to work through this with just the EMR and Presto documentation, it seems like the Presto documentation could use quite a bit more detail on how coordinator and worker process startup and initialization work, and how the properties files affect that behavior.

Thanks for any help anyone can offer!

Wallin, Christina A

Jun 13, 2016, 12:21:46 PM
to presto...@googlegroups.com
Hi Kurt,

I'm not familiar with the exact directories that EMR uses, but here are some answers to your questions.

First off, to clarify, is the PostgreSQL connector not working? What errors are you getting?

The launcher doesn't decide which connectors to load -- the Presto server handles that. Presto tries to load each of the catalog properties files that it can find in the catalog directory, and looks in the plugin directory for the connector directory corresponding to the connector.name field.
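To make that description concrete, here is a minimal Java sketch of the catalog-loading step under stated assumptions. The names CatalogLoadingSketch and loadCatalogs are hypothetical, not Presto's real classes, and the set of registered connector names stands in for the factories that plugins register at startup. It models two points: each *.properties file in the catalog directory becomes a catalog named after the file, and its connector.name must exactly match a registered connector name, otherwise loading fails with a "No factory for connector" error like the one quoted later in this thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of catalog loading; NOT Presto's real classes.
final class CatalogLoadingSketch {
    private static final String SUFFIX = ".properties";

    // Loads every *.properties file in catalogDir and returns catalog name -> connector name.
    // registeredConnectors stands in for the connector factories registered by plugins.
    static Map<String, String> loadCatalogs(Path catalogDir, Set<String> registeredConnectors)
            throws IOException {
        Map<String, String> catalogs = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(catalogDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                // The catalog is named after the file: redshiftnonprod.properties -> redshiftnonprod.
                String catalogName = fileName.substring(0, fileName.length() - SUFFIX.length());
                Properties props = new Properties();
                try (Reader reader = Files.newBufferedReader(file)) {
                    props.load(reader);
                }
                String connectorName = props.getProperty("connector.name");
                if (connectorName == null) {
                    throw new IllegalArgumentException("Missing connector.name in " + fileName);
                }
                // Exact, case-sensitive lookup; nothing is trimmed before the comparison.
                if (!registeredConnectors.contains(connectorName)) {
                    throw new IllegalArgumentException("No factory for connector " + connectorName);
                }
                catalogs.put(catalogName, connectorName);
            }
        }
        return catalogs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("catalog");
        Files.writeString(dir.resolve("redshiftnonprod.properties"), "connector.name=postgresql\n");
        Files.writeString(dir.resolve("tpch.properties"), "connector.name=tpch\n");
        // prints {redshiftnonprod=postgresql, tpch=tpch}
        System.out.println(loadCatalogs(dir, Set.of("postgresql", "tpch")));
    }
}
```

The exact-match lookup is deliberately strict in this sketch; it is the property that makes any stray character in connector.name fatal at startup rather than at query time.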

If you see the postgres connector being loaded in the coordinator/worker server.log files, it *should* be accessible and query-able. If not, I'd poke around for where the other config files are (either plugin.config-dir from node.properties, or something set by default in launcher.py). I'd be surprised if the plugin directories were set up wrong, but you can check plugin.dir in node.properties to make sure that there's a postgresql directory with the correct jars in it.

To see an example of the launcher script being used, you can check out the init.d script of presto-server-rpm [1] -- I'm pretty sure that EMR doesn't use this RPM, but it can serve as an example for CLI args to the launcher script. Also, not sure if you found it, but the launcher script itself is in airlift [2].

Hope this helps! I'm pretty sure I answered all of your questions that were relevant, but definitely let me know if you have more questions.

Christina

[1] https://github.com/prestodb/presto/blob/master/presto-server-rpm/src/main/resources/dist/etc/init.d/presto
[2] https://github.com/airlift/airlift/tree/master/launcher

Kurt Larson

Jun 13, 2016, 6:45:46 PM
to Presto, Christin...@teradata.com
Hi Christina,

First off, many thanks for your help and detailed response.

Let me try to answer your questions as completely as I can.

1. Q. First off, to clarify, is the PostgreSQL connector not working?
    A. Yes, as I see it, the PostgreSQL connector is not working, or at least it is either not initializing or not connecting to the source database correctly. At this point I'm not exactly sure what the root cause is; I'm in the process of eliminating possible causes.
2. Q. What errors are you getting?
    A. The specific error that I'm getting as seen in the Presto server log is:

2016-06-13T20:41:32.387Z        INFO    main    com.facebook.presto.server.PluginManager        -- Finished loading plugin /usr/lib/presto/plugin/tpch --
2016-06-13T20:41:32.403Z        INFO    main    com.facebook.presto.metadata.CatalogManager     -- Loading catalog /etc/presto/conf/catalog/redshiftnonprod.properties --
2016-06-13T20:41:32.403Z        ERROR   main    com.facebook.presto.server.PrestoServer No factory for connector postgresql
java.lang.IllegalArgumentException: No factory for connector postgresql
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
        at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:147)
        at com.facebook.presto.metadata.CatalogManager.loadCatalog(CatalogManager.java:99)
        at com.facebook.presto.metadata.CatalogManager.loadCatalogs(CatalogManager.java:77)
        at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:138)
        at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:72)
 
What I see prior to this in the log is all the plugins loading, installing and registering. About 15 lines earlier in the server log, I see the Presto PostgreSQL connector doing that in these log lines:

2016-06-13T20:41:32.281Z        INFO    main    com.facebook.presto.server.PluginManager        -- Loading plugin /usr/lib/presto/plugin/postgresql --
2016-06-13T20:41:32.292Z        INFO    main    com.facebook.presto.server.PluginManager        Installing com.facebook.presto.plugin.postgresql.PostgreSqlPlugin
2016-06-13T20:41:32.306Z        INFO    main    com.facebook.presto.server.PluginManager        Registering legacy connector postgresql
2016-06-13T20:41:32.306Z        INFO    main    com.facebook.presto.server.PluginManager        -- Finished loading plugin /usr/lib/presto/plugin/postgresql --

So from the server log we can see that the error happens when the Presto server is trying to load the PostgreSQL catalog, where "load" has a different meaning than "loading" Java classes from a JAR file. As you may have noticed, what I'm really trying to do is use the Presto PostgreSQL connector to access an AWS Redshift database instance; Redshift, as a derivative of PostgreSQL, works just fine from many PostgreSQL clients with PostgreSQL JDBC drivers. Not to muddy the waters too much, but I get the same result if I change the PostgreSQL connector properties file to connect to a real PostgreSQL instance, namely AWS RDS PostgreSQL. So I think I have effectively eliminated Redshift as a variable.

After a bit more "reverse engineering" of the EMR 4.7.0 Presto deployment and going over some of the Presto Java code, I now understand the launcher's role more clearly. I see what you mean about "The launcher doesn't decide which connectors to load -- the Presto server handles that." I was just thrown off a bit by the Presto server redirecting its early lifecycle Java class loading output from STDOUT and STDERR to the launcher.log file before it starts writing to the server.log file, as described in the /usr/lib/presto/bin/launcher shell script.

So, to come back to one of the observations in my original post: in the launcher.log file I'm still not seeing the PostgreSQL JDBC driver Java classes being loaded into the JVM on the worker nodes, while I am still seeing them being loaded into the JVM on the coordinator node. At the moment, I think that's as close as I am to the root cause of the problem.

However, as I said earlier, the PostgreSQL connector properties file, the contents of the PostgreSQL connector plugin directory (/usr/lib/presto/plugin/postgresql on EMR), including the PostgreSQL JDBC driver jar, and the ownership and permissions of all the other PostgreSQL connector files are exactly the same on every node. Since my original post I've further verified that the JVM classpath and all the other config files are the same, except for the entries that are expected to differ between the coordinator and workers, like coordinator=<true/false> in config.properties and node.id=<id_value> in node.properties.
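Since so much here hinges on files that looked "exactly the same," one way to make that check mechanical is to compare the parsed values rather than eyeball the files. The following is a small, hypothetical Java helper (PropertiesDiff is not part of Presto or EMR) that parses two properties files with java.util.Properties, skips keys that are expected to differ (such as coordinator and node.id), and prints each differing value inside brackets so invisible differences like trailing spaces show up. The file contents in main are illustrative; a byte-level comparison of the raw files (for example with md5sum on each node) would be an equally valid alternative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for comparing config files across nodes; not part of Presto.
final class PropertiesDiff {
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Returns one entry per key whose values differ, skipping keys in ignoredKeys.
    // Values are wrapped in [brackets] so leftover whitespace is easy to spot.
    static List<String> diff(Properties a, Properties b, Set<String> ignoredKeys) {
        Set<String> keys = new TreeSet<>(a.stringPropertyNames());
        keys.addAll(b.stringPropertyNames());
        List<String> differences = new ArrayList<>();
        for (String key : keys) {
            if (ignoredKeys.contains(key)) {
                continue;
            }
            String left = a.getProperty(key);
            String right = b.getProperty(key);
            if (!Objects.equals(left, right)) {
                differences.add(key + ": [" + left + "] vs [" + right + "]");
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative catalog file contents from two nodes.
        Properties onCoordinator = parse("connector.name=postgresql\nconnection-user=presto\n");
        Properties onWorker = parse("connector.name=postgresql \nconnection-user=presto\n");
        // prints [connector.name: [postgresql] vs [postgresql ]]
        System.out.println(diff(onCoordinator, onWorker, Set.of("coordinator", "node.id")));
    }
}
```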

So that's where I am at.

WRT looking at how the "...init.d script of presto-server-rpm [1]" uses the launcher, that's not really much help, but I've done effectively the same on this EMR deployment. EMR has converted Presto server process management from SysV, with scripts in the /etc/init.d directory, to Upstart, with .conf files in the /etc/init directory. That said, I've walked through all of that, and there isn't any difference between the coordinator and worker nodes, such as a different classpath, that could explain the behavior I'm seeing.

Any further ideas would be appreciated.


Wallin, Christina A

Jun 14, 2016, 2:53:28 PM
to Kurt Larson, Presto
Hi Kurt,

Yes, loading a connector is different from loading Java classes from a jar file. The error you're getting means the Presto server can't find the connector factory that creates the PostgreSQL connector. So you need to make sure both the presto-base-jdbc-<presto version>.jar file and the presto-postgresql-<presto version>.jar file are in the plugin directory; there will be a good number of other assorted libraries there too.
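A quick way to act on that advice is to list the plugin directory and confirm that a jar with each expected prefix is present. The minimal Java sketch below defaults to the EMR plugin path /usr/lib/presto/plugin/postgresql mentioned in this thread and checks only the two jar prefixes named above. PluginJarCheck is a hypothetical name, and matching on a prefix rather than on an exact Presto version is a simplification.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical check; not part of Presto or EMR tooling.
final class PluginJarCheck {
    // Returns the required prefixes for which no *.jar in pluginDir starts with that prefix.
    static List<String> missingJars(Path pluginDir, List<String> requiredPrefixes) throws IOException {
        List<String> jarNames;
        try (Stream<Path> entries = Files.list(pluginDir)) {
            jarNames = entries.map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".jar"))
                    .collect(Collectors.toList());
        }
        List<String> missing = new ArrayList<>();
        for (String prefix : requiredPrefixes) {
            if (jarNames.stream().noneMatch(name -> name.startsWith(prefix))) {
                missing.add(prefix);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default is the EMR 4.7.0 location quoted in this thread; pass another path to override.
        Path dir = Path.of(args.length > 0 ? args[0] : "/usr/lib/presto/plugin/postgresql");
        if (!Files.isDirectory(dir)) {
            System.out.println("not a directory: " + dir);
            return;
        }
        List<String> missing = missingJars(dir, List.of("presto-base-jdbc-", "presto-postgresql-"));
        System.out.println(missing.isEmpty() ? "required jars present" : "missing jars with prefixes: " + missing);
    }
}
```

Running it on every node and comparing the output is one way to rule out the "wrong jars in the plugin directory" branch before moving on to the properties files.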

Can you post both the connector properties file and the node.properties file here, to make sure that everything is set? Do you have the connector properties file on all of the nodes?

If you really can't figure out what's going on, I recommend debugging your Presto server and putting a breakpoint in CatalogManager or some place like that. You have some sort of issue where the plugin directory doesn't have the right jars in it, or the node.properties file doesn't point to the right plugin directory. Maybe check with Amazon to make sure that everything's all set on that version of EMR.

Christina

Kurt Larson

Jun 28, 2016, 1:41:17 PM
to Presto, kla...@turbine.com, Christin...@teradata.com
Hi all,

To follow up on this issue for anyone interested: after working with AWS's EMR team and barking up the wrong tree(s) for several days, the issue was diagnosed as a subtle bug, which AWS is reporting. The problem was trailing whitespace in the "connector.name" property in a catalog properties file. The actual text of the line was "connector.name=postgresql " as in "connector.name=postgresql<space>". Apparently, it caused a string comparison to fail when matching the connector.name from the catalog properties file to the connector plug-in, so the presto-server process on the worker nodes whose catalog file had the trailing whitespace did not load the JDBC driver classes for the PostgreSQL connector plug-in specified in that file. Removing the trailing whitespace completely resolved the issue: the JDBC driver classes now get loaded, and the connector functions as expected for a moderate sample of SQL statements and Presto commands. The EMR version was 4.7.0 and the Presto version was 0.147.
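The failure mode is easy to reproduce in isolation. The minimal Java sketch below assumes the catalog file is parsed with java.util.Properties (an assumption; the thread does not say which parser Presto 0.147 uses). Properties skips whitespace before a value but keeps trailing whitespace as part of it, so "connector.name=postgresql " yields the value "postgresql " with the space, which no longer equals "postgresql". The class name and the connection-user line are illustrative, and linesWithTrailingWhitespace is a hypothetical lint that one could run over the catalog directory before restarting presto-server.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical reproduction and lint; not part of Presto.
final class TrailingWhitespaceCheck {
    // java.util.Properties drops whitespace before a value but keeps trailing whitespace.
    static String connectorName(String fileText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(fileText));
        return props.getProperty("connector.name");
    }

    // Returns 1-based line numbers that end in a space or tab (ignoring a CRLF's '\r').
    static List<Integer> linesWithTrailingWhitespace(String fileText) {
        List<Integer> result = new ArrayList<>();
        String[] lines = fileText.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }
            if (line.endsWith(" ") || line.endsWith("\t")) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative catalog file with the trailing space described in this thread.
        String catalog = "connector.name=postgresql \nconnection-user=presto\n";
        String name = connectorName(catalog);
        // prints [postgresql ] equals "postgresql"? false
        System.out.println("[" + name + "] equals \"postgresql\"? " + "postgresql".equals(name));
        // prints trailing whitespace on lines [1]
        System.out.println("trailing whitespace on lines " + linesWithTrailingWhitespace(catalog));
    }
}
```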

It's worth noting here that the Presto PostgreSQL connector appears to function correctly for a moderate sample of Presto SQL queries and commands when configured to connect to an AWS Redshift instance. Additionally, the Presto MySQL connector appears to function correctly for a similar sample of queries and commands when configured to connect to an AWS RDS Aurora instance. Neither query sample is intended to be a full regression test of functionality or a statement of compatibility, but together they indicate a good start toward compatibility.


Stephen Sprague

Jun 28, 2016, 2:50:09 PM
to presto...@googlegroups.com
you gotta luv it.  the 'ol trailing whitespace wingding. you have my sympathies! argh!

Cheers,
Stephen.

