Caching Hive Metastore: Too many connections created by partitionCache during background refresh


Yaliang Wang

Dec 20, 2016, 7:19:57 PM
to Presto, Thomas Sun, Bill Graham, Karthik Ramasamy
We are running a Twitter fork (version 0.157-tw-0.28) based on the open source 0.157 release.

All Presto instances run on Mesos with the "ephemeral port limit" set to 1024, and we only have two Hive metastore instances. When a query stays around long enough to hit the cache TTL, the partitionCache background refresh creates one connection per partition, with the result that all queries fail to create any new connection to the Hive metastore. It ends with errors like the following:
com.facebook.presto.spi.PrestoException: Failed connecting to Hive metastore.
        at com.facebook.presto.hive.HiveSplitSource.propagatePrestoException(HiveSplitSource.java:145)
        at com.facebook.presto.hive.HiveSplitSource.isFinished(HiveSplitSource.java:123)
        at com.facebook.presto.split.ConnectorAwareSplitSource.isFinished(ConnectorAwareSplitSource.java:62)
        at com.facebook.presto.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:120)
        at com.facebook.presto.execution.scheduler.SqlQueryScheduler.schedule(SqlQueryScheduler.java:340)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed connecting to Hive metastore.
        at com.facebook.presto.twitter.hive.ZookeeperServersetHiveCluster.createMetastoreClient(ZookeeperServersetHiveCluster.java:66)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore.lambda$loadPartitionsByNames$19(CachingHiveMetastore.java:826)
        at com.facebook.presto.hive.metastore.HiveMetastoreApiStats.lambda$wrap$0(HiveMetastoreApiStats.java:42)
        at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:136)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore.loadPartitionsByNames(CachingHiveMetastore.java:825)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore.access$800(CachingHiveMetastore.java:100)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore$8.loadAll(CachingHiveMetastore.java:239)
        at com.google.common.cache.CacheLoader$1.loadAll(CacheLoader.java:206)
        at com.google.common.cache.LocalCache.loadAll(LocalCache.java:4025)
        at com.google.common.cache.LocalCache.getAll(LocalCache.java:3988)
        at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(LocalCache.java:4838)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore.getAll(CachingHiveMetastore.java:305)
        at com.facebook.presto.hive.metastore.CachingHiveMetastore.getPartitionsByNames(CachingHiveMetastore.java:759)
        at com.facebook.presto.hive.HiveSplitManager.lambda$getPartitionMetadata$1(HiveSplitManager.java:196)
        at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
        at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at com.google.common.collect.Iterators$5.hasNext(Iterators.java:548)
        at com.facebook.presto.hive.ConcurrentLazyQueue.poll(ConcurrentLazyQueue.java:37)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:251)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:81)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.doProcess(BackgroundHiveSplitLoader.java:211)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:190)
        at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
        at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
        at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
        ... 3 more
Caused by: org.apache.thrift.transport.TTransportException: smf1-cgd-11-sr1.prod.twitter.com: java.net.ConnectException: Cannot assign requested address
        at com.facebook.presto.hive.thrift.Transport.rewriteException(Transport.java:91)
        at com.facebook.presto.hive.thrift.Transport.create(Transport.java:43)
        at com.facebook.presto.hive.HiveMetastoreClientFactory.create(HiveMetastoreClientFactory.java:51)
        at com.facebook.presto.twitter.hive.ZookeeperServersetHiveCluster.createMetastoreClient(ZookeeperServersetHiveCluster.java:58)
        ... 31 more
Caused by: java.net.ConnectException: Cannot assign requested address
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
        at com.facebook.presto.hive.thrift.Transport.create(Transport.java:38)
        ... 33 more

com.facebook.presto.twitter.hive.ZookeeperServersetHiveCluster is similar to com.facebook.presto.hive.StaticHiveCluster, but uses ZooKeeper-based discovery.
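
For context, the shape of the abstraction both classes implement is roughly the following. This is a simplified sketch inferred from the class names in the stack trace, not a copy of the Presto source; the key point is that every call opens a brand-new Thrift connection:

// Simplified sketch; the real Presto interface carries more detail.
public interface HiveCluster
{
    // StaticHiveCluster picks a host from a static list, while
    // ZookeeperServersetHiveCluster resolves metastore hosts from a
    // ZooKeeper serverset. Either way, each call opens a new Thrift
    // transport via HiveMetastoreClientFactory.create(...), as the
    // stack trace above shows.
    HiveMetastoreClient createMetastoreClient();
}

// Placeholder so the sketch is self-contained.
interface HiveMetastoreClient extends AutoCloseable {}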

So basically, the story is:
    1. A query starts and its partitions are loaded into the cache (grouped so that a batch of partitions shares one connection, which is fine).
    2. The query stays around long enough to hit the cache TTL.
    3. The background refresh then reloads the stale entries one partition per connection (see the sketch after this list); as each entry finishes loading, its connection is closed and sits in TIME_WAIT, which takes a few seconds to minutes to actually disappear.
    4. The number of connections to a Hive metastore instance hits the 1024 "ephemeral port limit", and no local port can be assigned for new connections.
    5. Any query that then needs to reach the Hive metastore fails with the "Failed connecting to Hive metastore" error.
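
Why does the refresh open one connection per partition while the initial load batches them? Guava's LoadingCache only uses CacheLoader.loadAll() when getAll() has to fetch keys that are missing from the cache; refreshing an entry that is already cached goes through load()/reload() one key at a time, and in CachingHiveMetastore each load opens its own metastore client. Below is a minimal, self-contained sketch of that difference using plain Guava, nothing Presto-specific; the class, variable names, and the AtomicInteger "connection" counter are made up for illustration:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RefreshVsLoadAll
{
    // Stands in for "open a connection to the Hive metastore".
    private static final AtomicInteger CONNECTIONS = new AtomicInteger();

    public static void main(String[] args) throws Exception
    {
        LoadingCache<String, String> partitionCache = CacheBuilder.newBuilder()
                .refreshAfterWrite(100, TimeUnit.MILLISECONDS)
                .build(new CacheLoader<String, String>()
                {
                    @Override
                    public String load(String partition)
                    {
                        // Refreshing a single stale entry lands here: one "connection" per partition.
                        CONNECTIONS.incrementAndGet();
                        return "metadata-of-" + partition;
                    }

                    @Override
                    public Map<String, String> loadAll(Iterable<? extends String> partitions)
                    {
                        // The initial getAll() for missing keys lands here: one "connection" per batch.
                        CONNECTIONS.incrementAndGet();
                        Map<String, String> result = new LinkedHashMap<>();
                        for (String partition : partitions) {
                            result.put(partition, "metadata-of-" + partition);
                        }
                        return result;
                    }
                });

        List<String> partitions = Arrays.asList("ds=2016-12-18", "ds=2016-12-19", "ds=2016-12-20");

        partitionCache.getAll(partitions);
        System.out.println("after initial load: " + CONNECTIONS.get() + " connection(s)"); // 1

        Thread.sleep(200);                  // let every entry become stale
        partitionCache.getAll(partitions);  // triggers a per-key refresh through load()
        System.out.println("after refresh: " + CONNECTIONS.get() + " connection(s)");      // 1 + one per partition
    }
}

As far as I can tell, the real partitionCache additionally wraps its loader in CacheLoader.asyncReloading(...), so the refreshes run on a background executor rather than in the query thread, but the one-load-per-key pattern is the same.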
    
The only fix we currently have without a code change is to set the following on the coordinator instance, which simply recycles the TIME_WAIT connections:
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1

We also submitted a pull request to add a transport pool, but we would prefer to handle this with fewer code changes.
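
To illustrate the pooling idea (this is not the actual PR; the ThriftClient interface and the factory below are hypothetical placeholders), a client pool boils down to borrowing an already-open client instead of opening, and then closing into TIME_WAIT, a fresh socket for every partitionCache load:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical placeholder for a Thrift metastore client; a real type would
// wrap a TTransport and expose the metastore API.
interface ThriftClient extends AutoCloseable
{
    @Override
    void close();
}

// Minimal illustration of a fixed-size client pool.
public class MetastoreClientPool
{
    private final BlockingQueue<ThriftClient> idle;
    private final Supplier<ThriftClient> factory;

    public MetastoreClientPool(int maxIdle, Supplier<ThriftClient> factory)
    {
        this.idle = new ArrayBlockingQueue<>(maxIdle);
        this.factory = factory;
    }

    public ThriftClient borrow()
    {
        // Reuse an idle client if one exists; otherwise open a new one.
        ThriftClient client = idle.poll();
        return (client != null) ? client : factory.get();
    }

    public void release(ThriftClient client)
    {
        // If the pool is already full, close the surplus client instead of keeping it.
        if (!idle.offer(client)) {
            client.close();
        }
    }
}

A real implementation would also need connection validation, eviction of broken clients, and a bound on total (not just idle) clients, which is part of why a fix with fewer code changes is attractive.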

Have you ever seen this problem? If so, how do you handle it in practice? If not, any suggestion is appreciated!


翟玉勇

Dec 13, 2017, 10:10:50 PM
to Presto
We have the same problem and are ready to try metastore connection pooling; we would like to see your PR.

On Wednesday, December 21, 2016 at 8:19:57 AM UTC+8, Yaliang Wang wrote: