Upgrade from CDAP 3.4.3 to 3.5.0 failed.


ron cai

Aug 31, 2016, 2:20:34 AM
to CDAP User
Hi,

I am following the linked upgrade guide to upgrade my CDAP 3.4.3 to 3.5,

but the upgrade command fails to execute.

root@ip-172-31-30-89:/data/cdap/master/bin# sudo -u cdap /opt/cdap/master/bin/svc-master run co.cask.cdap.data.tools.UpgradeTool upgrade force
Running class co.cask.cdap.data.tools.UpgradeTool with arguments: upgrade force
Invalid maximum heap size: -Xmx1024mm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Is something wrong with the upgrade tool or with my environment? Thanks.

Regards,
Ron

ron cai

Aug 31, 2016, 2:36:00 AM
to CDAP User

I use Ambari to manage my HDP and CDAP.

In /etc/cdap/conf/cdap-env.sh, I found these definitions for the Java maximum heap size:

export KAFKA_JAVA_HEAPMAX="-Xmx1024mm"
export MASTER_JAVA_HEAPMAX="-Xmx1024mm"
export ROUTER_JAVA_HEAPMAX="-Xmx1024mm"

In Ambari, on the CDAP config page, the item "Contents of cdap-env.sh" shows:

export KAFKA_JAVA_HEAPMAX="-Xmx{{cdap_kafka_heapsize}}m"
export MASTER_JAVA_HEAPMAX="-Xmx{{cdap_master_heapsize}}m"
export ROUTER_JAVA_HEAPMAX="-Xmx{{cdap_router_heapsize}}m"

Should "-Xmx1024mm" be "-Xmx1024m"?

And is this a bug?
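
To make the check concrete, here is a minimal Python sketch that scans text in the style of cdap-env.sh for malformed heap flags and shows what collapsing a repeated unit letter would produce. It assumes a simplified rule (a valid flag is -Xmx, digits, and at most one unit letter); the function names and the sample text are hypothetical and not part of CDAP or Ambari.

```python
import re

# Simplified assumption: a valid flag is -Xmx, digits, and at most one unit letter.
VALID_HEAP_FLAG = re.compile(r"^-Xmx\d+[kKmMgG]?$")
# Hypothetical matcher for export lines like the ones in cdap-env.sh above.
EXPORT_LINE = re.compile(r'^\s*export\s+(\w+_JAVA_HEAPMAX)="([^"]*)"')


def find_bad_heap_flags(env_text):
    """Return (variable, value) pairs whose value is not a well-formed -Xmx flag."""
    bad = []
    for line in env_text.splitlines():
        match = EXPORT_LINE.match(line)
        if match and not VALID_HEAP_FLAG.match(match.group(2)):
            bad.append((match.group(1), match.group(2)))
    return bad


def collapse_repeated_unit(flag):
    """Collapse a repeated unit letter, e.g. -Xmx1024mm becomes -Xmx1024m."""
    return re.sub(r"^(-Xmx\d+)([kKmMgG])\2+$", r"\1\2", flag)


# Sample text only; a real check would read the file contents instead.
sample = (
    'export KAFKA_JAVA_HEAPMAX="-Xmx1024mm"\n'
    'export MASTER_JAVA_HEAPMAX="-Xmx1024m"\n'
)
for name, value in find_bad_heap_flags(sample):
    print(name, value, "->", collapse_repeated_unit(value))
```

With the sample text, only the KAFKA_JAVA_HEAPMAX line is reported, because the MASTER_JAVA_HEAPMAX value already has a single unit letter.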

Regards,
Ron

ron cai

Aug 31, 2016, 2:50:23 AM
to CDAP User
After changing -Xmx1024mm to -Xmx1024m in /etc/cdap/conf/cdap-env.sh, the upgrade tool runs, but it still fails.

The log is below:

ubuntu@ip-172-31-30-89:~$ sudo -u cdap /opt/cdap/master/bin/svc-master run co.cask.cdap.data.tools.UpgradeTool upgrade
Running class co.cask.cdap.data.tools.UpgradeTool with arguments: upgrade
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/cdap/master/lib/ch.qos.logback.logback-classic-1.0.9.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2016-08-31 06:45:22,867 - WARN  [main:c.c.c.c.c.Configuration@592] - kafka.bind.address is deprecated. Instead, use kafka.server.host.name
2016-08-31 06:45:22,873 - WARN  [main:c.c.c.c.c.Configuration@592] - kafka.bind.port is deprecated. Instead, use kafka.server.port
2016-08-31 06:45:22,873 - WARN  [main:c.c.c.c.c.Configuration@592] - kafka.default.replication.factor is deprecated. Instead, use kafka.server.default.replication.factor
2016-08-31 06:45:22,874 - WARN  [main:c.c.c.c.c.Configuration@592] - kafka.log.dir is deprecated. Instead, use kafka.server.log.dirs
2016-08-31 06:45:22,874 - WARN  [main:c.c.c.c.c.Configuration@592] - kafka.num.partitions is deprecated. Instead, use kafka.server.num.partitions
2016-08-31 06:45:22,875 - WARN  [main:c.c.c.c.c.Configuration@1770] - cdap-site.xml:an attempt to override final parameter: stream.instance.file.prefix;  Ignoring.
2016-08-31 06:45:24,120 - WARN  [main:c.c.c.s.a.AbstractAuthorizationService@213] - Authorization policy caching is enabled (security.authorization.cache.enabled is set to true), however, this setting will have no effect because authorization is disabled (security.authorization.enabled is set to false).
2016-08-31 06:45:24,123 - WARN  [main:c.c.c.s.a.AbstractAuthorizationService@213] - Authorization policy caching is enabled (security.authorization.cache.enabled is set to true), however, this setting will have no effect because authorization is disabled (security.authorization.enabled is set to false).
2016-08-31 06:45:24,365 - WARN  [main:o.a.h.u.NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-08-31 06:45:25,383 - WARN  [main:o.a.h.h.s.DomainSocketFactory@117] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2016-08-31 06:45:25,392 - INFO  [main:c.c.c.c.g.LocationRuntimeModule$HDFSLocationModule@100] - HDFS namespace is /cdap
2016-08-31 06:45:25,666 - WARN  [main:c.c.c.s.a.AbstractAuthorizationService@213] - Authorization policy caching is enabled (security.authorization.cache.enabled is set to true), however, this setting will have no effect because authorization is disabled (security.authorization.enabled is set to false).
2016-08-31 06:45:25,667 - WARN  [main:c.c.c.s.a.AbstractAuthorizationService@213] - Authorization policy caching is enabled (security.authorization.cache.enabled is set to true), however, this setting will have no effect because authorization is disabled (security.authorization.enabled is set to false).
2016-08-31 06:45:25,670 - INFO  [main:c.c.c.c.g.LocationRuntimeModule$HDFSLocationModule@100] - HDFS namespace is /cdap
UpgradeTool - version 3.5.0-1471907228219.

upgrade - Upgrades CDAP to 3.5.0-1471907228219
  The upgrade tool upgrades the following:
  1. User and System Datasets (upgrades the coprocessor jars)
  2. Stream State Store
  3. System metadata for all existing entities
  4. Metadata indexes for all existing metadata
  5. Any metadata that may have left behind for deleted datasets (This metadata will be removed).
  Note: Once you run the upgrade tool you cannot rollback to the previous version.
Do you want to continue (y/n)
Starting upgrade ...
2016-08-31 06:45:35,872 - ERROR [ThriftRPCServer:o.a.t.d.TransactionService$1$1@88] - Transaction manager aborted, stopping transaction service
Exception in thread "ThriftRPCServer" com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Version 3 of snapshot encoding is not supported
        at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015)
        at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
        at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
        at org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:177)
        at org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Version 3 of snapshot encoding is not supported
        at org.apache.tephra.snapshot.SnapshotCodecProvider.getCodecForVersion(SnapshotCodecProvider.java:98)
        at org.apache.tephra.snapshot.SnapshotCodecProvider.getCodec(SnapshotCodecProvider.java:125)
        at org.apache.tephra.snapshot.SnapshotCodecProvider.decode(SnapshotCodecProvider.java:135)
        at org.apache.tephra.persist.HDFSTransactionStateStorage.readSnapshotInputStream(HDFSTransactionStateStorage.java:177)
        at org.apache.tephra.persist.HDFSTransactionStateStorage.getLatestSnapshot(HDFSTransactionStateStorage.java:144)
        at org.apache.tephra.TransactionManager.recoverState(TransactionManager.java:470)
        at org.apache.tephra.TransactionManager.doStart(TransactionManager.java:225)
        at com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170)
        ... 5 more
2016-08-31 06:47:05,945 - ERROR [main:c.c.c.d.t.UpgradeTool@266] - Exception while trying to stop upgrade process
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.Exception: Service failed while running
        at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.stopAndWait(AbstractService.java:225) ~[com.google.guava.guava-13.0.1.jar:na]
        at co.cask.cdap.data.tools.UpgradeTool.stop(UpgradeTool.java:263) [co.cask.cdap.cdap-master-3.5.0.jar:na]
        at co.cask.cdap.data.tools.UpgradeTool.doMain(UpgradeTool.java:306) [co.cask.cdap.cdap-master-3.5.0.jar:na]
        at co.cask.cdap.data.tools.UpgradeTool.main(UpgradeTool.java:416) [co.cask.cdap.cdap-master-3.5.0.jar:na]
java.lang.Exception: Service failed while running
        at com.google.common.util.concurrent.AbstractService$1.failed(AbstractService.java:123) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService$6$1.run(AbstractService.java:444) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:262) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService$ListenerExecutorPair.execute(AbstractService.java:470) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService$6.run(AbstractService.java:442) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.executeListeners(AbstractService.java:369) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.notifyFailed(AbstractService.java:313) ~[com.google.guava.guava-13.0.1.jar:na]
        at org.apache.tephra.distributed.TransactionService.abort(TransactionService.java:140) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.distributed.TransactionService$1$1.failed(TransactionService.java:89) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at com.google.common.util.concurrent.AbstractService$6$1.run(AbstractService.java:444) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:262) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService$ListenerExecutorPair.execute(AbstractService.java:470) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService$6.run(AbstractService.java:442) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.executeListeners(AbstractService.java:369) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.start(AbstractService.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
        at org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:177) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
Caused by: java.lang.IllegalArgumentException: Version 3 of snapshot encoding is not supported
        at org.apache.tephra.snapshot.SnapshotCodecProvider.getCodecForVersion(SnapshotCodecProvider.java:98) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.snapshot.SnapshotCodecProvider.getCodec(SnapshotCodecProvider.java:125) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.snapshot.SnapshotCodecProvider.decode(SnapshotCodecProvider.java:135) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.persist.HDFSTransactionStateStorage.readSnapshotInputStream(HDFSTransactionStateStorage.java:177) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.persist.HDFSTransactionStateStorage.getLatestSnapshot(HDFSTransactionStateStorage.java:144) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.TransactionManager.recoverState(TransactionManager.java:470) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at org.apache.tephra.TransactionManager.doStart(TransactionManager.java:225) ~[org.apache.tephra.tephra-core-0.8.0-incubating.jar:0.8.0-incubating]
        at com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170) ~[com.google.guava.guava-13.0.1.jar:na]
        ... 5 common frames omitted
ubuntu@ip-172-31-30-89:~$

Sagar Kapare

Sep 3, 2016, 4:14:53 AM
to ron cai, CDAP User
Hi Ron,

Are you still seeing the issue with the upgrade?
We will try to reproduce this locally and get back to you.

Thanks and Regards,
Sagar

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+unsubscribe@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/1a4e3947-6faf-459d-bc2e-5e3cb392c6b2%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Message has been deleted

ron cai

Sep 6, 2016, 9:02:10 AM
to CDAP User
Hi Sagar,

I don't know why my replies keep getting removed.

I have installed CDAP 3.5 after fixing some issues.

1. Cleaned all CDAP-related folders.
2. Installed the CDAP components through Ambari.
3. During installation, there was an error in the script files:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/CDAP/3.5.1/package/scripts/kafka.py", line 72, in <module>
    Kafka().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/CDAP/3.5.1/package/scripts/kafka.py", line 34, in install
    self.configure(env)
  File "/var/lib/ambari-agent/cache/common-services/CDAP/3.5.1/package/scripts/kafka.py", line 61, in configure
    helpers.cdap_config('kafka')
  File "/var/lib/ambari-agent/cache/common-services/CDAP/3.5.1/package/scripts/ambari_helpers.py", line 59, in cdap_config
    create_parents=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 146, in __init__
    raise Fail("%s received unsupported argument %s" % (self, key))
resource_management.core.exceptions.Fail: Directory['/etc/cdap/conf.ambari'] received unsupported argument create_parents

4. After commenting out create_parents in the script, I could install the
CDAP components.
5. Started the CDAP service successfully.
6. The CDAP service is unstable; there are errors in the router log.

The log is attached.

Could you help me find what is wrong in my environment?

Thanks,
Ron

On Mon, Sep 5, 2016 at 10:59 AM, ron cai <caixl...@gmail.com> wrote:
> Hi Sagar,
>
> I can't upgrade to 3.5.
>
> After I tried the steps in the upgrade manual, the upgrade failed.
>
> I then tried removing the CDAP 3.4 package and installing the 3.5 packages,
> and the above issue was not observed. But there was another error while
> upgrading the tables, which made the CDAP master fail. Sorry that I didn't
> save the log.
>
> In the end I had to reinstall 3.4. I think there may be an issue in the
> ambari-cdap-service. The configuration of CDAP 3.4 has the additional 'm'
> in the heap max parameters after I reinstalled 3.4.
>
> export KAFKA_JAVA_HEAPMAX="-Xmx{{cdap_kafka_heapsize}}m"
> export MASTER_JAVA_HEAPMAX="-Xmx{{cdap_master_heapsize}}m"
> export ROUTER_JAVA_HEAPMAX="-Xmx{{cdap_router_heapsize}}m"
>
>
> Regards,
> Ron
>
> On Saturday, September 3, 2016 at 4:14:53 PM UTC+8, Sagar Kapare wrote:
conf_and_log.zip

chris

Sep 7, 2016, 10:32:16 AM
to CDAP User
OK, strange. My messages were deleted, too. I'll have the group admins look at it.

Anyway, there's another error in the service. I've filed https://issues.cask.co/browse/CDAP-7233 for that.

I'll have an updated Ambari service out with these resolved. For now, you can modify the "data.tx.snapshot.codecs" CDAP configuration property and change the "co.cask.tephra.snapshot.SnapshotCodecV3" and "co.cask.tephra.snapshot.SnapshotCodecV4" to "org.apache.tephra.snapshot.SnapshotCodecV3" and "org.apache.tephra.snapshot.SnapshotCodecV4", respectively.
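
As an illustration of that rename, here is a minimal Python sketch. It assumes the property value is a comma-separated list of class names, which this thread does not show, and the function name is hypothetical. It only rewrites the package prefix of the codec names; applying the result to the CDAP configuration is still a manual step.

```python
OLD_PREFIX = "co.cask.tephra."
NEW_PREFIX = "org.apache.tephra."


def migrate_codec_names(value):
    """Swap the old package prefix for the new one in each listed class name."""
    # Assumption: class names are comma-separated; surrounding whitespace is ignored.
    names = [name.strip() for name in value.split(",") if name.strip()]
    return ",".join(
        NEW_PREFIX + name[len(OLD_PREFIX):] if name.startswith(OLD_PREFIX) else name
        for name in names
    )


# Example input built from the two class names mentioned above.
old_value = (
    "co.cask.tephra.snapshot.SnapshotCodecV3,"
    "co.cask.tephra.snapshot.SnapshotCodecV4"
)
print(migrate_codec_names(old_value))
```

Names that already use the org.apache.tephra prefix pass through unchanged, so running the function twice gives the same result.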

Sreevatsan Raman

Sep 7, 2016, 10:47:48 AM
to chris, CDAP User
Chris/Ron:
Some of your posts were marked as spam by Google. I have marked them as normal posts. Let us know if you see this again.

Thanks,
Sree

Screen Shot 2016-09-07 at 7.46.06 AM.png

chris

Sep 7, 2016, 4:56:10 PM
to CDAP User, ch...@cask.co
Weird.

Anyway, I have released a new version of the CDAP Ambari service with the appropriate fixes. :-)

