CDK, Flume 1.4 and MorphlineInterceptor


bitsof...@gmail.com

Oct 2, 2013, 5:38:30 PM
to cdk...@cloudera.org
Hi,
I'm new to Flume and am trying to use the MorphlineInterceptor per the documentation here:


When I run Flume, it immediately throws a ClassNotFoundException for com.cloudera.cdk.morphline.api.Command.

So I pulled down the morphlines CDK, built it, and copied the morphlines core jar into the flume/lib dir. Running again just gives the next ClassNotFoundException, in what I imagine will be a long line of missing CDK dependencies.

Is there a Flume jar or CDK jar that includes all the dependencies needed to use Flume's morphline interceptor?

thanks!

Wolfgang Hoschek

Oct 2, 2013, 6:07:10 PM
to bitsof...@gmail.com, cdk...@cloudera.org
You can use module cdk-morphlines-all for that.

Alternatively, it all comes prepackaged with Cloudera Search.

Wolfgang.

bitsof...@gmail.com

Oct 3, 2013, 10:14:26 AM
to cdk...@cloudera.org, bitsof...@gmail.com
Thanks. I pulled down the entire CDK and ran mvn install, but under cdk-release-0.7.0/cdk-morphlines/cdk-morphlines-all/target/ I don't see a large jar, just a small test one (cdk-morphlines-all-0.7.0-tests.jar).

So I tried appending that module's target/lib directory to the Flume classpath with: -C /Users/me/Documents/ddd/flume/morphline/cdk-release-0.7.0/cdk-morphlines/cdk-morphlines-all/target/lib/*

And I get this when starting Flume:

2013-10-03 08:10:23,184 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:89)
at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$LocalMorphlineInterceptor.<init>(MorphlineInterceptor.java:140)
at org.apache.flume.sink.solr.morphline.MorphlineInterceptor.<init>(MorphlineInterceptor.java:55)
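
One possible cause, which is an assumption the thread does not confirm: if the trailing * in the -C argument is left unquoted, the shell may expand it into many separate arguments, so only the first jar reaches -C. A minimal sketch of building one colon-separated classpath string instead (build_classpath is a hypothetical helper name, not part of Flume, and the paths in the usage comment are placeholders):

```shell
# Join every .jar in a directory into one colon-separated classpath string.
# build_classpath is a hypothetical helper, not part of Flume.
build_classpath() {
  jar_dir=$1
  classpath=""
  for jar in "$jar_dir"/*.jar; do
    [ -e "$jar" ] || continue              # skip the literal pattern when nothing matches
    classpath="${classpath:+$classpath:}$jar"   # add ':' only between entries
  done
  printf '%s\n' "$classpath"
}

# Usage (agent name, config file and directory are placeholders):
#   flume-ng agent -n agent1 -f flume.conf -C "$(build_classpath /path/to/target/lib)"
```

Quoting the command substitution keeps the whole list as a single argument to -C.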

Wolfgang Hoschek

Oct 3, 2013, 12:53:30 PM
to bitsof...@gmail.com, cdk...@cloudera.org
Since you want to use it inside Flume, here is a Flume-centric way of getting hold of all the jars:

* clone flume from git
* cd flume
* edit flume-ng-sinks/flume-ng-morphline-solr-sink/pom.xml, in there do the following:

** make cdk-morphlines-all required by commenting out this blurb: <optional>true</optional>
** add the following mvn blurb to the <build> element in order to copy the dependency jars into the target/lib dir:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
        <includeScope>runtime</includeScope> <!-- excludes test jars; see http://jira.codehaus.org/browse/MDEP-128 -->
        <excludeScope>provided</excludeScope>
      </configuration>
    </execution>
  </executions>
</plugin>

* mvn -Dhadoop.profile=2 clean package -pl flume-ng-sinks/flume-ng-morphline-solr-sink

* find flume-ng-sinks/flume-ng-morphline-solr-sink/target -name '*.jar'

* copy the jars printed out by the above find command into the flume lib dir

Wolfgang.
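
The last two steps above can be sketched as a single shell helper. This is a minimal sketch, not an official procedure: copy_build_jars is a hypothetical name, and both directories are placeholders for your own checkout and Flume install.

```shell
# Copy every jar found under a build's target directory into a destination lib dir.
# copy_build_jars is a hypothetical helper; both arguments are placeholders.
copy_build_jars() {
  target_dir=$1
  lib_dir=$2
  # find walks target_dir recursively (so target/lib is included)
  # and runs cp once per jar it finds
  find "$target_dir" -name '*.jar' -exec cp {} "$lib_dir"/ \;
}

# Usage, run from the flume checkout (destination is a placeholder):
#   copy_build_jars flume-ng-sinks/flume-ng-morphline-solr-sink/target /path/to/flume/lib
```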

mgsr...@gmail.com

Dec 15, 2013, 12:04:32 PM
to cdk...@cloudera.org, bitsof...@gmail.com
Hi Wolfgang,

I have done a fresh install of CDH 4.5 with all components, but I am getting a class-not-found error for com.cloudera.cdk.morphline.api.Command.

I can see the cdk-morphlines*.jar files under /opt/cloudera/parcels/SOLR-1.1.0-1.cdh4.3.0.p0.21/lib/search/lib.

Please let me know how I can set the CLASSPATH or otherwise remedy this error.



2013-12-15 08:54:27,180 INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
2013-12-15 08:54:27,195 DEBUG org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Configuration provider started
2013-12-15 08:54:27,199 DEBUG org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Checking file:/var/run/cloudera-scm-agent/process/537-flume-AGENT/flume.conf for changes
2013-12-15 08:54:27,199 INFO org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/var/run/cloudera-scm-agent/process/537-flume-AGENT/flume.conf
2013-12-15 08:54:27,205 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2013-12-15 08:54:27,205 DEBUG org.apache.flume.conf.FlumeConfiguration: Created context for solrSink: channel
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Added sinks: HDFS-LAB solrSink Agent: syslog_agent
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,206 DEBUG org.apache.flume.conf.FlumeConfiguration: Created context for HDFS-LAB: hdfs.writeFormat
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,206 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,207 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,207 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,207 INFO org.apache.flume.conf.FlumeConfiguration: Processing:solrSink
2013-12-15 08:54:27,207 INFO org.apache.flume.conf.FlumeConfiguration: Processing:HDFS-LAB
2013-12-15 08:54:27,207 DEBUG org.apache.flume.conf.FlumeConfiguration: Starting validation of configuration for agent: syslog_agent, initial-configuration: AgentConfiguration[syslog_agent]
SOURCES: {Syslog={ parameters:{port=5140, host=0.0.0.0, interceptors=morphlineinterceptor, interceptors.uuidinterceptor.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder, interceptors.i2.type=host, channels=MemoryChannel-1, interceptors.uuidinterceptor.headerName=id, interceptors.morphlineinterceptor.type=org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder, interceptors.ts.type=timestamp, type=syslogtcp, interceptors.i2.hostHeader=hostname} }}
CHANNELS: {MemoryChannel-1={ parameters:{transactionCapacity=1000, capacity=1000, type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.file.rollInterval=60, hdfs.path=/syslogs/%{log_type}/%{host}/%b-%d-%Y, hdfs.file.Prefix=syslogfiles, hdfs.file.Type=SequenceFile, hdfs.writeFormat=Text, type=hdfs, channel=MemoryChannel-1} }, solrSink={ parameters:{morphlineId=morphline1, type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink, channel=MemoryChannel-1, morphlineFile=/etc/flume-ng/conf/morphlines.conf} }}

2013-12-15 08:54:27,215 DEBUG org.apache.flume.conf.FlumeConfiguration: Created channel MemoryChannel-1
2013-12-15 08:54:27,227 DEBUG org.apache.flume.conf.FlumeConfiguration: Creating sink: HDFS-LAB using HDFS
2013-12-15 08:54:27,231 DEBUG org.apache.flume.conf.FlumeConfiguration: Creating sink: solrSink using OTHER
2013-12-15 08:54:27,233 DEBUG org.apache.flume.conf.FlumeConfiguration: Post validation configuration for syslog_agent
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[syslog_agent]
SOURCES: {Syslog={ parameters:{port=5140, host=0.0.0.0, interceptors=morphlineinterceptor, interceptors.uuidinterceptor.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder, interceptors.i2.type=host, channels=MemoryChannel-1, interceptors.uuidinterceptor.headerName=id, interceptors.morphlineinterceptor.type=org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder, interceptors.ts.type=timestamp, type=syslogtcp, interceptors.i2.hostHeader=hostname} }}
CHANNELS: {MemoryChannel-1={ parameters:{transactionCapacity=1000, capacity=1000, type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.file.rollInterval=60, hdfs.path=/syslogs/%{log_type}/%{host}/%b-%d-%Y, hdfs.file.Prefix=syslogfiles, hdfs.file.Type=SequenceFile, hdfs.writeFormat=Text, type=hdfs, channel=MemoryChannel-1} }, solrSink={ parameters:{morphlineId=morphline1, type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink, channel=MemoryChannel-1, morphlineFile=/etc/flume-ng/conf/morphlines.conf} }}

2013-12-15 08:54:27,233 DEBUG org.apache.flume.conf.FlumeConfiguration: Channels:MemoryChannel-1

2013-12-15 08:54:27,233 DEBUG org.apache.flume.conf.FlumeConfiguration: Sinks HDFS-LAB solrSink

2013-12-15 08:54:27,233 DEBUG org.apache.flume.conf.FlumeConfiguration: Sources Syslog

2013-12-15 08:54:27,234 INFO org.apache.flume.conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [syslog_agent]
2013-12-15 08:54:27,234 INFO org.apache.flume.node.AbstractConfigurationProvider: Creating channels
2013-12-15 08:54:27,244 INFO org.apache.flume.channel.DefaultChannelFactory: Creating instance of channel MemoryChannel-1 type memory
2013-12-15 08:54:27,248 INFO org.apache.flume.node.AbstractConfigurationProvider: Created channel MemoryChannel-1
2013-12-15 08:54:27,249 INFO org.apache.flume.source.DefaultSourceFactory: Creating instance of source Syslog, type syslogtcp
2013-12-15 08:54:27,280 ERROR org.apache.flume.node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: com/cloudera/cdk/morphline/api/Command

    at org.apache.flume.sink.solr.morphline.MorphlineInterceptor.<init>(MorphlineInterceptor.java:55)
    at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder.build(MorphlineInterceptor.java:117)
    at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder.build(MorphlineInterceptor.java:108)
    at org.apache.flume.channel.ChannelProcessor.configureInterceptors(ChannelProcessor.java:111)
    at org.apache.flume.channel.ChannelProcessor.configure(ChannelProcessor.java:80)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:353)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: com.cloudera.cdk.morphline.api.Command
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 18 more
2013-12-15 08:54:27,313 DEBUG com.cloudera.cmf.event.publish.AvroEventStorePublishProxy: (Re)connecting to hathi-nn.engba.symantec.com:7184

Tom Wheeler

Dec 15, 2013, 12:55:19 PM
to mgsr...@gmail.com, cdk...@cloudera.org, bitsof...@gmail.com
I am not Wolfgang, but his previous instruction said to "copy the jars
printed out by the above find command into the flume lib dir" and it's
not clear to me that you did that.

You mention seeing the morphlines JAR file in
/opt/cloudera/parcels/SOLR-1.1.0-1.cdh4.3.0.p0.21/lib/search/lib, but
I don't think that's likely to be part of Flume's class path.

On my CDH 4.5.0 system, the Flume lib directory to which Wolfgang
refers is /usr/lib/flume-ng/lib/. I set up Flume with a Morphline
interceptor under CDH 4.5.0 last week by copying my JAR files there.

Hope that helps,

Tom Wheeler
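
A quick way to check whether a given lib directory actually holds the morphlines jars might look like the following. It is only a sketch: list_morphline_jars is a hypothetical helper, and the cdk-morphlines*.jar pattern is taken from the file name mentioned earlier in this thread.

```shell
# Print the morphlines jars found directly in a directory.
# Returns 0 if at least one was found, 1 otherwise.
# list_morphline_jars is a hypothetical helper name.
list_morphline_jars() {
  found=1
  for jar in "$1"/cdk-morphlines*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when nothing matches
    printf '%s\n' "$jar"
    found=0
  done
  return "$found"
}

# Usage (directory taken from the message above; adjust for your install):
#   list_morphline_jars /usr/lib/flume-ng/lib || echo "no morphlines jars found"
```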




Wolfgang Hoschek

Dec 15, 2013, 1:39:12 PM
to Tom Wheeler, mgsr...@gmail.com, cdk...@cloudera.org, bitsof...@gmail.com
I believe this is a known issue with the Parcel - CDH-16144. The Parcel team is working on it. Meanwhile try the following work-around:

"If I set SEARCH_HOME=/opt/cloudera/parcels/SOLR/lib/search in the $FLUME_HOME/bin/flume-ng script in a parcel install the issue is resolved. An easier alternative is to add the setting SEARCH_HOME=/opt/cloudera/parcels/SOLR/lib/search to the CM configuration > Flume > Service Wide > Flume Service Environment Safety Valve that works as well."

Another work-around would be to use a package based install, because the issue only happens with Parcels.

Wolfgang.
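
As a rough sketch of the first work-around, the setting could be expressed in shell like this. The path is the one quoted above; the existence check is an added assumption for illustration and is not part of the flume-ng script.

```shell
# Default SEARCH_HOME to the parcel location quoted in the work-around,
# unless the environment already provides a value.
: "${SEARCH_HOME:=/opt/cloudera/parcels/SOLR/lib/search}"
export SEARCH_HOME

# Illustrative sanity check (an assumption, not from the thread):
# warn when the directory is absent, e.g. on a machine without the SOLR parcel.
if [ ! -d "$SEARCH_HOME" ]; then
  echo "warning: SEARCH_HOME=$SEARCH_HOME does not exist" >&2
fi
```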

Noble Paul

May 27, 2015, 10:12:18 AM
to cdk...@cloudera.org, bitsof...@gmail.com
I get these errors. Maven is unable to resolve certain dependencies:

[ERROR] Failed to execute goal on project flume-ng-morphline-solr-sink: Could not resolve dependencies for project org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.7.0-SNAPSHOT: The following artifacts could not be resolved: org.kitesdk:kite-morphlines-solr-cell:jar:1.0.0, org.apache.tika:tika-xmp:jar:1.5, org.apache.tika:tika-parsers:jar:1.5, org.apache.pdfbox:fontbox:jar:1.8.4, org.apache.pdfbox:jempbox:jar:1.8.4, ua_parser:ua-parser:jar:1.3.0, org.slf4j:jcl-over-slf4j:jar:1.6.1, org.eclipse.jetty:jetty-servlet:jar:8.1.8.v20121106, org.eclipse.jetty:jetty-util:jar:8.1.8.v20121106, org.eclipse.jetty:jetty-server:jar:8.1.8.v20121106, org.eclipse.jetty:jetty-webapp:jar:8.1.8.v20121106: Could not find artifact org.kitesdk:kite-morphlines-solr-cell:jar:1.0.0 in cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]

Wolfgang Hoschek

May 27, 2015, 10:26:07 AM
to Noble Paul, cdk...@cloudera.org, bitsof...@gmail.com
Hi Paul,

The Maven artifacts of upstream Kite are in Maven Central (e.g. http://search.maven.org/#search%7Cga%7C1%7Ckite-morphlines-core ) rather than in the cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos).

Wolfgang.
