How can I keep the attachment_body and attachment_mime type fields from being sent to Solr via Morphline?

ixit shah

Jun 2, 2016, 5:37:13 AM
to CDK Development
Please see my Morphline conf file below.

How can I do this?

# Specify server locations in a SOLR_LOCATOR variable; used later in
# variable substitutions:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : gettingstarted

  # ZooKeeper ensemble
  zkHost : "localhost:2181"
}


morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are 
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    #importCommands : ["org.**", "com.**"]

    commands : [
      { readJson: {} }

      {
        extractJsonPaths {
          flatten : false
          paths : {
            "componentType" : /componentType,
            "id" : /id,
            "jobName" : /jobName,
            "jobKey" : /jobKey,
            "jobGroup" : /jobGroup,
            "endTime" : /endTime,
            "jobStatus" : /jobStatus,
            "agentName" : /agentName
          }
        }
      }

      { logError { format : "record: {}", args : ["@{}"] } }
      # Parse input attachment and emit a record for each input line
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # This command deletes record fields that are unknown to Solr
      # schema.xml.
      #
      # Recall that Solr throws an exception on any attempt to load a document 
      # that contains a field that is not specified in schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }  
    


      # log the record at INFO level to SLF4J
      { logInfo { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]



=========================================

I am getting exception:

2016-06-02 14:39:48,040 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:186)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: ERROR: [doc=ixits-ux_job1_1464858313005] unknown field '_attachment_body'
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at org.kitesdk.morphline.solr.SolrServerDocumentLoader.sendLoads(SolrServerDocumentLoader.java:140)
at org.kitesdk.morphline.solr.SolrServerDocumentLoader.sendBatch(SolrServerDocumentLoader.java:131)
at org.kitesdk.morphline.solr.SolrServerDocumentLoader.commitTransaction(SolrServerDocumentLoader.java:94)
at org.kitesdk.morphline.solr.LoadSolrBuilder$LoadSolr.doNotify(LoadSolrBuilder.java:104)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Connector.notify(Connector.java:57)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.AbstractCommand.doNotify(AbstractCommand.java:150)
at org.kitesdk.morphline.base.AbstractCommand.notify(AbstractCommand.java:132)
at org.kitesdk.morphline.base.Notifications.notify(Notifications.java:96)
at org.kitesdk.morphline.base.Notifications.notifyCommitTransaction(Notifications.java:61)
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.commitTransaction(MorphlineHandlerImpl.java:148)
at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:156)
... 3 more
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR: [doc=ixits-ux_job1_1464858313005] unknown field '_attachment_body'
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.doRequest(LBHttpSolrServer.java:340)
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:301)
at org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:341)
at org.apache.solr.client.solrj.impl.CloudSolrServer$1.call(CloudSolrServer.java:338)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
====================================
If I add the field to Solr's schema.xml, it gets loaded into Solr, but I do not want to load the extra fields into Solr.
Please help me with this.
Thanks.

Wolfgang Hoschek

Jun 2, 2016, 11:57:02 AM
to ixit shah, CDK Development
You could use the setValues command with an empty list as argument, or the removeFields command, for example. See http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#setValues and http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#removeFields
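
For reference, here is a minimal sketch of how either command might look inside the commands list, placed somewhere before loadSolr. The field name _attachment_mimetype is an assumption based on the thread title; only _attachment_body appears in the exception above. Pick one of the two options rather than using both.

```
# Option 1: unset individual fields by assigning an empty list
{
  setValues {
    _attachment_body : []
    _attachment_mimetype : []   # assumed field name, not confirmed in the thread
  }
}

# Option 2: remove every field whose name matches a blacklist expression
{
  removeFields {
    blacklist : ["regex:_attachment_.*"]
  }
}
```

Option 1 names each field explicitly, so nothing else is touched; Option 2 drops any field starting with _attachment_, which is broader but does not require knowing every field name in advance.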


ixit shah

Jun 3, 2016, 2:43:41 AM
to CDK Development, ixitsh...@gmail.com
Hi,
Thanks for your reply.
In fact, right after posting I saw your example code, tried removeFields with a blacklist regex as shown below, and it worked.

Updated morphline.conf file
=======================================================================
      {
        removeFields {
          blacklist : ["regex:_attachment_.*"]
        }
      }

      # log the record at INFO level to SLF4J
      { logInfo { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]
=======================================================================
I will try setValues now and let you know if I run into any issues there.
Also, could you please clarify one thing: does Morphline only work with Solr's schema.xml, or does it work with managed-schema as well? I get an exception when I use Solr's data-driven schema configsets. I even renamed managed-schema to schema.xml, but that does not work either, and it is not a recommended practice anyway.
It would be helpful if you could shed some light on this.
Thanks a lot.
-ixit shah
===========================