Use of Morphline file to process XML data

Aniruddh

Jun 7, 2016, 1:46:17 PM

to CDK Development

I am new to Cloudera Search and have the following questions:

a) When I use post.jar to test a Solr collection, I only need to create a schema.xml that defines a simple schema; no morphline file is required. But when I use MapReduceIndexer, a morphline file is required. So my question is: if I have a very simple CSV file that I need to index, why does MapReduceIndexer require a morphline file? The reason I ask leads into my next question.

b) According to the Solr documentation, Solr can accept an XML file directly as input. But the cluster uses Kerberos, so I have to use MapReduceIndexer to index the XML input file, and morphlines have no direct XML read command comparable to readCSV, so I have to use xquery/xslt. Is there any way to skip the morphline file and have MapReduceIndexer build the index according to the defined schema.xml? If not, how does the morphline file help me? When I use xslt, am I supposed to flatten the XML file and pass the output as flattened records to loadSolr?

c) The input XML I have also contains records with nested fields. What is the best development practice for writing the morphline file and the Solr schema.xml together in that case?

Thanks

Aniruddh
