from xml file through javascript to json format and after flatten it to a new file


sxbdo...@gmail.com

Mar 29, 2018, 2:50:42 AM
to sdc-user
Hi guys.
I'm trying to read a complex XML file/message and send it to a JavaScript Evaluator, where I have loaded a Java class that uses an XMLSerializer.

This class works fine when I call its method from a Java application to convert the incoming string to JSON.

I want to take the value returned by my class and pass it on to a Field Flattener.

How can I do that? I've tried different approaches over the last 3 days with no success.

Regards,

Douglas.

Pat Patterson

Mar 29, 2018, 12:10:33 PM
to sxbdo...@gmail.com, sdc-user
Hi Douglas,

Can you show us some of your code and what you've tried?

One thought - if you have JSON in your evaluator, you could serialize it to a string, put it in a field, then use the JSON Parser processor to parse it into a hierarchy of fields that the Field Flattener could then process.
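
A minimal sketch of that idea, written as plain JavaScript so it can run outside Data Collector. The field name `json`, the sample object, and the stand-in `record` are assumptions for illustration; in a pipeline, the JSON Parser processor would do the parsing that `JSON.parse` stands in for here.

```javascript
// Stand-in for an SDC record; in the evaluator it would come from `records`.
var record = { value: {} };

// Hypothetical JSON object held inside the evaluator.
var converted = { CreateUser: { PublicID: "FROT:7568" } };

// Serialize it and store it in a single string field
// (the field name "json" is an assumption).
record.value = { json: JSON.stringify(converted) };

// Downstream, a JSON parser turns the string back into a hierarchy of fields.
var parsed = JSON.parse(record.value.json);
```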

Otherwise, just do the field flattening in the evaluator?
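
As a rough sketch of what flattening inside the evaluator could look like: the helper below is hypothetical (it is not part of Data Collector), and joining names with "." is an assumption about the desired output format.

```javascript
// Recursively copy every leaf of a nested object into `out`,
// using the dot-joined path to the leaf as its key.
function flatten(value, prefix, out) {
  out = out || {};
  if (value !== null && typeof value === "object") {
    for (var key in value) {
      var path = prefix ? prefix + "." + key : key;
      flatten(value[key], path, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Hypothetical nested data.
var nested = {
  CreateUser: {
    Credential: { UserName: "N7SB" },
    PublicID: "FROT:7568"
  }
};

var flat = flatten(nested, "");
// flat now has the keys "CreateUser.Credential.UserName" and "CreateUser.PublicID".
```

Array elements are handled the same way, with the element index used as the path segment.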

Cheers,

Pat

--

Pat Patterson | Technical Director | http://about.me/patpatterson


Douglas Romero da Cruz

Mar 29, 2018, 2:39:06 PM
to Pat Patterson, sdc-user
Hi Pat.

Thanks for replying. You are being helpful!


I'm using these JAR dependencies:
     commons-beanutils-1.9.3
     commons-collections-3.2.1
     commons-lang-2.5
     commons-logging-1.2
     ezmorph-1.0.6
     json-lib-2.4-jdk15
     xom-1.2.5

My converter class is in the converter.java file.
My example app is in the xtj.java file.

I have also attached the pipeline JSON (xml_jscripts.json), the XML file (x.xml), and the result (y.json): a beautiful, clean JSON with no namespaces.

I believe I'm trying to do something hard, but I also believe it will be helpful for other SDC users.

Regards,

Douglas.

Pat Patterson

Mar 29, 2018, 11:43:37 PM
to sxbdo...@gmail.com, sdc-user
Hi Douglas,

I'm a bit confused. Are you sharing your working version, or still asking for help?

Cheers,

Pat


On Thu, Mar 29, 2018 at 9:53 AM, <sxbdo...@gmail.com> wrote:
Hi Pat.
Thanks for replying. You are being helpful!

I'm using these JAR dependencies:
     commons-beanutils-1.9.3.jar
     commons-collections-3.2.1.jar
     commons-lang-2.5.jar
     commons-logging-1.2.jar
     ezmorph-1.0.6.jar
     json-lib-2.4-jdk15.jar
     xom-1.2.5.jar

Below are my XML-to-JSON class and my test project.

I have also attached the XML file and the result: a beautiful, clean JSON with no namespaces.

I believe I'm trying to do something hard, but I also believe it will be helpful for other SDC users.

============== JAVA CLASS ================
package com.redfront;

import net.sf.json.JSON;
import net.sf.json.xml.XMLSerializer;

public class converter {

    // Converts an XML string to a JSON string, dropping namespaces and prefixes.
    public String xmltojson(String xml) {
        XMLSerializer serializer = new XMLSerializer();
        serializer.setSkipNamespaces(true);
        serializer.setRemoveNamespacePrefixFromElements(true);
        serializer.setTrimSpaces(true);

        // Remove whitespace between adjacent tags before parsing.
        xml = xml.replaceAll(">\\s*<", "><");

        JSON json = serializer.read(xml);

        return json.toString();
    }
}

============== JAVA CLASS ================

============== JAVA EXAMPLE =============

package xmljson.example;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.redfront.converter;

public class FileToStringJava8 {

    public static void main(String[] args) throws IOException {
        // Read the whole XML file into a string.
        String fileString = new String(Files.readAllBytes(Paths.get("/tmp/x.xml")), StandardCharsets.UTF_8);
        System.out.println(fileString);

        // Convert the XML to JSON and print the result.
        converter c = new converter();
        String json = c.xmltojson(fileString);
        System.out.println(json);
    }
}

============== JAVA EXAMPLE =============

Regards,

Douglas.

Douglas Romero da Cruz

Mar 30, 2018, 1:12:59 AM
to Pat Patterson, sdc-user
Sorry for my English. I shared what I've done so that you could better understand the problem.
I still need help.

Regards ,

Douglas



Pat Patterson

Mar 30, 2018, 10:37:05 AM
to Douglas Romero da Cruz, sdc-user
Hi Douglas,

I understand. I'll take a look when I can.

Cheers,

Pat


Pat Patterson

Apr 3, 2018, 1:17:00 PM
to Douglas Romero da Cruz, sdc-user
Hi Douglas,

The key here is to use the XML data format in the Directory origin to parse the input file, then it's relatively straightforward to manipulate the tree of nodes. I'll explain the steps...

Note - there seems to be some RTF formatting at the start and end of the XML file you sent, as well as backslashes at the end of each line:

$ head /Users/pat/Downloads/x.xml 
{\rtf1\ansi\ansicpg1252\cocoartf1561\cocoasubrtf200
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural\partightenfactor0


I removed these manually before I started. Not sure if this came from some text editor. I've attached the XML I used.
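
A small sketch for spotting this kind of residue before feeding a file to the pipeline. Both function names are hypothetical, and the checks only detect the symptoms described above; they do not attempt to strip RTF markup.

```javascript
// True if the text starts with an RTF header such as "{\rtf1".
function looksLikeRtf(text) {
  return /^\s*\{\\rtf/.test(text);
}

// True if at least one line ends with a backslash.
function hasTrailingBackslashes(text) {
  return /\\\r?$/m.test(text);
}

// Hypothetical file contents for illustration.
var wrapped = "{\\rtf1\\ansi\\ansicpg1252\n<a>\\\n</a>\\\n}";
var clean = "<?xml version=\"1.0\"?>\n<a>\n</a>\n";

var wrappedIsRtf = looksLikeRtf(wrapped);
var cleanIsRtf = looksLikeRtf(clean);
```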

Here's my pipeline:




I configured the Directory origin with XML data format and a maximum record length longer than the input file, since the entire file will end up as a single record:


Using preview, I could see that the XML was parsed correctly, but there are fields in the root for the namespaces, and the other fields have namespace prefixes that we don't want. Due to the way that XML is processed, there is also a lot of structure that we don't want:



I used the JavaScript processor with the following script to rename keys and remove the redundant layers:

// Remove the namespace prefix from the key.
// Remove redundant layers from the value,
// recursing down the tree.
function processNode(key, value) {
  // Remove the namespace prefix from the key
  var newkey = key.replace(/^ns\d+:/g,"");
  // The sample data has only one element with a given name
  // at that layer of the tree, but there may be more in the
  // general case. Create a list, then remove the redundant 
  // layer if there's only one element.
  var newvalues = [];
  for (var index in value) {
    var element = value[index];
    if (element.value) {
      // This is a leaf - use the value
      newvalues.push(element.value);
    } else {
      // This is a node - recurse down the tree
      var newmap = {};
      for (var k in element) {
        var newnode = processNode(k, element[k]);
        newmap[newnode.key] = newnode.value;      
      }
      newvalues.push(newmap);
    }
  }
  if (newvalues.length == 1) {
    newvalues = newvalues[0];
  }
  return {key: newkey, value: newvalues};
}

// Iterate through the records in the batch
for(var i = 0; i < records.length; i++) {
  var record = records[i];
  try {
    var newmap = {};
    for (var key in record.value) {
      // Skip namespace fields in root
      if (! key.startsWith('ns|xmlns:')) {
        var newnode = processNode(key, record.value[key]);
        newmap[newnode.key] = newnode.value;
      }
    }
    record.value = newmap;

    // Write record to processor output
    output.write(record);
  } catch (e) {
    // Send record to error
    error.write(records[i], e);
  }
}
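
To try the same logic outside Data Collector, here is a sketch that runs a copy of `processNode` against a small hand-written object. The sample input is an assumption: it only imitates the shape the script expects (each key maps to a list of elements, and a leaf element carries a `value` property), and the namespace URI and the `cleanRoot` wrapper are made up for illustration.

```javascript
// Copy of processNode from the script above, with newnode declared locally.
function processNode(key, value) {
  var newkey = key.replace(/^ns\d+:/, "");
  var newvalues = [];
  for (var index in value) {
    var element = value[index];
    if (element.value) {
      // Leaf: keep only the text value.
      newvalues.push(element.value);
    } else {
      // Node: recurse into each child key.
      var newmap = {};
      for (var k in element) {
        var newnode = processNode(k, element[k]);
        newmap[newnode.key] = newnode.value;
      }
      newvalues.push(newmap);
    }
  }
  // Collapse single-element lists into the element itself.
  if (newvalues.length == 1) {
    newvalues = newvalues[0];
  }
  return {key: newkey, value: newvalues};
}

// Hypothetical wrapper mirroring the per-record loop: skip namespace fields.
function cleanRoot(root) {
  var newmap = {};
  for (var key in root) {
    if (!key.startsWith("ns|xmlns:")) {
      var newnode = processNode(key, root[key]);
      newmap[newnode.key] = newnode.value;
    }
  }
  return newmap;
}

// Hypothetical input imitating a parsed XML record.
var sample = {
  "ns|xmlns:ns0": "http://example.com/ns0",
  "ns0:CreateUser": [{
    "ns1:Credential": [{ "ns2:UserName": [{ value: "N7SB" }] }],
    "ns0:PublicID": [{ value: "FROT:7568" }]
  }]
};

var cleaned = cleanRoot(sample);
console.log(JSON.stringify(cleaned, null, 2));
```

Note that the `element.value` test treats an empty or otherwise falsy value as a node rather than a leaf, so empty elements may need separate handling.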

Preview shows that things look good:




With a Local FS destination set to JSON data format, this resulted in the desired output (file attached):

$ cat /tmp/out/2018-04-03-17/sdc-24e75fba-bd00-42fd-80c3-1f591e200ca6_55f4139c-3011-4fbd-9c5e-d7afd23e2253 | jq .

{
  "entity-PolicyChange": {
    "ExceptionHandlingType_Ext": "NotApplicable_Ext"
  },
  "CloseDate": "2017-08-08T17:02:55.936-07:00",
  "CreateTime": "2017-08-08T17:02:38.301-07:00",
  "CreateUser": {
    "Contact": {
      "AddressBookUID": "pcuser:N7SB",
      "FirstName": "FFF-NNN001",
      "LastName": "LLL-NNN001",
      "PublicID": "FROT:7911"
    },
    "Credential": {
      "UserName": "N7SB"
    },
    "PublicID": "FROT:7568"
  },


Cheers,

Pat


x.xml
y.json

Douglas Romero da Cruz

Apr 3, 2018, 4:21:48 PM
to Pat Patterson, sdc-user
Hi Pat.
It works fine.
I'm truly grateful.

Thank you!
