wrangler - xml parsing error (xml to json)


Leona Chen

Nov 23, 2021, 7:33:04 AM
to CDAP User
Hi, 
I am trying to use the Wrangler XML-to-JSON function to parse an XML file while building a pipeline; the file is attached for your reference. However, the pipeline keeps failing with the error message shown in the attached document. I have searched online for possible causes, and it may be related to encoding with or without a BOM, but I am not sure what is wrong with my XML. I would appreciate any help on this matter.

thanks
cdap_query.docx
cdap_query.xml

Sebastian Echegaray

Nov 23, 2021, 12:14:41 PM
to cdap...@googlegroups.com
Hello,

Unfortunately, I was not able to reproduce the error. I loaded the shared XML into Wrangler, converted the XML into JSON, then parsed all JSON fields, and finally made it into a batch pipeline with a GCS source, and ran it with no issues.
image.png

Which source are you using to load the XML into the pipeline? If it's being loaded as a blob, then it has to be projected to a string before setting it as an input to wrangler.
Could you share your pipeline.json please?
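To illustrate why the blob needs to become a string first, here is a minimal Python sketch of the general idea. It is not CDAP or Wrangler code: the function names and the sample document are made up for illustration, and it assumes the file is UTF-8 encoded. It decodes the raw bytes to text, dropping a UTF-8 BOM if one is present, and only then hands the text to an XML parser.

```python
import codecs
import xml.etree.ElementTree as ET


def blob_to_text(body: bytes) -> str:
    """Decode a raw bytes field to text (hypothetical helper, assumes UTF-8)."""
    # "utf-8-sig" decodes like UTF-8 but skips a leading byte-order mark.
    return body.decode("utf-8-sig")


def root_tag(body: bytes) -> str:
    """Parse the decoded text as XML and return the root element's tag."""
    return ET.fromstring(blob_to_text(body)).tag


# Made-up sample: a small XML document saved with a UTF-8 BOM.
sample = codecs.BOM_UTF8 + b"<query><id>1</id></query>"
print(root_tag(sample))
```

The bytes-to-string step here corresponds conceptually to what a Projection transform does in the pipeline before Wrangler sees the field.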

Thank you,
-Sebastian


Leona Chen

Nov 23, 2021, 12:57:27 PM
to CDAP User
Hi Sebastian,

I used a file as the input, but I just checked: my file is indeed showing as a blob, and I did not use a projection as you suggested.

Could you show me how you set up your Projection properties, as a reference for turning the blob into a string?

[
    {
        "name": "etlSchemaBody",
        "schema": {
            "type": "record",
            "name": "blob",
            "fields": [
                {
                    "name": "body",
                    "type": "bytes"
                }
            ]
        }
    }
]

Sebastian Echegaray

Nov 23, 2021, 1:43:04 PM
to cdap...@googlegroups.com
Hello,

Here's how I set up the Projection transform:
image.png

Or in the JSON file:
{
    "name": "Projection",
    "plugin": {
        "name": "Projection",
        "type": "transform",
        "label": "Projection",
        "artifact": {
            "name": "core-plugins",
            "version": "2.5.0",
            "scope": "SYSTEM"
        },
        "properties": {
            "convert": "body:string"
        }
    },
    "outputSchema": [
        {
            "name": "etlSchemaBody",
            "schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody.typeprojected\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}"
        }
    ],
    "inputSchema": [
        {
            "name": "GCS",
            "schema": "{\"name\":\"etlSchemaBody\",\"type\":\"record\",\"fields\":[{\"name\":\"body\",\"type\":\"bytes\"}]}"
        }
    ],
    "id": "Projection"
},
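
Note that the "schema" values in this stage are JSON documents stored as escaped strings, which is easy to get wrong when editing by hand. As a rough sketch only, the stage could be generated with Python's json module. The helper names record_schema and projection_stage are hypothetical, and the artifact version is simply copied from the snippet above, so it may differ on your instance.

```python
import json


def record_schema(name: str, field: str, field_type: str) -> str:
    """Return a one-field record schema as a compact JSON string."""
    schema = {
        "type": "record",
        "name": name,
        "fields": [{"name": field, "type": field_type}],
    }
    return json.dumps(schema, separators=(",", ":"))


def projection_stage(field: str = "body", input_stage: str = "GCS") -> dict:
    """Build a Projection stage that converts `field` from bytes to string."""
    return {
        "name": "Projection",
        "plugin": {
            "name": "Projection",
            "type": "transform",
            "label": "Projection",
            # Version copied from the snippet above; adjust for your instance.
            "artifact": {"name": "core-plugins", "version": "2.5.0", "scope": "SYSTEM"},
            "properties": {"convert": f"{field}:string"},
        },
        "outputSchema": [
            {
                "name": "etlSchemaBody",
                "schema": record_schema("etlSchemaBody.typeprojected", field, "string"),
            }
        ],
        "inputSchema": [
            {"name": input_stage, "schema": record_schema("etlSchemaBody", field, "bytes")}
        ],
        "id": "Projection",
    }


# json.dumps escapes the nested schema strings when the stage is serialized.
print(json.dumps(projection_stage(), indent=4))
```

The printed stage can then be placed into the stages list of a pipeline.json, with the input stage name changed to match your own source.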

Thank you,
-Sebastian

Leona Chen

Nov 25, 2021, 7:09:32 AM
to CDAP User
Hi Sebastian,

Your configuration suggestion works for me now. Thank you so much!
