After increasing the error threshold, it complains about later columns.

Question 1: How can I view the columns of my JSON again? The tab for my JSON file disappeared from the Data Preparation screen (it now shows only a CSV file I'm wrangling in a Batch Pipeline). Clicking the Wrangle button in the Wrangler Properties shows no data and just prompts me to "select or upload a file", even though I already did that earlier. Is the file no longer visible because I set the File Path (in the File Source that feeds the Wrangler) to a directory instead of a single file?

Question 2: How can I "Use JsonReader.setLenient(true) to accept malformed JSON," please?
co.cask.wrangler.Wrangler#263 | Error threshold reached '1' : com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 14
I created a copy of my JSON file (same data, different file name) and went through Data Preparation again with fewer directives, which resulted in a slightly different output schema. After deploying and running the pipeline on that file, it remained in Data Preparation, so that's good. But I still get the same error:
Error threshold reached '1' : com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 14
This time I can see what "column 14" is in Data Preparation: it's a JSON field that is not present in the first record but does appear in some later records. During Data Preparation the field simply shows as null/empty in the records that lack it, which is what I want. But when deploying and running, the pipeline apparently doesn't like that the field is missing (at least from the first record). FYI: the Null check box is checked for this field in the Output Schema, its Type is String, and I don't see anything invalid about the field name or value where it appears in later records. What can be done to get the Wrangler to work on sparse data like this, please?
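To make the desired behavior concrete, here is a minimal Python sketch of what "null for a missing field" means for sparse records. It is only an illustration and not how Wrangler works internally. The function name `normalize_sparse`, the field names, and the sample records are all hypothetical, and it assumes one complete JSON object per line.

```python
import json


def normalize_sparse(lines, fields):
    """Parse one JSON object per line and fill absent fields with None.

    `fields` plays the role of the output schema: every row gets every
    field, and a field missing from a record becomes None (null).
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        # dict.get returns None when the key is absent from this record
        rows.append({name: record.get(name) for name in fields})
    return rows


# Hypothetical sample: "note" is absent from the first record only.
sample = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b", "note": "only in later records"}',
]
rows = normalize_sparse(sample, ["id", "name", "note"])
```

With this sample, the first row carries `note` as `None` while the second keeps its string value, which matches what Data Preparation displays for the sparse field.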
In the File Properties Configuration, enter this in the File System Properties:
{
"textinputformat.record.delimiter": "`"