json file never completely imported


opnoob

Dec 9, 2016, 7:08:53 AM
to OpenRefine
Greetings. I am uploading a JSON file and I always specify the record path. I have increased the memory OpenRefine may use to 4 GB, but I only ever seem to get a fraction of the records imported, even when I set the number of rows to load higher than the number of records in the file. What could be the cause of this?

Owen Stephens

Dec 9, 2016, 8:02:11 AM
to OpenRefine
The possibilities that occur to me are:

1) The records you aren't seeing are not in the path you are selecting from the JSON
2) The data is importing but the way the data works means that in OpenRefine they aren't correctly split into the same number of rows/records you are expecting (due to the way OpenRefine does Records this can be an issue)
3) Something is causing multiple records to be inserted in a single cell (I've seen this with a csv import where some specific characters seem to cause issues with parsing the csv, but I've not seen it with JSON before)

Is there any data sample you can share that illustrates the problem?

Owen

Owen Stephens

Dec 9, 2016, 10:43:07 AM
to OpenRefine
After some off-list correspondence we've now identified the issue as a tab character appearing inside a JSON value - this silently truncates the data imported.  If you try to import the following JSON (with a tab character before 'B') then you can see it fails to import anything after the A record.

{
  "rows": [
    {
      "Content": "A"
    },
    {
      "Content": "	B"
    },
    {
      "Content": "C"
    }
  ]
}

Since a raw tab is an illegal character inside a JSON string (it should be escaped as \t), it makes sense that the import fails, but failing silently isn't very intuitive. I'm not sure what the preferred behaviour is here: truncate the import but show an error message, or fail the entire import because of the illegal character?
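
As a way to check a file for this before importing it, here is a minimal sketch using Python's standard json module, which rejects raw control characters inside strings by default. This is not what OpenRefine does internally; the check_json helper and the inline sample string are only illustrative.

```python
import json

# Same shape as the sample above, with a literal tab (U+0009) inside the "B" value.
# In this Python source, "\t" inside a normal string literal is a real tab character.
raw = '{"rows": [{"Content": "A"}, {"Content": "\tB"}, {"Content": "C"}]}'


def check_json(text):
    """Return None if text parses as strict JSON, else a short description of the error.

    Hypothetical helper for illustration only.
    """
    try:
        json.loads(text)  # strict=True by default, so raw control characters are rejected
    except json.JSONDecodeError as err:
        return f"{err.msg} (line {err.lineno}, column {err.colno})"
    return None


problem = check_json(raw)
print(problem)
```

Unlike the silent truncation described above, the strict parser reports where the offending character is, which is roughly the kind of error message an import could surface.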

Any views?
(I'll add an issue at https://github.com/OpenRefine/OpenRefine/issues but would like to get feedback on the preferred behaviour)

Owen

Thad Guidry

Dec 9, 2016, 1:28:23 PM
to OpenRefine
Yes, we need to escape that tab. It's a feature in the Jackson JSON parser that we should be using and exposing to the user.
Comments added to our issue.
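
Until something like that is exposed in the importer, one possible workaround is to repair the file before importing it. The sketch below uses Python's standard json module rather than Jackson, so it only illustrates the idea of lenient parsing: strict=False accepts raw control characters inside strings, and json.dumps writes them back out as proper escapes. The sample string is illustrative.

```python
import json

# The sample from earlier in the thread, with a literal tab inside the "B" value.
raw = '{"rows": [{"Content": "A"}, {"Content": "\tB"}, {"Content": "C"}]}'

# strict=False tells the parser to accept raw control characters inside strings.
data = json.loads(raw, strict=False)
contents = [row["Content"] for row in data["rows"]]

# Re-serialising writes the tab as the two-character escape \t,
# which a strict JSON parser will accept.
clean = json.dumps(data)
print(clean)
```

All three records survive the round trip, and the cleaned text can then be saved and imported as usual.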

