response to get-models differs between 4.0 and 3.4.1

Felix Lohmeier

Jul 11, 2021, 5:01:04 PM
to OpenRefine Development
Hi Antonin,

I tested whether the openrefine-client already works with the 4.0 branch and found that the response to the get-models command has changed.

OpenRefine 3.4.1
{
  "columnModel": {
    "columns": [
      {
        "cellIndex": 0,
        "originalName": "a",
        "constraints": "{}",
        "type": "",
        "format": "default",
        "title": "",
        "description": "",
        "name": "a"
      },
      {
        "cellIndex": 1,
        "originalName": "b",
        "constraints": "{}",
        "type": "",
        "format": "default",
        "title": "",
        "description": "",
        "name": "b"
      },
      {
        "cellIndex": 2,
        "originalName": "c",
        "constraints": "{}",
        "type": "",
        "format": "default",
        "title": "",
        "description": "",
        "name": "c"
      }
    ],
    "columnGroups": [],
    "keyColumnName": "a",
    "keyCellIndex": 0
  },
(...)

OpenRefine 4.0-snapshot
{
  "columnModel": {
    "columns": [
      {
        "originalName": "a",
        "name": "a"
      },
      {
        "originalName": "b",
        "name": "b"
      },
      {
        "originalName": "c",
        "name": "c"
      }
    ],
    "keyCellIndex": 0,
    "keyColumnName": "a"
  },
(...)

Paul's Python client expects the "cellIndex" information (see https://github.com/opencultureconsulting/openrefine-client/blob/2735db3f3fb06812010d430611a731217e85e200/google/refine/refine.py#L416). Should I check whether the client works without cellIndex, or do you intend to change that (e.g. for backward compatibility)?

The video was very impressive! Thanks a lot for it.

Best wishes,
Felix

Antonin Delpeuch (lists)

Jul 12, 2021, 11:00:19 AM
to openref...@googlegroups.com

Hi Felix,

Yes indeed, that's one of the breaking changes. I am not sure yet whether it makes sense to write a "Migration guide" for extensions as I did for previous breaking changes (JSON handling, CSRF tokens…), because there are really a lot of them in the Java API. I could do it for the HTTP API (where there are far fewer changes), but we have always stated that this HTTP API was not intended for public use, so it would probably send confusing messages.

Anyway, I can still answer your question here.

Yes, the `cellIndex` field was dropped because it is no longer useful in the new architecture: reordering columns is now done lazily, so it no longer carries the significant cost it had in the previous in-memory setup. The index of a column is therefore simply its position in the list of columns returned by the API (the first one has cellIndex 0, the second 1, and so on).
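For client code that used to read `cellIndex`, a version-tolerant lookup could look roughly like the Python sketch below. It is not the actual openrefine-client fix; the helper name `column_indices` is hypothetical, and it assumes the get-models response has already been decoded from JSON into a dict. It prefers the explicit `cellIndex` when present, since in 3.x that value may differ from the list position (for example after columns were reordered), and otherwise falls back to the position, as described above for 4.0.

```python
import json


def column_indices(models_response):
    """Map each column name to its cell index in a decoded get-models response.

    Hypothetical helper: uses the explicit "cellIndex" field when present
    (OpenRefine 3.x) and otherwise falls back to the column's position in
    the "columns" list (4.0 branch).
    """
    columns = models_response["columnModel"]["columns"]
    return {
        column["name"]: column.get("cellIndex", position)
        for position, column in enumerate(columns)
    }


# Abridged 4.0-snapshot response from Felix's message (no "cellIndex").
response_40 = json.loads("""
{"columnModel": {"columns": [{"originalName": "a", "name": "a"},
                             {"originalName": "b", "name": "b"},
                             {"originalName": "c", "name": "c"}],
                 "keyCellIndex": 0, "keyColumnName": "a"}}
""")

print(column_indices(response_40))  # {'a': 0, 'b': 1, 'c': 2}
```

The same function returns the stored `cellIndex` values unchanged when given a 3.4.1 response.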

I also took this opportunity to remove the unused fields you spotted: they had been introduced in the Data Package integration but had never been exposed in the UI.
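The per-column fields that disappeared can be read off by comparing the key sets of one column object from each response quoted at the top of the thread. A minimal Python sketch using column "a" (the variable names are illustrative only):

```python
# Column "a" as returned by get-models in OpenRefine 3.4.1 and in
# 4.0-snapshot, copied from the responses in Felix's message.
column_341 = {
    "cellIndex": 0,
    "originalName": "a",
    "constraints": "{}",
    "type": "",
    "format": "default",
    "title": "",
    "description": "",
    "name": "a",
}
column_40 = {"originalName": "a", "name": "a"}

# Fields present in 3.4.1 that the 4.0 branch no longer returns.
dropped = sorted(set(column_341) - set(column_40))
print(dropped)
# ['cellIndex', 'constraints', 'description', 'format', 'title', 'type']
```

Apart from `cellIndex`, the remaining five are the unused Data Package fields.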

Happy to answer more questions like this if you have any :)

Antonin

Felix Lohmeier

Jul 12, 2021, 5:28:29 PM
to OpenRefine Development
Hey Antonin,

On Monday, 12 July 2021 at 17:00:19 UTC+2 Antonin Delpeuch (lists) wrote:

> I could do it for the HTTP API (where there are far fewer changes), but we have always stated that this HTTP API was not intended for public use, so it would probably send confusing messages.

I think a short summary of the HTTP API changes here on the developer mailing list would be very helpful for others as well.

> Yes, the `cellIndex` field was dropped because it is no longer useful in the new architecture: reordering columns is now done lazily, so it no longer carries the significant cost it had in the previous in-memory setup. The index of a column is therefore simply its position in the list of columns returned by the API (the first one has cellIndex 0, the second 1, and so on).

Thanks for the detailed explanation. This was a simple fix. It looks like the openrefine-client is now ready for OpenRefine 4.0. Yay!

> Happy to answer more questions like this if you have any :)

Did you drop support for Java 8? When running the 4.0 branch with java-1.8.0-openjdk, the import fails with an endless "updating preview" spinner and the console error "java.lang.NoSuchMethodError: java.util.Optional.isEmpty()Z".

Otherwise, the client's test suite still reports a few failures, but I guess those features just aren't ready yet?
  • import option character encoding (seems to have no effect at the moment)
  • import line-based text files
  • export to xls, xlsx, ods
The 4.0 branch is surprisingly ready. I'm really looking forward to using it in my next project and seeing how far I get with it. Congratulations again!

Thad Guidry

Jul 13, 2021, 12:21:31 PM
to openref...@googlegroups.com
Felix,

If you know SQL and have a Spark cluster, you can also put ETL scripts together and run data quality checks as well using this:
Apache Griffin also lets you do some DQI.


