Refactored lc-reconcile (LCNAF) experiment (WIP)


Tom Morris

Nov 15, 2022, 1:26:27 PM
to openref...@googlegroups.com
This is a functional but not heavily tested experiment at this point. It might be of interest to developers who are looking for a starting point for a Python reconciliation service.

I took a pass through Christine Harlow's LCNAF reconciliation service and added or fixed a number of things, with the intent that it could perhaps serve as the basis of a Python reconciliation service framework (after stripping out the LCNAF-specific parts). I also added a few LCNAF-specific things, such as the request header that LC asks for and support for the Suggest2 service.

It isn't at all clear to me which set of LoC APIs would provide the best results, and LoC provides very little helpful guidance, so if anyone is familiar enough with these APIs to give feedback, I'd be happy to tune the implementation to be more useful. As it stands now, I'm underwhelmed by the reconciliation results.

The branch is available here:
https://github.com/tfmorris/lc-reconcile/tree/modernize


Here's a list of the stuff that I did:
  • Add support for the Suggest2 service
  • Honor the limit parameter from the reconciliation request
  • Add an HTTP request header (as requested by LoC) with the operator's name (now required on the command line)
  • Add HTTP retries (3) with backoff (0.5, 1, 2 seconds)
  • Add timeouts to HTTP GETs so that the server can't hang
  • Add metrics for latency, caching effectiveness, total requests, and responses by status code
  • Remove Python 2 support
  • Switch to Conda for dependencies
  • Remove unused dependencies
  • Refactor to minimize redundant code
  • Switch to the more liberally licensed (and more modern and faster) rapidfuzz library
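
For readers who want to see how the header, retry, and timeout items fit together, here is a minimal sketch using only the standard library. It is not the code from the branch: the function name, the User-Agent format, the 10-second default timeout, and the choice to retry only on network errors and 5xx responses are all assumptions made for illustration. Only the three retries and the 0.5, 1, 2 second delays come from the list above.

```python
import time
import urllib.error
import urllib.request

BACKOFF_DELAYS = (0.5, 1.0, 2.0)  # seconds to wait before each of the 3 retries


def fetch_with_retries(url, operator, timeout=10.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, retrying up to 3 times on network errors or 5xx responses."""
    # Hypothetical header format; the post only says the operator's name is sent.
    request = urllib.request.Request(
        url, headers={"User-Agent": f"lc-reconcile (operator: {operator})"})
    # One initial attempt plus one attempt per delay; None marks the last try.
    for delay in (*BACKOFF_DELAYS, None):
        try:
            # The timeout keeps a stalled upstream from hanging the service.
            with opener(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Assumption: client errors (4xx) are not worth retrying.
            if err.code < 500 or delay is None:
                raise
        except OSError:
            # Covers URLError and socket timeouts.
            if delay is None:
                raise
        sleep(delay)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access or real waiting; a caller would normally leave them at their defaults.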

TODO:
  • Add more Python type hints
  • Add rate limiting
  • Figure out which of the suggest/suggest2/didyoumean services are most useful (HELP WANTED!)
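
As one possible shape for the rate-limiting item, the sketch below enforces a minimum interval between outgoing requests. Everything in it is hypothetical (the class name, the interval-based approach, and the injectable clock), and it is not thread-safe as written; a service handling concurrent requests would need a lock around `wait()`.

```python
import time


class MinIntervalLimiter:
    """Allow at most one call every `interval` seconds (hypothetical helper)."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `limiter.wait()` immediately before each upstream request would space the requests out; the right interval for the LoC services is not something the thread specifies.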

Antonin Delpeuch (lists)

Nov 25, 2022, 4:31:04 AM
to openref...@googlegroups.com
Hi Tom,

Very nice! If I remember correctly, Christine was keen to pass on the
maintenance of this tool, so she might be up for transferring the
original repository to you, or something similar. In any case it would
be worth making your fork really visible!

Best,
Antonin

On 15/11/2022 19:26, Tom Morris wrote:
> This is a functional, but not heavily tested, experiment at this point,
> but might be of interest to developers who are looking for a Python
> reconciliation service starting point.
>
> I took a pass through Christine Harlow's LCNAF reconciliation service
> and added/fixed a number of things with the intent that perhaps it could
> be used as the basis of a Python reconciliation service framework (after
> stripping out the LCNAF specific stuff). I also added a few
> LCNAF-specific things like the request header that LC requests, support
> for the Suggest2 service, etc.
>
> It isn't at all clear to me which set of LoC APIs would provide the best
> results and they provide very little in the way of helpful guidance, so
> if anyone is familiar enough with these APIs to provide feedback, I'd be
> happy to tune the implementation to be more useful. As it stands now,
> I'm underwhelmed with the reconciliation results.
>
> The branch is available here:
> https://github.com/tfmorris/lc-reconcile/tree/modernize
>
> Here's a list of the stuff that I did:
>
> * Add support for Suggest2 service
> * honor limit parameter from the reconciliation request
> * added HTTP request header (as requested by LoC) with operator's name
> (now required on the command line)
> * added HTTP retries (3) with backoff (0.5, 1, 2 seconds)
> * added timeouts to HTTP gets so that the server can't hang
> * Add metrics for latency, caching effectiveness, total requests,
> responses by status code
> * remove Python 2 support
> * switch to Conda for dependencies
> * remove unused dependencies
> * refactor to minimize redundant code
> * switch to more liberally licensed (plus more modern and faster)
> rapidfuzz library
>
>
> TODO:
>
> * Add more Python type hints
> * add rate limiting
> * figure out which of the suggest/suggest2/didyoumean services are
> most useful (HELP WANTED!)
