--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/588ee6e7-a44f-402a-a930-50fe385abd02n%40googlegroups.com.
The clustering dialog is rendered with JavaScript, and when there are many rows of clusters the browser may flag it as too compute-heavy to display and update. There is not much you can do about that at this time other than work with smaller cluster sets, preferably by using Facets to filter down the rows PRIOR to opening the clustering dialog.
We do hope to improve all of that in future versions of OpenRefine, but this likely won't appear until next year, perhaps the following year. Until then, just work through smaller batches via Facets, Filtering, etc.
Hi Fran,
Yes, we are working on making the backend handle large datasets better. I haven't looked at your issue closely, but since Tom mentioned that it is purely a frontend performance issue, it is unrelated to these improvements.
Best,
Antonin
On Oct 6, 2020, at 20:09, Thad Guidry <thadg...@gmail.com> wrote:
You might also want to understand the differences between edit distance algorithms and token based algorithms.
For English strings, I typically use the Jaro-Winkler algorithm, an edit distance type that helps with typos, and then the Sørensen-Dice algorithm, a token based type, to find large domain concept similarities since it overestimates. Here's a halfway decent primer on what I mean: https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227
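
To make the contrast between the two families concrete, here is a minimal Python sketch. It is not OpenRefine's own code; the function names (jaro, jaro_winkler, dice_tokens) are made up for illustration, and the Dice variant assumes whitespace-separated, lowercased words as tokens. The Jaro-Winkler part uses the usual prefix scaling of 0.1 over at most 4 leading characters.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: character matches within a sliding window, minus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are at most this far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def dice_tokens(s1: str, s2: str) -> float:
    """Sørensen-Dice coefficient over sets of lowercased, whitespace-split words."""
    a = set(s1.lower().split())
    b = set(s2.lower().split())
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # A typo-style difference: character order swapped inside one word.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    # A token-style difference: same words, different order, one extra word.
    print(round(dice_tokens("New York City", "City of New York"), 4))
```

The first comparison is the kind of typo an edit distance measure is good at; the second shows why a token based measure still scores highly when words are reordered or a filler word is added, which is also why it tends to overestimate.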