JARO is the official journal of ARO. It is a peer-reviewed journal that publishes research findings focused on the auditory and vestibular systems. JARO welcomes submissions describing original experimental research that investigates the mechanisms underlying problems of basic or clinical significance.
Individual articles are submitted by spARO members (students and postdocs) to be highlighted on our website. If you wish to contribute a Research News article, please contact JARO at jaro....@aro.org for detailed instructions.
In our online shop we offer CDs, books published by JARO, and some third-party products. Ordering is quick and easy, and we try to deliver within a few days. We deliver worldwide; if your country does not appear in the country selection list, or if you have problems with your order, please contact us by email at vert...@jaro.de
Install it in your Node project using npm as usual: npm install jaro-winkler. It also works in the browser; just include the source from index.js in your project however you prefer. Note that the distance function will be added to the global scope if it is not bundled with a tool like Browserify.
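To make concrete what such a function computes, here is a minimal Python sketch of the standard Jaro-Winkler similarity (matching window of floor(max length / 2) - 1, transpositions counted as half the out-of-order matched characters, prefix scale p = 0.1 over at most 4 characters). It is our own illustration, not the jaro-winkler package's code; its options and return conventions may differ, and the names `jaro` and `jaro_winkler` are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters only "match" if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk both sequences of matched characters in order; each position where
    # they disagree is a half-transposition.
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions / 2

    m = matches
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

For MARTHA vs. MARHTA all six characters match, the swapped T/H pair counts as one transposition (Jaro score about 0.944), and the shared prefix MAR lifts the final score to about 0.961.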
The first major improvement to performance came from the batch_jaro_winkler implementation by Dominik Bousquet. The idea behind this implementation is to build, for one of the datasets, a lookup table of tuples that pair each name with pointers to the records carrying that name. This lookup table is keyed on the letters present in the names.
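A minimal Python sketch of such a table, based only on the description above (the actual layout inside batch_jaro_winkler may well differ): each distinct name is stored once together with the list of record ids that carry it, and that tuple is filed under every letter the name contains. The helpers `build_letter_index` and `candidate_names`, and the `(record_id, name)` input format, are hypothetical.

```python
from collections import defaultdict


def build_letter_index(records):
    """Map each letter to (name, record_ids) tuples for names containing it.

    `records` is an iterable of (record_id, name) pairs (a hypothetical format).
    """
    # Group record ids by name so each distinct name is stored only once.
    ids_by_name = defaultdict(list)
    for record_id, name in records:
        ids_by_name[name].append(record_id)

    # File every distinct name under each distinct letter it contains.
    index = defaultdict(list)
    for name, record_ids in ids_by_name.items():
        for letter in set(name):
            index[letter].append((name, record_ids))
    return index


def candidate_names(index, query):
    """Return {name: record_ids} for indexed names sharing a letter with `query`."""
    candidates = {}
    for letter in set(query):
        for name, record_ids in index.get(letter, []):
            candidates[name] = record_ids
    return candidates


index = build_letter_index([(1, "ANNA"), (2, "ANNA"), (3, "BOB"), (4, "NORA")])
print(candidate_names(index, "ANN"))
```

Keying on letters is useful because two names with no characters in common have a Jaro-Winkler score of 0, so only names reachable through the query's own letters need to be scored, and each distinct name is scored once no matter how many records share it.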
Explaining all of the small bitwise operations used within the algorithm would be extremely verbose and would probably require a blog post of its own. If you are curious about the full implementation, you can check out the code in our (relatively well-documented) repo on GitHub.
In the end I was able to achieve a 10-15x speedup compared to the batch_jaro_winkler library, and a 40-50x speedup compared to common string comparison libraries. However, there is one caveat: in some cases the algorithm overcounts transpositions, occasionally producing scores slightly lower than those of other implementations. The difference is relatively minor: the average error is 0.002 or less when testing against names from the 1880 U.S. census. Our next step with the algorithm will be to extend it to compute Jaro-Winkler scores on multiple names for the same record (first and last), which will be required before we integrate it into our larger linking pipeline.