Failed to use the Langdetect package with Jython


Julien Vallières

Feb 4, 2021, 1:30:11 PM
to OpenRefine
Hello geniuses of OpenRefine! I’ve been lurking here and there for some time. These last few days, I’ve had to detect the language of 5000 texts. I found Sylvain’s use of the Python package Langdetect in the Expression window (Add a column based on column). Unfortunately, this is easier said than done, and I’ve spent the day trying to make it work, without success.

I’ve made 2-3 screenshots, and I’ve copied the text of the error report from the right side of the Expression window. I’m hoping someone here has an idea of how to resolve this particular problem.

==
Error: Traceback (most recent call last):
  File "<string>", line 5, in __temp_1301653213__
  File "/Applications/OpenRefine.app/Contents/Resources/webapp/extensions/jython/module/MOD-INF/lib/jython/langdetect/__init__.py", line 1, in <module>
    from .detector_factory import DetectorFactory, PROFILES_DIRECTORY, detect, detect_langs
  File "/Applications/OpenRefine.app/Contents/Resources/webapp/extensions/jython/module/MOD-INF/lib/jython/langdetect/detector_factory.py", line 10, in <module>
    from .detector import Detector
  File "/Applications/OpenRefine.app/Contents/Resources/webapp/extensions/jython/module/MOD-INF/lib/jython/langdetect/detector.py", line 5, in <module>
    from six.moves import zip, xrange
ImportError: cannot import name zip
==

Note that I dropped the packages in the right folder, under Jython2.7.2, but OpenRefine only showed signs of doing something once I put a copy in one of its own subfolders.

I’m using Openrefine 3.4.1 on the latest version of Big Sur.

[Attachments: Capture d’écran, le 2021-02-04 à 19.01.37.png; Capture d’écran, le 2021-02-04 à 18.22.26.png]

Owen Stephens

Feb 4, 2021, 5:35:04 PM
to OpenRefine
It looks like another package, "six", is required: https://pypi.org/project/six/1.9.0/ (apparently it's a general package for improved compatibility between Python 2.x and 3.x). I've just installed Jython via Homebrew on macOS High Sierra and then used pip to install langdetect, and it has dropped in six-1.15.0 as well.
 -------

owen$ brew install jython
==> Pouring jython-2.7.2.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/jython/2.7.2: 3,911 files, 120.2MB

owen$ jython -m pip install langdetect
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.python.core.io.StreamIO (file:/usr/local/Cellar/jython/2.7.2/libexec/jython.jar) to field java.io.FilterOutputStream.out
WARNING: Please consider reporting this to the maintainers of org.python.core.io.StreamIO
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
DEPRECATION: A future version of pip will drop support for Python 2.7.
Collecting langdetect
     |████████████████████████████████| 993kB 501kB/s
Collecting six (from langdetect)
Installing collected packages: six, langdetect
Successfully installed langdetect-1.0.8 six-1.15.0
-------

Then as per the instructions from Sylvain I used the expression:
import sys
sys.path.append('/usr/local/Cellar/jython/2.7.2/libexec/Lib/site-packages')
from langdetect import detect
return detect(value)

This worked OK - so I think if you can get six installed in the right location it should all work OK.
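If six appears to be installed but the import still fails, one thing that may help is checking which copy of a module is actually being loaded. The following is only a minimal sketch: `describe_module` is a hypothetical helper name, not part of OpenRefine, langdetect, or six, and it is written to run under both Python 2.7 (Jython) and Python 3.

```python
def describe_module(name):
    """Report a module's version and file location, or why it cannot be imported."""
    try:
        module = __import__(name)
    except ImportError as exc:
        return "%s: not importable (%s)" % (name, exc)
    # Not every module defines these attributes, so fall back to placeholders.
    version = getattr(module, "__version__", "unknown version")
    location = getattr(module, "__file__", "no file location")
    return "%s %s from %s" % (name, version, location)
```

In the OpenRefine expression window, this could be pasted after the `sys.path` lines, with `return describe_module('six')` as the last line. If the reported location is not the site-packages folder that pip installed into, a different copy earlier on `sys.path` may be the one being imported. That is an assumption to test, not something confirmed in this thread.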

Owen

Tom Morris

Feb 4, 2021, 5:43:30 PM
to openr...@googlegroups.com
For Mac OS X, the path separators are forward slashes ("/"), not backslashes ("\"), and I'm guessing that you probably want an absolute path which begins "/Users/"

It looks, however, like the packaging changes between 2.7.1 and 2.7.2 might require some adjustment in the setup.

Here's a formula that worked for me:

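import sys

# Jython 2.7.2 site-packages folder (adjust to your own install)
p = '/Users/tfmorris/jython2.7.2/Lib/site-packages'
# Add the folder only if it is not already on sys.path
if p not in sys.path:
    sys.path.insert(0, p)

from langdetect import detect

return detect(value)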
--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/d155fd72-d10b-4b85-8236-b813d4c3a8b2n%40googlegroups.com.

Julien Vallières

Feb 5, 2021, 6:15:11 AM
to OpenRefine
Owen, thank you for the answer. I noticed the mention of the six package yesterday, so I put a freshly downloaded version of it in the folder. It did help (the traceback got longer, as a matter of fact), but it still did not work, stumbling on that ImportError. The error message I’ve reproduced here is the result.

Tom, thank you for the answer too. I’ve corrected the path, following your instructions, but the error message I get remains the same.

Seeing you guys get it working in a heartbeat makes me wonder what I might have done wrong...

I have uninstalled and reinstalled the packages, just to see, but nothing changed.

Julien



Owen Stephens

Feb 5, 2021, 8:49:16 AM
to OpenRefine
Hi Julien,

One thing I didn't mention but should have - I run the Linux version even though I work on a Mac. This isn't something officially supported, but it's always worked for me. It might be worth a try to see if that works?

Owen

Tom Morris

Feb 5, 2021, 11:48:47 AM
to openr...@googlegroups.com
One thing you might want to try is restarting OpenRefine. I think there are pieces of the Jython implementation which only get initialized once and if you end up with a corrupted sys.path, it might take a restart to fix it.

As a matter of fact, the prepend vs append change might be a red herring, due to a transitory problem that got cleared by a restart during my testing.

The conditionalization is a tidiness / performance thing. Without it, the path will continually grow longer and longer because it gets appended/prepended to each time an evaluation is done.
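To illustrate why the check matters, here is a small sketch that simulates several evaluations against a plain list standing in for `sys.path`. The helper name `ensure_on_path` and the folder names are made up for the example.

```python
def ensure_on_path(folder, search_path):
    """Put `folder` at the front of `search_path` unless it is already listed."""
    if folder not in search_path:
        search_path.insert(0, folder)
    return search_path

# Simulate three evaluations of the same expression.
demo_path = ['/existing/entry']
for _ in range(3):
    ensure_on_path('/hypothetical/site-packages', demo_path)
```

After the loop, `demo_path` holds the new folder exactly once. Without the `if` test it would hold three copies, one per evaluation.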

I was running a development build on my Mac. I can retest with the Mac kit, but I don't suspect that it's a factor here.

Tom


Julien Vallières

Feb 5, 2021, 4:16:05 PM
to OpenRefine
Owen, I just tried to run the Linux version and had no problem running OpenRefine, but the error message stays the same, unfortunately.

Tom, I did restart OpenRefine (and just did so again). In fact, I have rebooted my Mac, and the error message stays the same.

Could it be a compatibility issue? Are you running the latest version of OpenRefine on Big Sur? I remember that, last summer, a plug-in I had installed in a previous version of OpenRefine could not be installed on 3.3. I guess both of you are; it’s just that I am at a loss as to why this is not working on my particular computer.

The last part of the error message reads:
from six.moves import zip, xrange
ImportError: cannot import name zip

I might try another package, to see if it is specific to this configuration Langdetect/Six.
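Since the failure is on the line `from six.moves import zip, xrange`, another way to narrow it down might be to test that import on its own, outside of langdetect. This is a hedged sketch only: `missing_names` is a hypothetical helper, and it assumes the module itself can be imported (otherwise it raises ImportError).

```python
import importlib

def missing_names(module_name, names):
    """Return the entries of `names` that are not available on the module."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```

In the expression window, ending with `return str(missing_names('six.moves', ['zip', 'xrange']))` would show which of the two names the loaded copy of six lacks. An empty list would suggest the problem lies somewhere other than six itself.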

Julien

Owen Stephens

Feb 12, 2021, 5:09:49 AM
to OpenRefine
Hi Julien,

Apologies for the delayed response. When I got this working I was running OpenRefine 3.4.1 on High Sierra - but I now have a Big Sur installation I can test on when I have a chance.

Did you get anywhere trying another package?

Owen