Installing Judy for CentOS 8


Tekevwe Kwakpovwe

Apr 16, 2021, 5:44:00 AM
to NoSketch Engine

To whom it may concern,

I'm writing to ask about installing Judy on Red Hat Enterprise Linux 8.3. It seems that installing it pulls in dependencies that depend on each other: for example, glibc-common requires ld-linux-aarch64.so.1()(64bit) and vice versa. These aren't the only two; there are several more. This is for a CentOS 8 (Stream) installation. This is stopping me from installing Judy and, subsequently, Onion. Does anyone know of any workarounds for this, or alternatives to Onion?

Kind regards,
Tekevwe.

Miloš Jakubíček

Apr 16, 2021, 10:03:06 AM
to Tekevwe Kwakpovwe, NoSketch Engine
Hi Tekevwe,

Judy is no longer needed for Onion, but it might be that the released version is a bit outdated. I will have my colleagues check that.

All the best,
Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton, UK


--
You received this message because you are subscribed to the Google Groups "NoSketch Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to noske+un...@sketchengine.co.uk.
To view this discussion on the web visit https://groups.google.com/a/sketchengine.co.uk/d/msgid/noske/8e0ae692-dbc6-45d6-aa72-49ad83a045e3n%40sketchengine.co.uk.

Vít Suchomel

Apr 16, 2021, 12:58:34 PM
to Tekevwe Kwakpovwe, Miloš Jakubíček, NoSketch Engine
Dear Tekevwe,

thank you very much for reporting this!
Onion is using Google sparse hashset (https://github.com/sparsehash/sparsehash) instead of Judy now. I have just updated the page where you can find the most recent version 1.4: http://corpus.tools/wiki/Onion.

Kind regards,
Vit Suchomel
Sketch Engine team



On Fri, 16 Apr 2021 at 16:03, Miloš Jakubíček <milos.j...@sketchengine.eu> wrote:
Vítek,

is anything preventing the release of the Onion version that uses Google sparse?

Cheers,
m.

Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton, UK


Vladimír Benko

Apr 16, 2021, 2:37:14 PM
to no...@sketchengine.co.uk

Vítku,

Are you sure that 1.4 is a newer version? It does not seem to be any different from 1.2...

Best,

V, 20:25

--
Vladimír Benko

Slovak Academy of Sciences
Ľ. Štúr Institute of Linguistics
Panská 26, SK-81101 Bratislava

Tel +421-2-54431762 Fax -54431756

http://aranea.juls.savba.sk/guest/
https://www.facebook.com/araneawebcorpora/

Vít Suchomel

Apr 17, 2021, 3:16:05 AM
to Vladimír Benko, NoSketch Engine
Hi Vlado,

thank you -- I have just uploaded the right files.

V.



Vladimír Benko

Apr 18, 2021, 7:54:10 AM
to NoSketch Engine
Vítku,

Last evening I tested the new Onion on a 4.3 Gigatoken corpus, making use of the fact that it had been deduplicated by the older version of Onion two days earlier. Here are the results:

Time       from      to        elapsed  seconds
Onion 1.4  19:27:25  23:01:45  3:34:20    12860
Onion 1.2  15:41:53  17:13:11  1:31:18     5478
ratio                                      2.35

RAM        initial  total
Onion 1.4     4770  11641
Onion 1.2     3317   7975
ratio         1.44   1.46

I.e., (in this particular case) the Sparsehash version was more than two times slower and needed almost 1.5 times more memory than the one based on Judy. As this might be an issue for very large corpora and "small" machines, I suggest that you also provide a link to download the older Onion version, for those who could find it useful :-)
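
The ratios quoted above can be re-derived from the reported clock times and RAM figures. The following is a minimal Python sketch, not part of the original measurements: the helper name `elapsed_seconds` is made up for illustration, and it assumes each run started and finished on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day clock times given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Clock times copied from the results reported in this message.
t_14 = elapsed_seconds("19:27:25", "23:01:45")  # Onion 1.4 (Sparsehash)
t_12 = elapsed_seconds("15:41:53", "17:13:11")  # Onion 1.2 (Judy)
time_ratio = round(t_14 / t_12, 2)

# RAM figures copied from the same message (the unit is not stated there).
ram_initial_ratio = round(4770 / 3317, 2)
ram_total_ratio = round(11641 / 7975, 2)

print(t_14, t_12, time_ratio)
print(ram_initial_ratio, ram_total_ratio)
```

The computed values agree with the reported figures: 12860 and 5478 seconds, a time ratio of 2.35, and RAM ratios of 1.44 and 1.46.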

Miloš Jakubíček

Apr 18, 2021, 1:47:33 PM
to Vladimír Benko, NoSketch Engine
That's quite weird. Vitek, could it be caused by some other changes?

The code for using google sparse is actually quite old (I did that in 2016). I remember (hopefully correctly) that it was requiring more memory but that it was faster, not slower, than Judy.
Which Judy version were you using? Could it be that the machine started swapping because of insufficient memory? What was the memory peak (grep VmPeak /proc/$PID/status)?
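
The peak-memory check mentioned above can also be scripted. This is a minimal sketch, assuming a Linux system where /proc/<pid>/status contains a line of the form `VmPeak: <number> kB`; the function names and the sample text are hypothetical and not part of Onion.

```python
import re
from typing import Optional


def parse_vmpeak_kb(status_text: str) -> Optional[int]:
    """Return the VmPeak value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmPeak:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmpeak_kb(pid: int) -> Optional[int]:
    """Read VmPeak for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmpeak_kb(f.read())


# Illustrative input only, not a real measurement.
sample = "Name:\tonion\nVmPeak:\t  204800 kB\nVmSize:\t  102400 kB\n"
print(parse_vmpeak_kb(sample))
```

Note that VmPeak is the peak virtual memory size; the VmHWM line in the same file gives the peak resident set size, which may be the more relevant number when checking for swapping.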

Best
Milos

Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton, UK


Vladimír Benko

Apr 18, 2021, 3:56:42 PM
to no...@sketchengine.co.uk
Miloši,

I doubt it could be caused by swapping -- there was no other activity on that machine at the time...

I am going to perform another deduplication tonight and should be able to compare the results obtained by both versions in the morning. I have noticed, however, that unlike 1.2, Onion 1.4 is compiled with g++. Could that make the difference?

Best,

Vlado B, 21:55

Miloš Jakubíček

Apr 18, 2021, 4:14:33 PM
to Vladimír Benko, NoSketch Engine

Only if some build flags are erroneously omitted -- did you compile with -O2?

Best
Milos 

Vladimír Benko

Apr 18, 2021, 4:25:15 PM
to no...@sketchengine.co.uk
M,

I simply invoked the standard make command. It looks like both versions are compiled with -O3.

V, 22:25

Vit Suchomel

Apr 19, 2021, 3:00:08 PM
to Vladimír Benko, NoSketch Engine, Milos Jakubicek
Hi Vlado,

according to my performance evaluation on 2.5 billion English tokens and 1.7 billion Russian tokens (token counts after de-duplication) with parameters "-s -n 5 -t 0.9 -l 10", Onion with Google sparse hash set (v. 1.4) consumes less memory at the cost of time in comparison with Judy arrays (v. 1.2), which is the desired trade-off:
[Attached figures: onion_eval_memory.png, onion_eval_time.png]

The old version of Onion can still be downloaded from the same address as before.

Best regards,
Vítek



Vladimír Benko

Apr 22, 2021, 11:44:53 AM
to NoSketch Engine

Vítku,

My newest experiment gave results similar to yours (different machine, Slovak corpus, 6.59 Gigatoken before and 4.99 Gigatoken after document-level deduplication):

Time                    from      to        elapsed  seconds
Onion 1.4 (Sparsehash)   8:57:07  11:32:10  2:35:03     9303
Onion 1.2 (Judy)        12:46:42  14:31:28  1:44:46     6286
ratio                                                   1.48

RAM (MB)                initial  total
Onion 1.4 (Sparsehash)     9518  18637
Onion 1.2 (Judy)          18540  23279
ratio                      0.51   0.80

I.e., the new Onion needed less memory at the price of processing time.  I'll try to do more comparisons with new corpora to be processed soon.

Best,

Vlado B, 17:45
