Subject: Word list and Keywords report frequencies twice as high as the correct values in some cases
Date: 07/15/2024 2:08 pm
From: Ondřej Herman
To: Tomáš Machálek <tomas.m...@gmail.com>
Dear Tomas,
We have resolved the issue; it was indeed caused by the handling of MULTIVALUE attributes within subcorpora. The fix will be available in the next release of Manatee.
In the meantime, you can use the attached patch.
Thank you for reporting the problem.
Best,
Ondrej
YouTube tutorials: https://youtube.com/c/SketchEngine
Boot Camp Online – a course in mastering Sketch Engine: https://www.sketchengine.eu/bootcamp/
On Friday, June 7, 2024 at 5:53:04 PM, Tomáš Machálek wrote:
Hello all,
I've prepared a small vertical file from one of our corpora plus a corresponding configuration file (the PATH setting will probably be invalid in your environment).
I've tested it in both NoSkE and our KonText with the same results.
The simplest steps to reproduce the problem are as follows:
1) select net_v2_sample corpus
2) create a subcorpus (e.g. with the condition s.doc_type="blog")
3) open the Word list function
3.1) choose "find: tags" with values starting with "N"
3.2) select the subcorpus created above
4) run the calculation (press "Go")
- the UI should report that it needs to prepare data
5) compare each frequency in the results with the number of hits in the corresponding concordance for that item.
If you have any questions, please let me know.
Best regards,
Tomas Machalek
On Tue, Jun 4, 2024 at 11:18 AM Tomáš Machálek <tomas.m...@gmail.com> wrote:
Thanks for the information. I will prepare a small corpus and send it to you (or a download link). Hopefully I can do this by the end of the week.
Thank you,
Tomas Machalek
On Fri, May 31, 2024 at 3:00 PM Michal Cukr | Sketch Engine Support <sup...@sketchengine.eu> wrote:
Dear Tomáš,
Thank you for updating Bonito and providing more details about the issue.
I have tried to reproduce the error according to your instructions, but unfortunately, I have not been able to. I have discussed it with my colleagues, and we would need a minimal reproducible example to inspect the error in more detail. In short, we see two ways to obtain one:
- Create a small dataset (e.g. a 100-token corpus) that triggers the error and send it to us together with the corpus configuration (registry), so that we can compile the data on our servers.
- Alternatively, grant us access to the corpus where the error occurred.
If you wish to keep your data private, please share it with us via our standard support channel sup...@sketchengine.eu
Best regards,
Michal Cukr
On Wednesday, May 15, 2024 at 11:47:11 AM, Tomáš Machálek wrote:
I have updated to bonito-open-5.71.15, crystal-open-2.166.4 (Manatee is already 2.225.8) and the problem persists.
My observations suggest that this is probably related to the attributes in question being multivalue attributes. I've noticed that the values are often not exactly twice the correct value; rather, the inflation depends on how often multivalues occur within the given attribute. To be clear, this only applies to non-word attributes in subcorpora, where Bonito has to compute intermediate data.
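The hypothesis above can be illustrated with a minimal Python sketch on hypothetical toy data. It assumes a multivalue "tag" attribute whose components are separated by "|", and it models one possible faulty counting rule (each matching component is credited with the total number of components on its token). This is an illustration only, not Manatee's actual code: under such a rule, values that occur only inside multivalue tokens come out exactly doubled, while values that also occur on their own are inflated by a smaller factor.

```python
from collections import Counter

# Hypothetical toy data: the value of a multivalue "tag" attribute for each
# token; "|" as the component separator is an assumption for this sketch.
TOKENS = ["NNFS1", "NNFS1|NNFS4", "VB", "NNFS4", "NNIS1|NNIS4"]
SEP = "|"


def correct_freqs(tokens, prefix="N"):
    """Count every matching component value at most once per token."""
    freqs = Counter()
    for tok in tokens:
        for value in set(tok.split(SEP)):
            if value.startswith(prefix):
                freqs[value] += 1
    return freqs


def inflated_freqs(tokens, prefix="N"):
    """Hypothetical faulty rule: each matching component value is credited
    with the total number of components on its token."""
    freqs = Counter()
    for tok in tokens:
        values = tok.split(SEP)
        for value in values:
            if value.startswith(prefix):
                freqs[value] += len(values)
    return freqs


if __name__ == "__main__":
    good, bad = correct_freqs(TOKENS), inflated_freqs(TOKENS)
    for value in sorted(good):
        # tag, correct count, inflated count, inflation ratio
        print(value, good[value], bad[value], bad[value] / good[value])
```

Running it gives a ratio of 1.5 for NNFS1 and NNFS4 (which also occur as single values) and 2.0 for NNIS1 and NNIS4 (which occur only inside multivalue tokens).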
Tomáš Machálek
On Tue, May 7, 2024 at 3:52 PM Michal Cukr | Sketch Engine Support <sup...@sketchengine.eu> wrote:
Dear Tomáš,
Thank you for your email and the details you provided to my colleague František in the private email.
We are not able to reproduce the issue you described in the current version of Sketch Engine. However, I can see you are using obsolete versions of Bonito and Crystal. Please update these components and then run your query again to see whether the issue persists.
Best regards,
Michal Cukr
On Thursday, May 2, 2024 at 3:17:15 PM, Tomáš Machálek wrote: