Latent Dirichlet Allocation in Unsupervised Form

Valerio jus

Oct 21, 2022, 6:05:40 PM
to The ADAMS Flow User mailing list
Hi Peter,

I hope you are well. 

Is it possible to apply Latent Dirichlet Allocation (unsupervised form) in ADAMS? If so, where can I find it in ADAMS? If not, I would highly appreciate it if you could suggest a platform for applying it. 

Thank you in advance. 

Kind regards, 
Valerio 

Peter Reutemann

Oct 21, 2022, 10:47:29 PM
to theadams...@googlegroups.com
Have you tried this Weka package?

https://sourceforge.net/p/weka-lda-filter/wiki/Home/

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174 (office)
+64 (7) 577-5304 (home office)
http://www.cs.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Oct 22, 2022 11:05:43 Valerio jus <valer...@gmail.com>:

--
You received this message because you are subscribed to the Google Groups "The ADAMS Flow User mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theadamsflow-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/theadamsflow-user/bc3c284e-829c-40e3-800e-46422433f9ban%40googlegroups.com.

Valerio jus

Oct 22, 2022, 11:02:51 AM
to theadams...@googlegroups.com
Thank you, Peter, for the prompt reply. 

The package is not available in the package manager. When I try to install the zip file downloaded from the provided link, I get an error message and the package is not installed.

Any advice would be highly appreciated. 

Kind regards, 
Valerio

Peter Reutemann

Oct 24, 2022, 8:22:46 PM
to theadams...@googlegroups.com
> The package is not available in the package manager. When I try to install the zip file downloaded from the provided link, I get an error message and the package is not installed.

Yes, it is not an official Weka package. That was one of the first
hits when doing a little internet search (disclaimer: I've never used
LDA).

Looks like the zip file contains another directory level, which makes
it not a valid Weka package.
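To see whether an archive has this extra level, here is a minimal sketch, assuming the usual Weka package layout in which the metadata file `Description.props` sits at the root of the zip. `WekaZipCheck` is a hypothetical helper (not part of Weka or ADAMS) that only uses the JDK's `java.util.zip`: it lists the top-level entries and reports whether `Description.props` is at the root.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class WekaZipCheck {

    // True if "Description.props" is stored at the root of the archive,
    // i.e. not nested below an extra directory such as "lda/".
    public static boolean hasRootDescription(String zipPath) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            return zip.getEntry("Description.props") != null;
        }
    }

    // Distinct top-level names in the archive; directories keep their trailing '/'.
    public static List<String> topLevelNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                int slash = name.indexOf('/');
                String top = slash < 0 ? name : name.substring(0, slash + 1);
                if (!names.contains(top)) {
                    names.add(top);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WekaZipCheck LDA.zip
        String path = args[0];
        System.out.println("Top-level entries: " + topLevelNames(path));
        System.out.println(hasRootDescription(path)
                ? "Description.props found at the root"
                : "No Description.props at the root (possibly an extra directory level)");
    }
}
```

If the only top-level entry is a single directory such as `lda/`, the archive has the extra level described above.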

I just uncompressed it and then compressed the contents of the
innermost LDA directory into a new zip file. Even though that zip
still created an error when installing with Weka's package manager,
the filter showed up and could be used.
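The unpack-and-rezip step can also be scripted without any archiving tool. A minimal sketch, assuming the unwanted top-level directory is `lda/` (pass whatever name your archive actually uses); `StripTopLevelDir` is a hypothetical helper, not an ADAMS or Weka tool. It copies every entry below the prefix into a new zip with the prefix removed and skips everything else.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class StripTopLevelDir {

    // Copies all entries of 'in' that start with 'prefix' (e.g. "lda/") into 'out',
    // with the prefix removed. The bare prefix directory and entries outside it are skipped.
    // Returns the number of entries written.
    public static int strip(Path in, Path out, String prefix) throws IOException {
        int copied = 0;
        try (ZipFile src = new ZipFile(in.toFile());
             ZipOutputStream dst = new ZipOutputStream(Files.newOutputStream(out))) {
            Enumeration<? extends ZipEntry> entries = src.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith(prefix) || name.length() == prefix.length()) {
                    continue;
                }
                dst.putNextEntry(new ZipEntry(name.substring(prefix.length())));
                if (!entry.isDirectory()) {
                    try (InputStream data = src.getInputStream(entry)) {
                        data.transferTo(dst); // stream the file content unchanged
                    }
                }
                dst.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripTopLevelDir LDA.zip LDA-fixed.zip lda/
        int n = strip(Paths.get(args[0]), Paths.get(args[1]), args[2]);
        System.out.println("Copied " + n + " entries");
    }
}
```

This only fixes the directory layout; as noted above, the package manager may still report an error even when the filter ends up usable.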

Cheers, Peter

Valerio jus

Oct 24, 2022, 8:47:31 PM
to theadams...@googlegroups.com
Thank you, Peter. Where can I find the zip file that you created?

Cheers, 
Valerio

On Sat, Oct 22, 2022 at 3:47 AM Peter Reutemann <frac...@gmail.com> wrote:

Peter Reutemann

Oct 24, 2022, 9:03:02 PM
to theadams...@googlegroups.com
It isn't available anywhere (and the list doesn't allow posting it).

You can recreate it yourself, that's why I included the instructions.

Cheers, Peter

Oct 25, 2022 13:47:32 Valerio jus <valer...@gmail.com>:

Valerio jus

Oct 24, 2022, 9:23:40 PM
to theadams...@googlegroups.com
No worries. Will try it. 

Thank you. 

Cheers,
Valerio 

Valerio jus

Oct 25, 2022, 5:04:06 PM
to theadams...@googlegroups.com
Hi Peter, 

Following your steps, I created the LDA zip file. After installing it through the WEKA package manager, I can't find it under WEKA > FILTERS > UNSUPERVISED > ATTRIBUTE, even though the LDA package is now present in wekafiles\packages. Any help would be highly appreciated. 

Thank you. 

Kind regards, 
Valerio

Peter Reutemann

Oct 25, 2022, 6:59:20 PM
to theadams...@googlegroups.com
> Following your steps, I created the LDA zip file. After installing it through the WEKA package manager, I can't find it under WEKA > FILTERS > UNSUPERVISED > ATTRIBUTE, even though the LDA package is now present in wekafiles\packages. Any help would be highly appreciated.

You need to use the Package manager that comes with ADAMS (under the
Tools menu), not the one from a separate Weka installation. ADAMS
installs Weka packages separately (below
$HOME/.adams/wekafiles/VERSION or
%USERPROFILE%\_adams\wekafiles\VERSION).
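To check where ADAMS placed a package after installation, the two locations above can be built from the JVM's own system properties. A sketch under assumptions: `version` stands in for the VERSION folder named above (check the actual folder name on disk), and `user.home` is taken to match %USERPROFILE% on Windows, which is normally the case. `AdamsWekaDir` is a hypothetical helper.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AdamsWekaDir {

    // Builds the per-user directory in which ADAMS keeps its own Weka packages:
    // <home>/.adams/wekafiles/<version> on Linux/macOS,
    // <home>\_adams\wekafiles\<version> on Windows.
    public static Path packagesRoot(String osName, String home, String version) {
        boolean windows = osName.startsWith("Windows");
        String adamsDir = windows ? "_adams" : ".adams";
        return Paths.get(home, adamsDir, "wekafiles", version);
    }

    public static void main(String[] args) {
        // Usage: java AdamsWekaDir <VERSION folder name>
        Path root = packagesRoot(System.getProperty("os.name"), System.getProperty("user.home"), args[0]);
        System.out.println(root);
    }
}
```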

Cheers, Peter

Valerio jus

Oct 25, 2022, 7:59:56 PM
to theadams...@googlegroups.com
Hi Peter. 

Thank you for the prompt reply. 

I followed your advice but still could not find it, and I got this error:

java.lang.NullPointerException
at weka.core.WekaPackageLibIsolatingClassLoader.checkForNativeLibs(WekaPackageLibIsolatingClassLoader.java:220)
at weka.core.WekaPackageLibIsolatingClassLoader.init(WekaPackageLibIsolatingClassLoader.java:145)
at weka.core.WekaPackageLibIsolatingClassLoader.<init>(WekaPackageLibIsolatingClassLoader.java:126)
at weka.core.WekaPackageClassLoaderManager.addPackageToClassLoader(WekaPackageClassLoaderManager.java:369)
at weka.core.WekaPackageManager.initializeAndLoadUnofficialPackage(WekaPackageManager.java:2372)
at weka.core.WekaPackageManager.installPackageFromArchive(WekaPackageManager.java:2359)
at weka.gui.PackageManager$UnofficialInstallTask.doInBackground(PackageManager.java:780)
at weka.gui.PackageManager$UnofficialInstallTask.doInBackground(PackageManager.java:724)
at javax.swing.SwingWorker$1.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at javax.swing.SwingWorker.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)


Here is the link to the LDA package I created:
https://drive.google.com/file/d/1bxKzVkYoFXIJFUg4WHjJ8Ubp66CUfgPO/view?usp=sharing

Please help me, Peter. 

Thank you. 

Valerio


Peter Reutemann

Oct 25, 2022, 8:10:38 PM
to theadams...@googlegroups.com
> Here is the link to the LDA package I created:
> https://drive.google.com/file/d/1bxKzVkYoFXIJFUg4WHjJ8Ubp66CUfgPO/view?usp=sharing

That zip file still has a top-level "lda" directory (see screenshot),
which makes it an invalid Weka package. Remove that directory level.
lda.png

Valerio jus

Oct 25, 2022, 8:31:23 PM
to theadams...@googlegroups.com
Thank you, Peter. 

When I uncompress the zip file, I get the contents shown in the attached file. What am I supposed to do next?

Thank you.

Valerio



LDA content.png

Peter Reutemann

Oct 25, 2022, 8:38:13 PM
to theadams...@googlegroups.com
> When I uncompress the zip file, I get the contents shown in the attached file. What am I supposed to do next?

From inside the "lda" directory, generate the zip file.

If the built-in Windows zip functionality screws you over by
automatically inserting a "lda" level, then you might want to look at
other tools like 7-zip (https://7-zip.org/).
The DoubleCommander (https://doublecmd.sourceforge.io/) allows you to
create zip files as well.
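For a tool-independent way of zipping "from inside" the directory, the JDK's `java.util.zip` is enough. A minimal sketch (hypothetical class `ZipDirContents`, not an ADAMS or Weka tool) that stores every entry relative to the given directory, so `lda/Description.props` on disk becomes `Description.props` in the archive. Write the zip outside the `lda` directory so it is not picked up as content on a later run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDirContents {

    // Zips the *contents* of 'dir' (not 'dir' itself) into 'zipFile'.
    public static void zipContents(Path dir, Path zipFile) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Skip the directory itself; sorting keeps parents before their children.
            paths = walk.filter(p -> !p.equals(dir)).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : paths) {
                // Zip entry names use '/' as separator, also on Windows.
                String name = dir.relativize(p).toString().replace('\\', '/');
                if (Files.isDirectory(p)) {
                    zip.putNextEntry(new ZipEntry(name + "/"));
                } else {
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(p, zip);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipDirContents path/to/lda LDA.zip
        zipContents(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```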

Disclaimer: I'm not a Windows user.

Valerio jus

Oct 25, 2022, 9:17:11 PM
to theadams...@googlegroups.com
Yessssssssssssssss! 

Finally, LDA is now available; please see the attachment.

Thank you so much, Peter, for your effective help. DoubleCommander works well for me. 

Have a nice day. 

Cheers, 
Valerio

LDA is ready.png

Valerio jus

May 1, 2023, 11:40:59 PM
to theadams...@googlegroups.com
Dear Peter, 

I hope you are doing well. 

I am trying to do topic modeling with LDA, so I applied it to the ReutersCorn-test set and attached its output (please see the attachment). 

Running the algorithm removes the actual text, which makes it difficult to link each document to its LDA output. 

From the attached LDA output, do you know how to determine the best number of topics for this corpus? What is the best way to interpret the attached result?

If you are not familiar with this LDA algorithm, could you please suggest another easy and straightforward way to do topic modeling?

Any help would be highly appreciated. 

Thank you.

Kind regards, 
Valerio


LDA result on ReutersCorn-test.data.arff

Peter Reutemann

May 2, 2023, 12:21:53 AM
to theadams...@googlegroups.com
If you want the text alongside the LDA output, you can use the
PartitionedMultiFilter meta-filter. It lets you define several filters,
each with the attributes it should operate on.
In your case, you would use two filters:
1. AllFilter, just on the text attribute (range: "first")
2. LDA filter on all the attributes (range: "first-last")
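To illustrate what this two-filter setup achieves, here is a toy, stdlib-only sketch of the partitioning idea. It is not Weka's PartitionedMultiFilter API: each filter sees only its own columns, and the outputs are placed side by side row by row, so the original text stays next to the new columns. The "LDA" part is a hypothetical stand-in that just counts tokens (the real filter would emit topic columns), and details of the real meta-filter such as attribute naming and class-attribute handling are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PartitionSketch {

    // One partition: the column indices a filter sees, plus the filter itself.
    // Filters work on the whole sub-table, because batch filters such as LDA need the full corpus.
    public static final class Partition {
        final int[] columns;
        final Function<List<String[]>, List<String[]>> filter;

        public Partition(int[] columns, Function<List<String[]>, List<String[]>> filter) {
            this.columns = columns;
            this.filter = filter;
        }
    }

    // Runs each filter on its own column subset, then joins the outputs row by row.
    public static List<String[]> apply(List<String[]> rows, List<Partition> partitions) {
        List<List<String[]>> outputs = new ArrayList<>();
        for (Partition p : partitions) {
            List<String[]> subset = new ArrayList<>();
            for (String[] row : rows) {
                String[] cells = new String[p.columns.length];
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = row[p.columns[i]];
                }
                subset.add(cells);
            }
            List<String[]> result = p.filter.apply(subset);
            if (result.size() != rows.size()) {
                throw new IllegalStateException("a filter changed the number of rows");
            }
            outputs.add(result);
        }
        List<String[]> assembled = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            List<String> cells = new ArrayList<>();
            for (List<String[]> out : outputs) {
                cells.addAll(Arrays.asList(out.get(r)));
            }
            assembled.add(cells.toArray(new String[0]));
        }
        return assembled;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
                new String[] {"corn prices rise", "corn"},
                new String[] {"wheat exports fall sharply", "wheat"});
        // Plays the role of AllFilter: passes its columns through unchanged.
        Function<List<String[]>, List<String[]>> passThrough = table -> table;
        // Hypothetical stand-in for the LDA filter: one numeric column per row.
        Function<List<String[]>, List<String[]>> tokenCount = table -> {
            List<String[]> result = new ArrayList<>();
            for (String[] row : table) {
                result.add(new String[] {String.valueOf(row[0].trim().split("\\s+").length)});
            }
            return result;
        };
        List<String[]> out = apply(data, List.of(
                new Partition(new int[] {0}, passThrough),      // like range "first"
                new Partition(new int[] {0, 1}, tokenCount)));  // like range "first-last"
        for (String[] row : out) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Running `main` prints `corn prices rise | 3` and `wheat exports fall sharply | 4`: the text column survives next to the derived column.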

No, I'm not familiar with LDA, as I don't do anything in the NLP space.

BTW The filter uses LDAbride and Mallet under the hood, which seem to
output the topics in the console.

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, Hamilton, NZ
Mobile +64 22 190 2375
https://www.cs.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Valerio jus

May 2, 2023, 5:48:04 AM
to theadams...@googlegroups.com
Thank you for the prompt response.

Is there any way to compute perplexity using any of the ADAMS tools?
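For context, in topic modeling "perplexity" usually means held-out perplexity: exp(-(sum over documents d of log p(w_d)) / (total number of tokens)), computed on documents not used for training; lower is better. One common, though not always reliable, way to choose the number of topics is to fit models for several candidate values and compare their held-out perplexity (it does not always agree with how interpretable the topics look). The sketch below only does the final arithmetic and assumes the per-document log-likelihoods come from the topic-modeling tool, which is not shown; `Perplexity` is a hypothetical helper.

```java
public class Perplexity {

    // Held-out perplexity = exp(-(sum of per-document log-likelihoods) / (total token count)).
    // logLik[d]: natural-log likelihood of document d under the fitted model.
    // tokens[d]: number of tokens in document d.
    public static double perplexity(double[] logLik, int[] tokens) {
        if (logLik.length != tokens.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sumLogLik = 0;
        long sumTokens = 0;
        for (int d = 0; d < logLik.length; d++) {
            sumLogLik += logLik[d];
            sumTokens += tokens[d];
        }
        if (sumTokens == 0) {
            throw new IllegalArgumentException("no tokens");
        }
        return Math.exp(-sumLogLik / sumTokens);
    }

    // Index of the smallest value, e.g. the candidate topic count with the lowest perplexity.
    public static int argMin(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sanity check: a model that is uniform over a 50-word vocabulary
        // has perplexity 50, regardless of document lengths.
        double[] logLik = {-10 * Math.log(50), -25 * Math.log(50)};
        int[] tokens = {10, 25};
        System.out.println(perplexity(logLik, tokens)); // approximately 50.0
    }
}
```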

Thank you once again.

Cheers,
Valerio 


Peter Reutemann

May 2, 2023, 5:57:13 AM
to theadams...@googlegroups.com
> Is there any way to compute perplexity using any of the ADAMS tools?

I presume you mean complexity as in big-O notation? Then no, as this
is very specific to algorithms/methods.

Valerio jus

May 2, 2023, 6:05:52 AM
to theadams...@googlegroups.com
Thank you for clarifying that. Your cooperation is much appreciated. 

Cheers,
Valerio


