Report of the J wiki meeting of April 25th, 2024

robert therriault

May 2, 2024, 12:11:58 AM
to fo...@jsoftware.com
== Report of Meeting 2024-04-25 ==

Present: Art Anger, Ed Gottsman, Raul Miller, Devon McCormick and Bob Therriault

Full transcripts of this meeting are now available on its wiki page: https://code.jsoftware.com/wiki/Wiki/Report_of_Meeting_2024-04-25

1) Ed gave a report on dendrograms as a way to display information and has decided to move on to using tree maps as a way of displaying the tags. He provided a working prototype although he warned that it was in early stages of development. Ed said that he's not a fan of tree maps and wonders if this is going to be useful, but will pursue a bit further to see if there is promise. Raul suggested smaller boxes to allow space for the category names. Bob suggested the APL approach of including the shape in the border of the frame, except in this case it might be the name of the category. Ed says that a lot of these visualizations look cool, but he feels that the information value is not as good as you would hope when the cool factor wears off. Devon feels that an outline format is more compelling and Ed agrees. Bob talked about the Pixar effect of having to protect ideas in the early stages before they were given full critique. Ed feels that the tree map may be best to analyze whether the tree is balanced or if the structure is appropriate for displaying information to the user.
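As a rough illustration of the tree map idea discussed in item 1, here is a minimal Python sketch of a "slice and dice" layout, which splits each rectangle among a category's children in proportion to their size and alternates the split direction at each level. This is not Ed's prototype (which was done in J with JQt); the Node structure, the category names and the page counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category (hypothetical structure) with a page count and child categories."""
    name: str
    size: float = 0.0
    children: list["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A leaf contributes its own size; an inner node sums its children.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, depth, x, y, w, h) rectangles for node and its descendants."""
    if out is None:
        out = []
    out.append((node.name, depth, x, y, w, h))
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            # Even depth: lay the children out side by side.
            slice_and_dice(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:
            # Odd depth: stack the children top to bottom.
            slice_and_dice(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out


# Hypothetical category tree with made-up page counts.
tree = Node("Wiki", children=[
    Node("Guides", size=30),
    Node("Reference", children=[
        Node("Vocabulary", size=50),
        Node("Addons", size=20),
    ]),
])

for rect in slice_and_dice(tree, 0, 0, 100, 100):
    print(rect)
```

Because each rectangle's area is proportional to the number of pages beneath it, a lopsided tree shows up as one dominant box, which is the kind of balance check Ed mentioned.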

2) Bob showed some preliminary work that he had done on the template to display the category trees at the bottom of each page. Earlier attempts to incorporate SVG had failed because MediaWiki turns SVGs into PNGs, which takes away both the ability to scale and the interactivity. The option of SVG would still be available for the J Viewer because JQt can display SVG. Ed pointed out that his tree maps were done in isidraw with JQt, J and some low level graphics calls. Bob explained that his naming of the links can shorten the category page identifier. The other functionality that Bob is exploring is the :has() pseudo-class in CSS, which allows action at a distance and may allow information to be displayed in a separate area when a category is hovered over. Ed felt that this is a summary/detail interaction and feels that it has some promise. Bob is now working on the display to show the hierarchical structure. Ed thinks that CSS grid display can show the hierarchy automatically with grids within grids. Bob thinks that the work that is done in the template is worthwhile because it can be used in multiple areas and would not need to be changed too often. Ed reiterated his offer to help with JavaScript because that would certainly work within the wiki. Raul felt that a lot of the resistance in MediaWiki could be the result of legacy code that has grown organically over the years.

3) Jan Jacobs had sent an email with some preliminary categories suggested by his data mining. Ed says that the categories Bob has created have more semantic meaning at this point, but Bob thinks that this is early days. Jan's email:

Hi Bob, Ed,
since last time's meeting I have pursued 2 directions for compressing the large (sparse) document space that is spun up by all the terms in the Jwiki. Compression is necessary to keep SOM training times down to acceptable limits.
The 2 directions are:
using the word embeddings of GloVe (100D)
using the random mapping approach used in the WebSOM experiment
Both directions demand for an adaptation of the hierarchical clustering approach because I didn't support compression 'well' enough.

I started with GloVe and this time with more terms (15K in total) in order to come up with a lot of them to look up in the GloVe embedded vectors array. Although I managed this time to keep all wiki pages in, however the number of looked up terms is very low for some pages. This again shows that some pages lack enough *standard* terms. Nevertheless I have a base map representing the 'bottom' of the tree with 118 clusters. I included the descriptive words of all clusters, with the contained page numbers (and implicit count) as appendix to this mail. The number of descriptive words (characterising and discriminating ones) is to be specified.
At this point a warning should be given: word embeddings (~text AI) do code for some semantics in a text and one could derive *in theory* better descriptive words on all intermediate hierarchical levels (direction1) than a pure statistical approach (direction2). When going for 'statistics' you will stick with words as they occur in the data set. So no clever exchange of e.g. terms England, Wales, Scotland, ... by United Kingdom as a text AI simply could do. But one could of course do this oneself when given enough terms that describe hierarchical clusters!

I will now continue with direction2. This also allows for including the Jenglish terms. When I have the base map I will provide you with a similar csv as below and we could discuss the construction of the tree in an agglomerative way.
Jan.

Ed feels that the technology that is available to Jan is the limiting factor of what he can find. A server cluster on Amazon might solve that problem, but that might be an option later in the process. There are certainly many possibilities with this.
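The random mapping that Jan describes as his second direction can be sketched in a few lines of Python. The idea is to multiply each sparse term vector by a fixed random matrix so that documents land in a much smaller space while their similarities are roughly preserved. The Gaussian entries, the dimensions and the function names below are assumptions for illustration only, and are not taken from Jan's code or from the WebSOM experiment.

```python
import math
import random


def random_mapping_matrix(n_terms, n_dims, seed=0):
    """Build an n_terms x n_dims matrix of Gaussian entries scaled by 1/sqrt(n_dims)."""
    rng = random.Random(seed)  # fixed seed so every document uses the same mapping
    scale = 1.0 / math.sqrt(n_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_dims)]
            for _ in range(n_terms)]


def project(sparse_doc, matrix):
    """Project a sparse document {term_index: weight} into the reduced space."""
    n_dims = len(matrix[0])
    out = [0.0] * n_dims
    # Only the terms that occur in the document contribute to the result.
    for term, weight in sparse_doc.items():
        row = matrix[term]
        for j in range(n_dims):
            out[j] += weight * row[j]
    return out


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical sizes: 15,000 terms compressed to 100 dimensions.
mapping = random_mapping_matrix(15000, 100, seed=1)
page_a = {12: 2.0, 407: 1.0, 9031: 3.0}
page_b = {12: 1.0, 407: 1.0, 14000: 1.0}
print(cosine(project(page_a, mapping), project(page_b, mapping)))
```

The compressed vectors, rather than the original sparse ones, would then be the input to SOM training, which is what keeps the training time down.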

4) Ed asked Raul if the issue with forking processes in JQt had been solved. Raul had not seen any resolution to this issue. Ed said he was happy to explore tree maps for now and that he will return to the standalone J Wiki Viewer once those issues are resolved.

5) Raul showed an interface that he had developed based on trace to be able to follow the execution of J verbs as an educational tool. Raul is working on a bug that seems resistant to the debug stack facilities in J. Ed observed that this will feed an independent interface that will allow the user to understand the processes. Ed had thought of using a scroll wheel driven odometer model of going through a sequence of operations. Bob said that Marshall Lochbaum had once told him that a good illustration is much better than an animation for viewer understanding. Ed agreed that animations are often used by those who have not had the time to do a proper illustration.

For access to previous meeting reports, see https://code.jsoftware.com/wiki/Wiki_Development. If you would like to participate in the development of the J wiki, please contact us on the J forum and we will get you an invitation to the next J wiki meeting. Meetings are held on Thursdays at 23:00 (UTC). The next meeting is May 2, 2024.