Presenting GG2TW Beta, a method to convert Google Groups Conversations to TW JSON Tids


Finn Lancaster

Jun 12, 2021, 10:35:03 PM
to TiddlyWiki
Hello All,

Recently, inspired by the many complaints about GG, I built a beta version of a GG-to-TW converter. It is still full of bugs and needs much more work and polish, but you can already get the gist of how it will work when completed.
A demo is available on my website's subdomain, gg2tw.finnsoftware.net, and the project has a GitHub repository at https://github.com/flancast90/GG2TW/

What I will be adding: Those of you who have tested it will have seen that the tiddler text includes needless strings such as "Forward", "Reply", and more, picked up when the text is grabbed from GG. I will fix this as soon as I have time. I also threw the UI together in 5 minutes, so it will be upgraded, too.
Furthermore, the tool will soon be able to handle larger downloads and will let you select which page you would like to convert.
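
One possible way to drop those stray interface strings is to filter out lines that consist of nothing but a known label. The following is only a minimal Python sketch, not the tool's actual code; the `UI_LABELS` set and the `strip_ui_labels` name are assumptions made for illustration, and the real list would have to come from inspecting scraped GG text.

```python
# Hypothetical set of interface labels seen in scraped text (an assumption,
# not taken from the GG2TW source).
UI_LABELS = {"Forward", "Reply", "Reply all", "Reply to author", "unread,"}


def strip_ui_labels(text: str) -> str:
    """Drop every line that consists solely of a known interface label."""
    kept = [line for line in text.splitlines() if line.strip() not in UI_LABELS]
    return "\n".join(kept)


sample = "Hello All,\nReply\nThis is the post body.\nForward"
cleaned = strip_ui_labels(sample)
print(cleaned)
```

Matching whole lines rather than substrings keeps a sentence such as "Please reply soon" intact.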

Note for those who downloaded it and are running it locally: The API used to bypass CORS is quite finicky and will throw an error. To fix this (only needed if you run a downloaded copy rather than the given subdomain), clear all your browser history and cache: the API will then reset the CORS access header.

I would appreciate hearing your questions, comments, and concerns, and I will answer them ASAP. Feel free to Star, Fork, Download, or Open an Issue at the project's GitHub: https://github.com/flancast90/GG2TW/

Mat

Jun 13, 2021, 1:08:21 PM
to TiddlyWiki
Oh! This sounds really cool, but I can't get the demo to show anything. Does it scrape the GG pages? Do you have any thoughts on how the tiddlers should/could be stored?
<:-)

TiddlyTweeter

Jun 13, 2021, 1:42:35 PM
to TiddlyWiki
Ciao flanc...

The idea sounds neat! Unfortunately nothing happens for me :-(. I get this but no results appear on click ... :-( ...


[Attached image: Screenshot 2021-06-13 194045.jpg]

Best wishes
TT

Finn Lancaster

Jun 13, 2021, 4:37:35 PM
to TiddlyWiki
Strange, it works for me. I added a loading message showing how many conversations were successfully loaded. If it stays at 0/90, then it truly is an issue. Possibly it was just taking a long time (I find it generally takes around a minute to finish). Let me know if that helps!

Another idea could be to clear history, cookies, etc. for all time, and try again. If this works, it is an issue with the CORS proxy API I am using. 

To answer Mat: yes, it scrapes the first page of the GG (conversations 1-90, with some repeats). The output is an array of valid JSON data, which TW recognizes as JSON tiddlers and will then import (drag and drop works best).
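
As a rough illustration of that output format, here is a minimal Python sketch (not the tool's actual code) that turns scraped conversations into a JSON array of tiddler objects, each with string `title` and `text` fields. The `to_tiddlers` helper and the sample data are assumptions made for the example.

```python
import json


def to_tiddlers(conversations):
    """Map scraped conversations to tiddler dicts with string fields."""
    return [{"title": c["title"], "text": c["text"]} for c in conversations]


# Hypothetical sample data standing in for scraped GG conversations.
conversations = [
    {"title": "Presenting GG2TW Beta", "text": "Hello All, ..."},
    {"title": "Another conversation", "text": "Some reply text"},
]

tiddlers_json = json.dumps(to_tiddlers(conversations), indent=2)
print(tiddlers_json)
```

Saved to a `.json` file, an array like this can be dragged onto a TiddlyWiki page to start the import.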

strikke...@gmail.com

Jun 13, 2021, 7:36:17 PM
to TiddlyWiki
I did get 90/90 conversations - it really did not take that long. I exported the JSON and tried to drag and drop it into my TiddlyWiki. It worked - but the result was totally unreadable.

Still it is exciting news. What a good idea you had.

Birthe

Finn Lancaster

Jun 13, 2021, 8:39:23 PM
to TiddlyWiki
Huh. Mine always glitches out before TW finishes importing, so I've never seen the actual output in TW. I'd always assumed there would be no issues, since I ran the output through various JSON-checking sites and it was reported as valid. Were the words legible with just random chars in between, or was it just a jumble?

Finn Lancaster

Jun 13, 2021, 8:49:01 PM
to TiddlyWiki
Yep, just got the same thing. The issue seems to be that TW doesn't automatically URI-decode imported text, which seems quite a bit... obtuse. I'll push a quick fix tonight or tomorrow.
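
To show what the unreadable text looks like and how decoding restores it, here is a small Python sketch. It assumes the scraped text was percent-encoded before being written to the JSON, which is a guess based on the "URIdecode" remark; the sample string is made up.

```python
from urllib.parse import quote, unquote

original = "Hello All, here is a [beta] converter!"

# Percent-encode every reserved character, as an over-eager escaping step might.
encoded = quote(original, safe="")

# Decoding before import restores the readable text.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

Decoding on the exporting side, before the JSON is written, means the importing wiki never sees the percent-encoded form.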

strikke...@gmail.com

Jun 13, 2021, 8:59:51 PM
to TiddlyWiki
Random chars as you also found. But that was not the only thing. 90 of 90 conversations ended up in 28 imported tiddlers.

The good news is that it kind of worked - even though I am using a low-end, 12-year-old laptop with 8 GB of RAM.

Birthe

Finn Lancaster

Jun 13, 2021, 9:50:36 PM
to TiddlyWiki
Yes. The drop from 90 to 28 comes from responses to the conversations: GG just likes to make things difficult and lists them separately, so the same conversation is fetched more than once. Since the loading message was not planned at first, I simply connected it to my counter variable, which is incremented after every successful request, hence the 90. In the code, there is an if statement with the comment "code goes here." In the future, this part will deal with the repeats.
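
A minimal Python sketch of how the repeats could be handled, assuming that repeated entries share the same title (an assumption; the real tool may need a different key, such as the conversation URL). The `dedupe_by_title` name is hypothetical.

```python
def dedupe_by_title(items):
    """Keep the first item for each title, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        if item["title"] not in seen:
            seen.add(item["title"])
            unique.append(item)
    return unique


# Hypothetical scraped entries: three successful requests, two distinct conversations.
fetched = [
    {"title": "Thread A", "text": "first"},
    {"title": "Thread B", "text": "second"},
    {"title": "Thread A", "text": "first"},
]

unique = dedupe_by_title(fetched)
print(f"{len(fetched)} requests, {len(unique)} unique conversations")
```

Reporting both numbers would also make the loading counter and the final tiddler count easier to reconcile.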

Finn Lancaster

Jun 14, 2021, 9:00:02 PM
to TiddlyWiki
Fixed the illegible text issue! I added a few replace() calls and got rid of the escaping method I was using. I just tested the TW import, and it works perfectly! Of course, this could all change if somewhere in the GG there happens to be a character that the program doesn't escape. However, I deem this rare, as I don't think there are any chars that I've forgotten. Keep me informed of any issues; I intend to make the UI better soon.
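
An alternative to escaping characters one by one is to let a JSON serializer do it. The Python sketch below is not the approach described above; it only illustrates that a standard serializer escapes quotes, backslashes, and control characters on its own, so no character can be forgotten. The sample text is made up.

```python
import json

# Text containing characters that would break hand-built JSON:
# double quotes, a newline, backslashes, and a tab.
post_text = 'She said "hi".\nPath: C:\\temp\tdone'

tiddler = {"title": "Escaping example", "text": post_text}

# The serializer escapes everything that needs escaping.
serialized = json.dumps([tiddler])

# Parsing the result gives back exactly the original text.
restored = json.loads(serialized)[0]["text"]

print(serialized)
```

In a browser-based tool, the equivalent standard call would be `JSON.stringify`.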

P.S.: I have encountered a crash in Chrome upon running the program. If anyone else has the same issue, then the program is at fault. So far, however, I'm thinking it's just my PC, as Chrome doesn't seem to like my outdated Linux version :)

Mat

Jun 15, 2021, 5:02:04 AM
to TiddlyWiki
Very cool!
How do you want feedback on this? If the intention really is a usable "GG to TW", then this is a big project, and perhaps it would make sense to discuss it on GitHub?

Is it possible to tiddlify individual posts and, e.g., tag them with the thread title? That would also mean the mechanism needs to invent some kind of title for the posts.

And do you have any thoughts on how it would all be stored? I understand if the actual storing is not part of your project but maybe you have some thoughts about how it had best be done? 

<:-)

Mohammad Rahmani

Jun 15, 2021, 8:17:21 AM
to tiddl...@googlegroups.com
Great job! Really useful!

I am wondering how we could automatically convert code sections into code blocks and tagify posts.

Keep going!



Best wishes
Mohammad


--
You received this message because you are subscribed to the Google Groups "TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywiki+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tiddlywiki/41327824-83f2-4785-a226-92611c784ba7n%40googlegroups.com.

Finn Lancaster

Jun 15, 2021, 8:27:16 AM
to TiddlyWiki
Mat and Mohammad:
Love the ideas! My main concern right now is supporting more conversations (it only converts 30 at the moment) or letting the user choose how many to convert. Auto-formatting might be difficult, since everything is stored in JSON. However, I wonder whether TW will format the text automatically if the symbols are replaced with valid TW syntax.

Mat, I'm a bit confused as to what you mean by "thread title." Do you mean the titles it is automatically giving already, or something else?

Tags will be easy, and I had planned on adding them, as well as authors. These can be made with new fields in the JSON!
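
A minimal Python sketch of what such fields could look like. It relies on TiddlyWiki storing tags as a single space-separated string in which tags containing spaces are wrapped in double square brackets; the `author` field name, the helper names, and the sample values are assumptions for illustration only.

```python
import json


def format_tags(tags):
    """Join tags into one string, wrapping tags that contain spaces in [[ ]]."""
    return " ".join(f"[[{tag}]]" if " " in tag else tag for tag in tags)


def make_tiddler(title, text, author, tags):
    """Build a tiddler dict; 'author' is a hypothetical custom field."""
    return {
        "title": title,
        "text": text,
        "author": author,
        "tags": format_tags(tags),
    }


tiddler = make_tiddler(
    "Presenting GG2TW Beta",
    "Hello All, ...",
    "Finn Lancaster",
    ["GG2TW", "Google Groups"],
)
print(json.dumps([tiddler], indent=2))
```

All field values are kept as strings, which is what a JSON tiddler import expects.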

Lastly, storing these files could prove difficult. One way would be to set up a program on an unused computer that runs GG2TW at a fixed interval and triggers a download of the data. Another, more difficult way would be to listen to the GG conversation and run only when 30 new posts have been added. This idea is worthy of its own repo!

Thanks!
     Finn Lancaster 

BTW, there is already a GitHub conversation going on in the TW5 discussions. A link to it is: https://github.com/Jermolene/TiddlyWiki5/discussions/5793

Mat

Jun 15, 2021, 1:06:33 PM
to TiddlyWiki
> Mat, I'm a bit confused as to what you mean by "thread title." Do you mean the titles it is automatically giving already, or something else?

Thread just refers to the whole conversation. The thread title in this case is "Presenting GG2TW Beta, a ....". My point was to have each post in this thread be a tiddler, because an interesting solution is often found in a specific post rather than in the whole thread. Each such post should then perhaps be tagged with the thread title.

Of course, thread titles are not unique, which is a matter that has to be solved. One idea would be to concatenate the initial post's date to the title of the thread. Another might be some hash solution. (See other discussions about how to make unique titles, UUID etc. It's doable.)
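
Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the date format, and the "/ post N" title scheme are all hypothetical, and SHA-1 is used here merely as a convenient way to derive a short, stable suffix.

```python
import hashlib


def dated_thread_title(thread_title, first_post_date):
    """Append the first post's date so equal thread titles stay distinct."""
    return f"{thread_title} ({first_post_date})"


def hashed_thread_title(thread_title, first_post_date):
    """Alternative: append a short hash derived from title and date."""
    key = f"{thread_title}|{first_post_date}".encode("utf-8")
    return f"{thread_title} ({hashlib.sha1(key).hexdigest()[:8]})"


def post_tiddlers(thread_title, first_post_date, posts):
    """One tiddler per post, each tagged with the unique thread title."""
    thread_id = dated_thread_title(thread_title, first_post_date)
    return [
        {
            "title": f"{thread_id} / post {number}",
            "text": post["text"],
            "tags": f"[[{thread_id}]]",
        }
        for number, post in enumerate(posts, start=1)
    ]


posts = [{"text": "Hello All, ..."}, {"text": "Oh! This sounds really cool ..."}]
tiddlers = post_tiddlers("Presenting GG2TW Beta", "2021-06-12", posts)
for tiddler in tiddlers:
    print(tiddler["title"], "->", tiddler["tags"])
```

Numbering posts within the thread gives every post tiddler a unique title, while the shared tag keeps the whole thread retrievable as one group.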

<:-)

Finn Lancaster

Jun 15, 2021, 1:19:41 PM
to tiddl...@googlegroups.com
Ok. I get what you’re saying now. I thought about such an approach, but doing so would greatly increase the loading time of the tool. I figure that it’s best to leave each thread as a tiddler.
Imagine if the entire GG were converted. Instead of maybe 20,000 threads/tiddlers, it could have over 100,000. I imagine this could lead to even further issues in TW itself.


Mat

Jun 15, 2021, 1:56:51 PM
to TiddlyWiki
> Ok. I get what you’re saying now. I thought about such an approach, but doing so would greatly increase the loading time of the tool. I figure that it’s best to leave each thread as a tiddler.
> Imagine if the entire GG were converted. Instead of maybe 20,000 threads/tiddlers, it could have over 100,000. I imagine this could lead to even further issues in TW itself.

OK, so the idea of your project is to convert the entire GG archive, then? Actually, is the purpose to make a searchable archive? The discussion you refer to in your initial post is about replacing GG with something, in other words a "discussion solution". Is this something you have in mind? Or is this more a PoC showing that threads can be scraped and tiddlified? (Which, of course, is also super cool and probably a needed component in a more encompassing solution :-)

<:-)

Finn Lancaster

Jun 15, 2021, 2:02:47 PM
to tiddl...@googlegroups.com
It’s a bit of both, to be honest. I’d like to expand this to be able to export GG little-by-little to TW, until the point where GG isn’t needed anymore. 
