‎The History Show of the GG Community: Travel 17 years back in time in just 3 Hours


Taacees ME

Aug 1, 2021, 12:06:31 AM
to TiddlyWiki
Hi Everyone,

Following Jeremy's advice, I kindly invite you to celebrate our GG community reaching 25K conversations (almost 😉).
Get your popcorn, soda, and favorite audio tracks, and enjoy the 16 years of our community (try to forget any cons of "Google Groups" 😉).


Regards

Jeremy Ruston

Aug 1, 2021, 10:41:26 AM
to tiddl...@googlegroups.com
Hi Taacees

Thank you! The project looks intriguing; could you explain a little more? From the GitHub repo I understand that you’re trying to mine the Google Group archive for useful information, but I’d like to understand more about your goals and the techniques you are using.

Best wishes

Jeremy.



Taacees ME

Aug 1, 2021, 1:59:52 PM
to TiddlyWiki
Hi Jeremy,

I hope the following explains it better. First of all, though, I don't know yet whether this violates any licences; I'm still checking.

Some facts:
1- TiddlyWiki has multiple "Data Sources" (communities) around the internet (this GG, TW5 Dev GG, TW5 Doc GG, GitHub Discussions, Stack Overflow Questions, TW5 on Reddit, etc.)
2- TiddlyWiki is the BEST personal information management system we know or have used (this is a FACT for me, but to prove it to others, TW should at least manage its own information).

Now, in a few words: my goal is to EXPORT all TiddlyWiki data/info from those data sources (by scraping/crawling) and IMPORT it into a unified TiddlyWiki system (converting data units into tiddlers) so we can start using it and building on it.
In other words, let TiddlyWiki own its data that is scattered around the Internet, and build a "TW Information Portal" that shows the EXTREME power of TW5 over its own data.
NOTE that this will NOT replace any of the original Data Sources; it only complements them and constructs a "Portal" that we can build on.

So, at the end of the 1st phase, we'll have a single TiddlyWiki system (Node.js) containing something like the following (ALL the data below will be extracted from the sources, with NO human intervention at this phase):

Imaginary TW5 Google Group Tiddler (we'll have > 120K of them)
===================================
title: "GG TW5 ID pDlJDdWZNHQ"
tags: [[2021]] [[Conversation]] [[Message]] [[TiddlyWiki Google Group]]
custom-field-gg-title: "The History Show of the GG Community"
custom-field-gg-author: "Taacees"
custom-field-gg-date: "01 Aug 2021"
text: "Message Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with links to the main Question

Imaginary GitHub Discussion Tiddler
=====================================
title: "GitHub Discussions ID 5924"
tags: [[2021]] [[Discussion]] [[TiddlyWiki GitHub Discussions]]
custom-field-github-title: "Bitmap editor - should we use pointer events?"
custom-field-github-author: "BurningTreeC"
custom-field-github-date: "01 Aug 2021"
text: "Message Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with links to the main Question

Imaginary Reddit Question/Comment Tiddler
=========================================
title: "Reddit Question ID onx6qn"
tags: [[2021]] [[Reddit]] [[Question]]
custom-field-reddit-title: "Newbie Question: Editing a field in a template"
custom-field-reddit-author: "u/OneDiscombobulated83"
custom-field-reddit-date: "01 Aug 2021"
text: "Question Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with links to the main Question

Imaginary Stack Overflow Question/Answer Tiddler
===============================================
title: "StackOverflow Answer ID 34693482"
tags: [[2016]] [[Stack Overflow]] [[Reply]] [[Answer]]
custom-field-stackoverflow-title: "tiddlywiki: can't save changes in QWebView"
custom-field-stackoverflow-date: "9 Jan 2016"
text: "Question Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each answer, with links to the main Question

I hope this makes the idea clearer, but it will be more visible once I can show some example "tid" files.
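
To make the "data unit to tiddler" conversion a bit more concrete, here is a minimal Go sketch of how one scraped Google Group message might be written out as the text of a "tid" file. It is only an illustration under assumptions, not the project's actual code: the `Message` struct and the `formatTags` and `RenderTid` functions are hypothetical names, and the field names simply follow the imaginary tiddler above. It relies on the usual "tid" layout of `name: value` header lines, a blank line, and then the tiddler text. The quotes around the values in the imaginary tiddlers are treated as illustrative, so the sketch writes the values unquoted.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical record produced by a scraper for one
// Google Group message; the real project may model this differently.
type Message struct {
	ID     string
	Title  string
	Author string
	Date   string
	Tags   []string
	Body   string
}

// formatTags renders tags in TiddlyWiki list syntax: a tag that
// contains whitespace is wrapped in [[ ]], others are written bare.
func formatTags(tags []string) string {
	parts := make([]string, 0, len(tags))
	for _, t := range tags {
		if strings.ContainsAny(t, " \t") {
			parts = append(parts, "[["+t+"]]")
		} else {
			parts = append(parts, t)
		}
	}
	return strings.Join(parts, " ")
}

// RenderTid returns the contents of a .tid file for the message:
// header fields, one blank line, then the body as the text field.
// Field values are assumed to be single-line; no escaping is done.
func RenderTid(m Message) string {
	fields := [][2]string{
		{"title", "GG TW5 ID " + m.ID},
		{"tags", formatTags(m.Tags)},
		{"custom-field-gg-title", m.Title},
		{"custom-field-gg-author", m.Author},
		{"custom-field-gg-date", m.Date},
	}
	var b strings.Builder
	for _, f := range fields {
		fmt.Fprintf(&b, "%s: %s\n", f[0], f[1])
	}
	b.WriteString("\n")
	b.WriteString(m.Body)
	return b.String()
}

func main() {
	m := Message{
		ID:     "pDlJDdWZNHQ",
		Title:  "The History Show of the GG Community",
		Author: "Taacees",
		Date:   "01 Aug 2021",
		Tags:   []string{"2021", "Conversation", "Message", "TiddlyWiki Google Group"},
		Body:   "Message Body",
	}
	fmt.Println(RenderTid(m))
}
```

Each reply could be rendered the same way as its own tiddler, with an extra field or tag pointing back to the title of the main conversation tiddler.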

As for the tools, I'm developing these scrapers/crawlers using the following golang libraries:
Regards

Taacees ME

Aug 17, 2021, 10:49:30 AM
to TiddlyWiki
Hi Jeremy, All,

We've finished "Phase 1" of this project, and here are some samples of the results: https://bit.ly/37ELoj1 (in 2 formats: a single TW5 file and "*.tid" files).
I also created a video, https://youtu.be/FuJR-5uSDVU, that tries to explain both the results above and the project itself.

P.S.
1- The project name was changed to "510K" after I created the video above, because I'm really interested to see how TW5 will handle that number of tiddlers.
2- "Phase 1" works only with GG conversations; messages will come in "Phase 2".

Thanks