Hi Jeremy,
I hope the following explains it better. First of all, though, I don't know yet whether this violates any licences; I'm still checking.
Some facts:
1- TiddlyWiki has multiple "Data Sources" (communities) around the internet (this GG, TW5 Dev GG, TW5 Doc GG, GitHub Discussions, Stack Overflow questions, TW5 on Reddit, etc.)
2- TiddlyWiki is the BEST personal information management system we know/have used (this is a FACT for me, but to prove it to others, TW should at least manage its own information).
Now, in a few words: my goal is to EXPORT all TiddlyWiki data/info from those data sources (by scraping/crawling) and IMPORT it into a unified TiddlyWiki system (converting data units into tiddlers) so we can start using it and building on it.
In other words, let TiddlyWiki own its data that is scattered around the Internet, and build a "TW Information Portal" that shows the EXTREME power of TW5 over its own data.
NOTE that this will NOT replace any of the original data sources. It just complements them and constructs a "Portal" that we can build on.
So, at the end of the 1st phase, we'll have a single TiddlyWiki system (Node.js) that contains something like the following (ALL the data below will be extracted from the sources, with NO human intervention at this phase):
Imaginary TW5 Google Group Tiddler (we'll have > 120K of them)
===================================
title: "GG TW5 ID pDlJDdWZNHQ"
tags: [[2021]] [[Conversation]] [[Message]] [[TiddlyWiki Google Group]]
custom-field-gg-title: "The History Show of the GG Community"
custom-field-gg-author: "Taacees"
custom-field-gg-date: "01 Aug 2021"
text: "Message Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with a link to the main question
Imaginary GitHub Discussion Tiddler
===================================
title: "GitHub Discussions ID 5924"
tags: [[2021]] [[Discussion]] [[TiddlyWiki GitHub Discussions]]
custom-field-github-title: "Bitmap editor - should we use pointer events?"
custom-field-github-author: "BurningTreeC"
custom-field-github-date: "01 Aug 2021"
text: "Message Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with a link to the main question
Imaginary Reddit Question/Comment Tiddler
=========================================
title: "Reddit Question ID onx6qn"
tags: [[2021]] [[Reddit]] [[Question]]
custom-field-reddit-title: "Newbie Question: Editing a field in a template"
custom-field-reddit-author: "u/OneDiscombobulated83"
custom-field-reddit-date: "01 Aug 2021"
text: "Question Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each reply, with a link to the main question
Imaginary Stack Overflow Question/Answer Tiddler
================================================
title: "StackOverflow Answer ID 34693482"
tags: [[2016]] [[Stack Overflow]] [[Reply]] [[Answer]]
custom-field-stackoverflow-title: "tiddlywiki: can't save changes in QWebView"
custom-field-stackoverflow-date: "9 Jan 2016"
text: "Question Body"
..... Any other info in separate "Custom Fields"
AND a separate tiddler for each answer, with a link to the main question
I hope this makes the idea clearer, but it'll be more visible after I show some example "tid" files.
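To give a rough feel for the conversion step, here is a minimal Go sketch that turns one scraped record into the text of a "tid" file (header lines of the form "name: value", then a blank line, then the body). The Message type, the RenderTid and formatTags helpers, and the field names are only my assumptions for illustration, not the final design, and the real scrapers may well end up structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is a hypothetical, source-neutral record that a scraper might emit.
type Message struct {
	Title  string            // tiddler title, e.g. "GG TW5 ID pDlJDdWZNHQ"
	Tags   []string          // plain tag names, without brackets
	Fields map[string]string // custom fields, e.g. "custom-field-gg-author"
	Body   string            // message text
}

// oneLine collapses every run of whitespace (including line breaks) into a
// single space, because a header value in a .tid file must fit on one line.
func oneLine(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// formatTags wraps every tag in [[ ]], as in the imaginary tiddlers above.
func formatTags(tags []string) string {
	wrapped := make([]string, len(tags))
	for i, t := range tags {
		wrapped[i] = "[[" + t + "]]"
	}
	return strings.Join(wrapped, " ")
}

// RenderTid returns the message as .tid text: "name: value" header lines,
// one blank line, then the body.
func RenderTid(m Message) string {
	var b strings.Builder
	fmt.Fprintf(&b, "title: %s\n", oneLine(m.Title))
	fmt.Fprintf(&b, "tags: %s\n", formatTags(m.Tags))

	// Sort the custom field names so the output is stable between runs.
	names := make([]string, 0, len(m.Fields))
	for name := range m.Fields {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(&b, "%s: %s\n", name, oneLine(m.Fields[name]))
	}

	b.WriteString("\n")
	b.WriteString(m.Body)
	return b.String()
}

func main() {
	msg := Message{
		Title: "GG TW5 ID pDlJDdWZNHQ",
		Tags:  []string{"2021", "Conversation", "Message", "TiddlyWiki Google Group"},
		Fields: map[string]string{
			"custom-field-gg-title":  "The History Show of the GG Community",
			"custom-field-gg-author": "Taacees",
			"custom-field-gg-date":   "01 Aug 2021",
		},
		Body: "Message Body",
	}
	fmt.Print(RenderTid(msg))
}
```

Each reply could go through the same function, with one extra custom field holding the title of the main question's tiddler, so the link between them survives the import.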
Regarding the tools, I'm developing those scrapers/crawlers using the following Go libraries:
Regards