HTML cleaner / sanitizer


Paul Netsaver

Nov 2, 2017, 8:26:51 AM
to TiddlyWiki
Hi,
I'm currently using TW for importing web pages, snippets, etc. (but I'd also like to use it as a CMS).
I'm looking for a plugin that reduces and cleans the original HTML source code, something customizable, similar, for instance, to sanitizer.js.
Basically I need to:
- exclude all style attributes (except perhaps img size attributes)
- exclude all empty divs
- filter tags, attributes, classes based on whitelist / blacklist
Maybe something that acts on the whole tiddler content and is activated/deactivated by a dedicated toolbar button.
Does a similar plugin for TW exist?
Is it possible to create one using an available JS library?
Thanks and regards
Paul Netsaver 
Rome (IT)

@TiddlyTweeter

Nov 2, 2017, 10:20:32 AM
to TiddlyWiki
You might want to look at BJ's Flexity plugin. It lets you use regular expressions to clean code. You'd need to display the output as plain text and then copy and paste it back into a new tiddler, though, as it's NOT an import mechanism. But, as its name suggests, it has great flex as a general tool to change a tiddler any way you want.

http://flexibility.tiddlyspot.com/
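
As a rough illustration of the regular-expression approach (this is not Flexity's actual code), the three requirements from the original post could be sketched as below. The names `cleanHtml` and `ALLOWED_TAGS` and the tag list itself are hypothetical. Regex cleanup like this is only suitable for trimming bloat from clips you trust; it is not a security sanitizer.

```javascript
// Hypothetical helper for illustration; not part of Flexity or TiddlyWiki.
// The whitelist below is an example and would need tuning.
const ALLOWED_TAGS = new Set([
  'p', 'a', 'img', 'ul', 'ol', 'li', 'b', 'i', 'em', 'strong', 'div', 'br',
]);

function cleanHtml(html, allowedTags = ALLOWED_TAGS) {
  let out = html;

  // 1. Drop quoted style="..." / style='...' attributes.
  //    width/height attributes on <img> are untouched.
  out = out.replace(/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi, '');

  // 2. Unwrap tags that are not whitelisted: the tag goes, its text stays.
  out = out.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (tag, name) =>
    allowedTags.has(name.toLowerCase()) ? tag : '');

  // 3. Remove empty <div>s repeatedly, so nested empties collapse too.
  let previous;
  do {
    previous = out;
    out = out.replace(/<div\b[^>]*>\s*<\/div>/gi, '');
  } while (out !== previous);

  return out;
}
```

Calling `cleanHtml(tiddlerText)` returns the reduced HTML as a string; a blacklist variant would simply invert the test in step 2.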

Best wishes
Josiah

Mark S.

Nov 2, 2017, 10:48:25 AM
to TiddlyWiki
You can also use BJ's tiddlyclip with the HTML-to-markup adapter. This uses regexps to remove almost everything that is irrelevant to what TW renders.

I guess it depends on whether the objective is to make more compact text or simply safer text.

Mark

BJ

Nov 4, 2017, 3:29:18 PM
to TiddlyWiki
In tiddlyclip for Firefox, using @web allows HTML to be clipped, and it is sanitized. However, with Firefox 57 I don't think there is an API for the sanitizer, so I am planning on adding a lib to do the sanitizing.

Mark S.

Nov 4, 2017, 4:35:29 PM
to TiddlyWiki
Well, changing it to markup should certainly sanitize it. Practically sterilize it.

Will there be a tiddlyclip update in time for FF57? (10 more days, if they've stuck to the timetable).

Thanks!
Mark

BJ

Nov 4, 2017, 5:27:58 PM
to TiddlyWiki
I don't see the point (in general) of converting to markup. Clips can be edited with ckeditor, to highlight the main points of interest.

tiddlyclip for ff57 is mostly working, so I expect that it will be ready in time.

@TiddlyTweeter

Nov 4, 2017, 6:04:36 PM
to TiddlyWiki
Right. The whole point is to get something done. 


Mark S.

Nov 4, 2017, 6:05:43 PM
to TiddlyWiki
Hmm. I wasn't aware of your visual editor. Looks interesting. I'm surprised it hasn't been chatted up more.

Lots of text that you import is jam-packed with unnecessary mark-up. This bloats the size of the resulting TW. The bigger TW gets, the slower it runs, until eventually it barely runs at all. You can hit this wall very quickly if you send your TW over to an under-powered tablet or phone. The beauty of mark-up is that you can easily read it either rendered or not rendered. Not so much with HTML.

Mark

BJ

Nov 4, 2017, 7:09:21 PM
to TiddlyWiki
Of course, the point is to have the tools to enable what you want to do.

@TiddlyTweeter

Nov 6, 2017, 5:28:23 AM
to TiddlyWiki
Ciao Mark S. & Paul

FWIW, both CKEditor (that BJ made a plugin for to be able to run natively in TW) and TinyMCE WYSIWYG HTML editors have tools to reduce HTML bloat. Both also have cleaning import for Word documents.

Best wishes
Josiah

@TiddlyTweeter

Nov 6, 2017, 5:47:32 AM
to TiddlyWiki
Mark S. wrote to BJ:
Hmm. I wasn't aware of your visual editor. Looks interesting. I'm surprised it hasn't been chatted up more.

Ciao Mark

I noticed a few times you commented recently about stuff that is either hard to find or other stuff you were not aware of. To me that is a very interesting signal. You seem pretty much on, if not ahead of the ball. So if you did not know a useful plugin how the hell would anyone else? :-)

I think that the inevitably fragmentary process of both finding and remembering things here that are good is a serious issue. No doubt you've seen me crap on about it to the point of excess :-). But I'm sure there has to be a better "centralised" way to find stuff rather than "stab-in-the-dark" Google searches.

Jeremy, in some previous comments, reminded us that tiddlywiki.com could be used more for that IF the stuff is prepped for it and he is alerted. Maybe that is part of the issue: that this is not occurring? Maybe a lack of an explicit methodology?

Erwan some time ago created a mechanism to auto-harvest links to flagged items, like plugins & elaborate macros, from individual TiddlyWikis for the "Community" thing. I don't think it's used much.

Summary: There is a missing piece.

Best wishes
Josiah

Mark S.

Nov 6, 2017, 11:48:59 AM
to TiddlyWiki
There are lots of projects in TW, and it's not feasible to be cognizant of them all. But I just thought a project offering WYSIWYG would be getting lots of discussion. When TW5 was first announced, I assumed that the next generation of TW would have WYSIWYG abilities. It was disappointing to find that instead we have another mark-up that doesn't conform to any other markup -- including TWC. Of course, it may be that there is too much overhead with WYSIWYG.

It turns out there is a reference to BJ's project, but it's obscure and under the community references section. If you think to plug WYSIWYG into the search box at TiddlyWiki.com, then BJ's will come up in the "matches" section. But if you just type in "editor" it will be swamped by other entries. What's needed is an entry with "WYSIWYG" and "Visual" in the title. Perhaps a duplicate entry with the discovery terms in the title.

You can't make things any easier for Jeremy than submitting a PR, which I have done on multiple occasions.  Unfortunately, what happens is that rather than just getting a spot-check and a quick approval, it gets side-lined by individuals who have taken no prior interest in documentation. Sure, I get it that you don't want factual or spelling or grammatical errors, but that should be the extent of review. And then, of course, there will be a long wait before the pull is merged and the change shows up in even the pre-release. It's not an entirely gratifying process.

Thanks,
Mark

Paul Netsaver

Nov 23, 2017, 5:00:36 AM
to TiddlyWiki


On Monday, November 6, 2017 at 11:28:23 UTC+1, @TiddlyTweeter wrote:
Ciao Mark S. & Paul

FWIW, both CKEditor (that BJ made a plugin for to be able to run natively in TW) and TinyMCE WYSIWYG HTML editors have tools to reduce HTML bloat. Both also have cleaning import for Word documents.

Hi, I haven't looked at the CKEditor plugin yet. At the moment my workflow is: Edit / Cut / go to html-cleaner.com / paste / clean / cut / paste into TW, but it is rather tedious... ;-)
I'll try the CKE extension, thanks.

Paul Netsaver

Nov 23, 2017, 8:21:37 AM
to TiddlyWiki
Hi to all,
I managed to install visualeditor following the instructions. Now the point is to customize the cleaner inside CKEditor...
I tried changing plugins/bj/visualeditor/config.json, but this seems to be implicitly the content of CKEDITOR.editorConfig = function( config ){...}, while in order to configure the CK filter (ACF), I should edit the CKEDITOR.config allowedContent and disallowedContent properties.

Does anyone know how to customize that section?

Thanks a lot,
Paul Netsaver

BJ

Nov 23, 2017, 9:56:12 AM
to TiddlyWiki
You would have to change the JavaScript; it is in the tiddler ckedit.js.
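
For orientation, here is a sketch of what such a change might set. `allowedContent` and `disallowedContent` are documented CKEditor 4 config keys, but the rule strings are example values only, `applyFilterConfig` is a hypothetical helper, and nothing in this thread confirms how ckedit.js builds its config, so treat this as an assumption-laden outline rather than a patch.

```javascript
// Sketch only: example values, not taken from BJ's plugin.
const filterConfig = {
  // Whitelist: plain structural tags, links with a required href,
  // images with a required src plus optional size attributes.
  allowedContent:
    'p h1 h2 h3 ul ol li blockquote strong em; a[!href]; img[!src,alt,width,height]',
  // Blacklist, which takes precedence over the whitelist:
  // every inline style and every on* event-handler attribute.
  disallowedContent: '*{*}; *[on*]',
};

// Hypothetical helper: copy the filter settings onto an existing config object.
function applyFilterConfig(config) {
  return Object.assign(config, filterConfig);
}

// In a page where CKEditor 4 is loaded, this could be wired in as:
//   CKEDITOR.editorConfig = function (config) { applyFilterConfig(config); };
```

Keeping the rules in one object makes it easy to swap in a different whitelist without touching the rest of the editor setup.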

BJ

Nov 24, 2017, 2:29:49 PM
to TiddlyWiki
I could add an extra config option, e.g.

extraAllowedContent = '*[id];*(*);*{*};p(*)[*]{*};div(*)[*]{*};li(*)[*]{*};ul(*)[*]{*};span(*)[*]{*};table(*)[*]{*};td(*)[*]{*}';
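
The string above appears to follow CKEditor 4's Allowed Content Rules string format, where each rule is written as elements [attributes]{styles}(classes) and rules are separated by semicolons. Read that way, `*[id]` allows the id attribute on any element, `*(*)` any class, `*{*}` any inline style, and `p(*)[*]{*}` any class, attribute, and style on p. The small parser below is a hypothetical helper (not CKEditor's own code) that only makes this structure visible; it assumes the bracket groups may come in any order, as they do in the string above.

```javascript
// Hypothetical parser for illustration; CKEditor has its own internal one.
// Assumed rule shape: elements [attributes]{styles}(classes), rules joined by ';'.
function parseAllowedContent(rules) {
  return rules
    .split(';')
    .map((rule) => rule.trim())
    .filter((rule) => rule.length > 0)
    .map((rule) => {
      // Everything before the first bracket is the space-separated element list.
      const firstBracket = rule.search(/[\[{(]/);
      const head = firstBracket === -1 ? rule : rule.slice(0, firstBracket);
      const tail = firstBracket === -1 ? '' : rule.slice(firstBracket);

      // Pull out one bracketed group and split its contents on commas.
      const group = (pattern) => {
        const match = tail.match(pattern);
        return match
          ? match[1].split(',').map((s) => s.trim()).filter(Boolean)
          : [];
      };

      return {
        elements: head.trim().split(/\s+/).filter(Boolean),
        attributes: group(/\[([^\]]*)\]/),
        styles: group(/\{([^}]*)\}/),
        classes: group(/\(([^)]*)\)/),
      };
    });
}
```

For example, `parseAllowedContent('*[id];p(*)[*]{*}')` yields two rule objects, one per semicolon-separated rule, each listing its elements, attributes, styles, and classes.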

@TiddlyTweeter

Nov 24, 2017, 2:37:00 PM
to TiddlyWiki
Ciao BJ

I often hesitate to ask you to do stuff because a lot of your plugins are so "under the hood" fundamental that I feel like "he has done the core work already! So now I need to do something of my own volition and not bother him."

Your CKEditor adaptation for TW is really excellent. I've been using it a long time.

If you're willing to tweak it further for more flex, I'd be very happy too.

Best wishes
Josiah

BJ

Nov 25, 2017, 5:02:03 AM
to tiddl...@googlegroups.com
I am just finishing a new release of tiddlyclip to work with Firefox 57, and it will include DOMPurify.

Paul Netsaver

May 17, 2018, 6:08:11 PM
to TiddlyWiki
It would be interesting... What is the syntax of (*)[*]{*}? Is it elements [attributes]{styles}(classes)?
I'd like to edit the config options, then apply the filter via a toolbar button or automatically when CKE loads.
Let us know if it could be in one of the next releases...
thanks