How to strip html tags from a binary?


Александр Сальников

Aug 16, 2015, 12:08:42 PM
to elixir-lang-talk
Hi everybody,
I'm looking for a library that can strip HTML tags from a binary, except for a few allowed ones.
All I've found is rrrene/html_sanitize_ex on GitHub. It relies on mochiweb, while I'm using cowboy.
Is it normal practice to have such a heavy dependency in my project for just one purpose?

Best regards,
Alexander Salnikov

René Föhring

Aug 18, 2015, 3:24:06 AM
to elixir-lang-talk
Hi Alexander,

could you explain the reasoning behind "rely on mochiweb when i'm using cowboy"? The connection of the two is not clear to me. I'm the maintainer of html_sanitize_ex and unfortunately don't have an Erlang background, so I'm always thankful for any advice in that area!

With regard to your problem: If you "just" want to strip HTML tags from a binary you could use a regex. If you want to sanitize HTML content, which means allowing some while stripping/scrubbing other parts, you necessarily have to use an HTML parser.

Best,
René
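
The regex approach René mentions for the plain "strip all tags" case could look roughly like the sketch below. The module name `StripTags` and the pattern are assumptions for illustration, not code from html_sanitize_ex. The pattern only matches things that look like opening or closing tags, so it leaves comments, entities, and the contents of `<script>` blocks alone, and it is not a safe substitute for a real sanitizer.

```elixir
defmodule StripTags do
  # Hypothetical helper for illustration; not part of any library in this thread.

  # Removes anything shaped like an opening or closing tag: "<", an optional "/",
  # a letter, then everything up to the next ">". A bare "<" in text such as
  # "a < b" is left untouched because it is not followed by a letter.
  def strip(html) when is_binary(html) do
    String.replace(html, ~r/<\/?[a-zA-Z][^>]*>/, "")
  end
end

IO.puts(StripTags.strip("<p>Hello <b>world</b></p>"))
```

Keeping some tags while dropping others is where this approach stops being reliable, which is the point at which an HTML parser becomes necessary.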

Александр Сальников

Aug 20, 2015, 10:18:05 AM
to elixir-lang-talk
Thank you. I'm just hesitating: is it good practice to have both mochiweb and cowboy as dependencies?

On Tuesday, August 18, 2015 at 10:24:06 UTC+3, René Föhring wrote:

René Föhring

Aug 21, 2015, 10:11:48 AM
to elixir-lang-talk
Just to be clear: I am no authority on this, so maybe others can chime in with their opinion.


I think if you have both mochiweb and cowboy as a dependency that might seem like overhead, since they have an overlapping feature set. 

That said, you can't really speak of "good practices" when you don't have much choice/alternatives. We as a community are still evolving, just like the language ecosystem. 

html_sanitize_ex is powering the markdown sanitization at http://elixirstatus.com which is open-sourced at https://github.com/rrrene/elixirstatus-web if you want to take a look.

Conclusion: if there is another, more lightweight option for your problem, by all means go for it (as I mentioned: use a Regex if you want basic "strip all tags" functionality). But if you want a fully customizable HTML sanitizer implementing the HTML5 spec because, say, you are building an Angular SPA that is a CMS backed by a Phoenix API, then use html_sanitize_ex or something akin to it and don't spend too much time thinking about dependencies.


As mentioned: More input on this from others is more than welcome!


Best,
René

Booker Bense

Aug 21, 2015, 11:26:19 AM
to elixir-lang-talk


On Friday, August 21, 2015 at 7:11:48 AM UTC-7, René Föhring wrote:
Just to be clear: I am no authority on this, so maybe others can chime in with their opinion.


I think if you have both mochiweb and cowboy as a dependency that might seem like overhead, since they have an overlapping feature set. 



My guess is that many people are wary of dependencies due to experiences with "ruby gem hell". Bundler largely fixed that, 
but for the most part Elixir apps are "pre-bundled". It's really straightforward to pin your dependencies to certain versions. 

While Elixir isn't quite there yet, the problem of conflicting versions of dependencies does come up with larger Erlang applications. 
The rebar episode of the Mostly Erlang podcast gives a good overview of the issues and one scheme to solve it. 


Getting everyone on board with Semantic Versioning would help, but even that is not a complete fix. The more you can limit
your dependencies, the better. As a last resort, the BEAM makes it easier to split your application into processes running on
separate nodes, but that is the nuclear option when it comes to library incompatibilities.

Given that this problem only arises when you have a large vibrant community with many freely available external libraries, 
it's a good problem to have.

-  Booker C. Bense
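
As a rough illustration of the version pinning Booker describes, a `mix.exs` might declare its dependencies as in the sketch below. The project name and the version numbers are placeholders chosen for the example, not recommendations from the thread.

```elixir
# mix.exs (sketch; the app name and all version numbers are illustrative assumptions)
defmodule MyApp.Mixfile do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  defp deps do
    [
      # "~> 0.1.0" allows 0.1.x patch releases but not 0.2.0
      {:html_sanitize_ex, "~> 0.1.0"},
      # "==" pins one exact version
      {:cowboy, "== 1.0.2"}
    ]
  end
end
```

After `mix deps.get`, the exact versions that were resolved are recorded in `mix.lock`, so committing that file keeps every checkout on the same dependency set.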

Johan Wärlander

Aug 21, 2015, 5:45:57 PM
to elixir-lang-talk
Looking at the history for mochiweb_html, it sort of feels like it might be an idea to publish it as a separate package. It serves a very clearly defined role, doesn't really seem closely tied to mochiweb, and is apparently pretty stable, as it hasn't seen any real changes since 2013.