HTML to plain text

Andrea Borruso

Feb 6, 2018, 4:35:38 AM
to OpenRefine
Hello,
I need to convert

<span id="name">Lorem <b>ipsum</b> some text</span>

to "Lorem ipsum some text".

I tried value.parseHtml().select("span").toString(), but it returns the markup unchanged: <span id="name">Lorem <b>ipsum</b> some text</span>.

Is there a generic way to convert the content of an HTML tag to plain text?

Thank you

Andrea Borruso

Feb 6, 2018, 4:40:51 AM
to OpenRefine
Solved: it's htmlText()
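
For anyone finding this thread later, here is a minimal GREL sketch of what worked, assuming the cell contains the span shown in the first message. The [0] index, which takes the first element matched by select(), is an assumption about the exact expression; the thread itself only names htmlText().

```
value.parseHtml().select("span")[0].htmlText()
```

For the example cell this should give "Lorem ipsum some text". toString() serializes the matched element with its tags, which is why the earlier attempt returned the markup.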

Thank you

Thad Guidry

Feb 6, 2018, 9:08:49 PM
to openr...@googlegroups.com
Andrea,

Did you solve it by reading our wiki docs and recipes?

andy

Feb 7, 2018, 3:03:19 AM
to openr...@googlegroups.com
Thad,

On 7 February 2018 at 03:08, Thad Guidry <thadg...@gmail.com> wrote:
Did you solve it by reading our wiki docs and recipes?

But somehow it confused me (I'm not very smart): I had read the "Extract text" section and thought that the way to extract text was to use ".toString()".

I gave up and wrote to you. Then I realized that I had not yet read the "Other Functions" section.

Thank you

--
Andrea Borruso
website: https://medium.com/tantotanto
38° 7' 48" N, 13° 21' 9" E, EPSG:4326

"seek and learn to recognize who and what,
 in the midst of hell, are not hell,
then make them endure, give them space"

Italo Calvino

Thad Guidry

Feb 7, 2018, 10:01:55 AM
to openr...@googlegroups.com
Thanks, I've added that example of simple HTML text extraction to the wiki page!

https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML



andy

Feb 7, 2018, 10:36:34 AM
to openr...@googlegroups.com
On 7 February 2018 at 16:01, Thad Guidry <thadg...@gmail.com> wrote:
Thanks, I've added that example of simple HTML text extraction to the wiki page!

https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML

You are very kind, thank you very much, great thing!