Convert a string into another, problems with formatting.....


fabius pocus

Aug 6, 2022, 5:19:52 PM
to OpenRefine
Hi. These data contain only two rows and one column. I'd like to transform row 1 into row 2. In other words, I'd like to convert this:

<p>\n\t<b>\n\t\t- What ........................?\n\t</b>\n</p>\n\n<p>\n\t<b>\n\t\t- I'm a graphic designer.\n\t</b>\n</p>\n

to this:

<p>
    <b>
        - What ........................?
    </b>
</p>

<p>
    <b>
        - I'm a graphic designer.
    </b>
</p>

How can I achieve that in OpenRefine? Thanks a lot for any help,

Fabio Frascati

Data_example.txt

Owen Stephens

Aug 8, 2022, 5:33:42 AM
to OpenRefine
You can do something like this by using the parseHtml() function.
The exact way to do this may depend on the exact structure of your data. If it is all exactly like this example, then it is straightforward to split this into two rows.

First
value.parseHtml().select("p").join("||")
This will select the two <p> tags from the data and join them with a double pipe character (you can use other characters here of course). This would result in:
<p>\n\t<b>\n\t\t- What ........................?\n\t</b>\n</p>||<p>\n\t<b>\n\t\t- I'm a graphic designer.\n\t</b>\n</p>

Then you can use the Edit Cells -> Split multi-valued cells menu option, with || as the separator, to split this into two rows in the project:
<p>\n\t<b>\n\t\t- What ........................?\n\t</b>\n</p>
<p>\n\t<b>\n\t\t- I'm a graphic designer.\n\t</b>\n</p>
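For readers who want to see the select, join and split steps outside OpenRefine, here is a rough Python sketch. It is only an illustration: the regular expression is a stand-in for parseHtml().select("p"), not how OpenRefine parses HTML, and the function name split_paragraphs is made up for this example. It assumes the cell contains flat, non-nested <p> blocks without attributes and that the separator never occurs in the text.

```python
import re

def split_paragraphs(cell, separator="||"):
    # Stand-in for parseHtml().select("p"): grab each <p>...</p> block.
    # DOTALL lets "." also match real newlines, should the cell contain any.
    paragraphs = re.findall(r"<p>.*?</p>", cell, flags=re.DOTALL)
    # Mirror the two OpenRefine steps: join with a separator, then split on it.
    joined = separator.join(paragraphs)
    return joined.split(separator)

# Raw string: \n and \t stay as literal backslash sequences, as in the data.
cell = r"<p>\n\t<b>\n\t\t- What ........................?\n\t</b>\n</p>\n\n<p>\n\t<b>\n\t\t- I'm a graphic designer.\n\t</b>\n</p>\n"
rows = split_paragraphs(cell)
```

With the example cell, rows should hold two strings, one per <p> block, with the literal \n\n between the paragraphs dropped.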

You can then use a replace statement like:
value.replace("\\t","    ").replace("\\n","
")

This replaces each literal \t with four spaces and each literal \n with a real newline. Note that the expression runs onto a second line because there is an actual newline character between the second pair of quotation marks. This works, but it feels like there should be a neater way of doing this, so if anyone else can advise that would be great!
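The same replacement can be sketched in plain Python, which may make the distinction between the literal two-character sequences and real whitespace easier to see. This is a minimal sketch, not OpenRefine code; the function name unescape_whitespace and the indent parameter are invented for illustration.

```python
def unescape_whitespace(cell, indent="    "):
    # r"\t" and r"\n" are raw strings: backslash followed by a letter,
    # i.e. the literal sequences stored in the cell, not real whitespace.
    # "\n" (non-raw) is an actual newline character.
    return cell.replace(r"\t", indent).replace(r"\n", "\n")

row = r"<p>\n\t<b>\n\t\t- What ........................?\n\t</b>\n</p>"
print(unescape_whitespace(row))
```

OpenRefine's expression editor also offers Python / Jython as a language; there the equivalent would presumably be a single return line applying the same two replace calls to value, though that has not been checked against this project.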

Best wishes

Owen

fabius pocus

Aug 11, 2022, 2:24:57 PM
to OpenRefine
Thanks a lot, Owen. Your help is always useful.