> How I would do it.
> 1) Grab content from within tags using regex
> 2) Replace " - " with tabs
> 3) select Everything and paste into Excel or any other spreadsheet
> processor.
> The tabs will ensure that it gets into two columns.
> 4) In the third column, drag down the COUNTIF formula checking whether
> text in the neighbouring column is present in law firms list. If
> formula produces "1" or more, swap contents of name&surname cell and
> company name cell.
Here are the regex and the Excel file that illustrate my approach.
The regex (in free-spacing mode and in "dot matches newlines" mode):
<title>
((?:(?!</title>)(?!\x20-\x20).)*?) # column 1
(\x20-\x20(?:(?!</title>)(?!\x20-\x20).)*?) # column 2
(\x20-\x20(?:(?!</title>)(?!\x20-\x20).)*?)? # column 3 (optional)
</title>
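For anyone who wants to try the pattern outside a regex tool, here is a minimal Python sketch. The sample titles are my reconstruction of the input from the three example rows, so treat them as an assumption, and the helper name extract_columns is made up for illustration. The lookaheads are written with \x20 because free-spacing (verbose) mode ignores literal spaces in the pattern.

```python
import re

# Assumed input, reconstructed from the example rows; the real pages may differ.
html = (
    "<title>Gregg M Galardi - Skadden Arps</title>\n"
    "<title>John M. Simpson - Partner - The International Law Firm of Fulbright & Jaworski</title>\n"
    "<title>Sidley Austin LLP - Our People - Todd Wagner</title>\n"
)

# re.VERBOSE is free-spacing mode, re.DOTALL is "dot matches newlines".
TITLE_RE = re.compile(
    r"""
    <title>
    ((?:(?!</title>)(?!\x20-\x20).)*?)            # column 1
    (\x20-\x20(?:(?!</title>)(?!\x20-\x20).)*?)   # column 2
    (\x20-\x20(?:(?!</title>)(?!\x20-\x20).)*?)?  # column 3 (optional)
    </title>
    """,
    re.VERBOSE | re.DOTALL,
)

def extract_columns(text):
    """Return one list of captured columns per <title> element (hypothetical helper)."""
    rows = []
    for match in TITLE_RE.finditer(text):
        # Columns 2 and 3 still carry their leading " - ";
        # column 3 is None when the title has only two parts.
        rows.append([g for g in match.groups() if g is not None])
    return rows

rows = extract_columns(html)
for row in rows:
    print(row)
```

Note that a title with more than three " - " separated parts will not match at all, which is in line with the three-column layout I am aiming for.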
As the next step, I apply the following search-and-replace operation to
each piece of text that the above regex captures:
I search for (\r\n| - |, ) and replace every match with a space.
Note 1: I assume that " - " cannot occur in a lawyer's name or in a company name.
Note 2: I assume that every comma is to be deleted.
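The same clean-up can be sketched in Python. This is only an illustration under my own assumptions: the helper names clean_cell and to_tsv_line are made up, and trimming the leftover leading space and joining the cells with tabs is my reading of how the columns end up in the spreadsheet.

```python
import re

# Same alternation as in the search above: a line break, " - ", or ", ".
CLEANUP_RE = re.compile(r"\r\n| - |, ")

def clean_cell(cell):
    """Replace every match with a single space, then trim (trimming is an assumption)."""
    return CLEANUP_RE.sub(" ", cell).strip()

def to_tsv_line(cells):
    """Join the cleaned cells with tabs so that a paste into a spreadsheet yields columns."""
    return "\t".join(clean_cell(c) for c in cells)

# One captured row as the regex would deliver it, with " - " still attached.
row = ["John M. Simpson", " - Partner", " - The International Law Firm of Fulbright & Jaworski"]
line = to_tsv_line(row)
print(line)
```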
So I get 3 columns (based on the example given; tabs invisible, sorry):
Gregg M Galardi Skadden Arps
John M. Simpson Partner The International Law Firm of Fulbright & Jaworski
Sidley Austin LLP Our People Todd Wagner
On this data I perform some Microsoft Excel manipulations, which are
shown in the attachment.
They bring us to the desired result.
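For those who prefer a script to a spreadsheet, the swap from step 4 of my outline can be approximated as below. This is a rough sketch, not a cell-by-cell copy of the attachment: the law firm list is invented, count_if only imitates COUNTIF for whole-cell, case-insensitive matches, and swapping the first and last cells of a three-column row is my own assumption.

```python
# Hypothetical lookup list; in the spreadsheet this is the law firms range.
LAW_FIRMS = ["Sidley Austin LLP", "Skadden Arps"]

def count_if(firms, cell):
    """Rough analogue of COUNTIF(range, cell): count entries equal to cell, ignoring case."""
    return sum(1 for firm in firms if firm.casefold() == cell.casefold())

def normalise_row(cells, firms):
    """Put the person first: if the first cell is a known firm, swap first and last cells."""
    cells = list(cells)
    if len(cells) >= 2 and count_if(firms, cells[0]) >= 1:
        cells[0], cells[-1] = cells[-1], cells[0]
    return cells

table = [
    ["Gregg M Galardi", "Skadden Arps"],
    ["Sidley Austin LLP", "Our People", "Todd Wagner"],
]
fixed = [normalise_row(r, LAW_FIRMS) for r in table]
for r in fixed:
    print("\t".join(r))
```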