Convert HTML-Table to CSV with AWK

gismo82

Sep 9, 2011, 7:41:19 AM
Hi...

I have a problem converting an HTML table to a CSV file.

I would like to convert the following table to CSV:

<table>
<tr>
<td>text</td>
<td><a href="column2>column2</a></td>
<td><img border=0 src="test.png" width=24 height=14></td>
</tr>
...
</table>

The CSV should be:

text;<a href="column2>column2</a>;<img border=0 src="test.png" width=24 height=14>

I found the following AWK script on the web, but it's not working as
intended. The output looks like this:

; text; column2;

Here is the AWK script:

BEGIN {
    s="";
    FS="nXX";
}

/<td|<TD/ {
    gsub(/<TD[^>]*>/, "");
    s=(s "; " $1);
}

/<tr|<TR/ {
    print s;
    s=""
}

END {
    print s;
}

Could anyone help me?


Andreas

Ed Morton

Sep 9, 2011, 8:54:02 AM
On 9/9/2011 6:41 AM, gismo82 wrote:
> Hi...
>
> I have a problem converting an HTML table to a CSV file.
>
> I would like to convert the following table to CSV:
>
> <table>
> <tr>
> <td>text</td>
> <td><a href="column2>column2</a></td>
> <td><img border=0 src="test.png" width=24 height=14></td>
> </tr>
> ...
> </table>
>
> The CSV should be:
>
> text;<a href="column2>column2</a>;<img border=0 src="test.png" width=24 height=14>
>
<snip>

gawk -v RS='</tr>' -v FS='</td>[[:space:]]*<td>' -v OFS=';' '
{$1=$1} gsub(/.*<td>|<\/td>.*/,"")
' file
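
The command above relies on gawk treating RS as a regular expression. Where only a POSIX awk is available, the same idea can be sketched line by line, as long as every cell sits on its own line as in the question. This is only a rough sketch: the file names table.html and table.csv and the second, uppercase row are made up for illustration, and the patterns also accept uppercase tags and attributes on <td>, which is what the original script seemed to be aiming for.

```shell
# Hypothetical sample input: the table from the question plus an
# invented uppercase row with an attribute on the first cell.
cat > table.html <<'EOF'
<table>
<tr>
<td>text</td>
<td><a href="column2>column2</a></td>
<td><img border=0 src="test.png" width=24 height=14></td>
</tr>
<TR>
<TD align=left>more</TD>
<TD>cells</TD>
</TR>
</table>
EOF

awk '
# Opening row tag: start a fresh row.
/<[tT][rR]([ \t][^>]*)?>/ { row = ""; sep = "" }

# A complete cell on one line: keep only what is between the tags.
/<[tT][dD]([ \t][^>]*)?>.*<\/[tT][dD]>/ {
    cell = $0
    sub(/^[^<]*<[tT][dD]([ \t][^>]*)?>/, "", cell)  # strip the opening tag
    sub(/<\/[tT][dD]>.*$/, "", cell)                # strip the closing tag
    row = row sep cell
    sep = ";"
}

# Closing row tag: print the cells joined with ";".
/<\/[tT][rR]>/ { print row }
' table.html > table.csv

cat table.csv
```

Cells that span several lines, or several cells on one line, are not handled by this sketch.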

Regards,

Ed.

Janis Papanagnou

Sep 9, 2011, 3:16:04 PM
If your data is as regular as you've shown above you could use

awk '
match($0,/<td>.*<\/td>/) {
printf "%s%s", del, substr($0,RSTART+4,RLENGTH-9); del=";"
}
'
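
As written, the script above puts every cell on one output line and never prints a newline. A possible variation, again only a sketch, ends the line whenever a </tr> is seen. The file names row.html and row.csv are hypothetical, and the input is assumed to have one complete <td>...</td> per line without attributes.

```shell
# Hypothetical two-cell sample taken from the table in the question.
cat > row.html <<'EOF'
<tr>
<td>text</td>
<td><a href="column2>column2</a></td>
</tr>
EOF

awk '
match($0, /<td>.*<\/td>/) {
    # "<td>" is 4 characters and "</td>" is 5, hence +4 and -9.
    printf "%s%s", del, substr($0, RSTART + 4, RLENGTH - 9); del = ";"
}
# End of a row: finish the output line and reset the delimiter.
/<\/tr>/ { print ""; del = "" }
' row.html > row.csv

cat row.csv
```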

Janis

