Best way to parse html


Jimboidaho

Mar 16, 2012, 5:23:56 PM
to professi...@googlegroups.com
I have a web service that is returning error messages in long ugly HTML.  I am trying to figure out a good way to parse out the messages so I can present them in my own form. Here is the html.

	 <tr><td nowrap>Error Code : </td>
<td>5000</td>
<tr><td>Error Name : </td>
<td>Credit Card Number Invalid</td>
<tr><td>Error Message : </td>
<td>The Credit Card Number supplied in the authorization request appears to be invalid.</td>
</tr>

I used this code to get to the table I need
        $beg = strpos($error, '<table>');
        $end = strpos($error, '</table>');
        $table = substr($error, $beg, $end-$beg);

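Now I need the values that were highlighted in red in my original post: the second cell of each row (the error code, error name, and error message).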

Thanks.
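
One thing worth noting about the snippet above: strpos() returns false when the tag is not found, so the substr() call can quietly return the wrong slice if the response has no table. A minimal sketch of a guarded version follows. The helper name extractTable and the sample $error string are assumptions made up for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the markup from
// the first "<table" up to and including the next "</table>", or null if
// either tag is missing. Matching "<table" also allows tags with attributes.
function extractTable($html)
{
    $beg = stripos($html, '<table');
    if ($beg === false) {
        return null;
    }
    // Search for the closing tag only after the opening one.
    $end = stripos($html, '</table>', $beg);
    if ($end === false) {
        return null;
    }
    // Include the closing tag itself in the returned slice.
    return substr($html, $beg, $end - $beg + strlen('</table>'));
}

// Made-up sample response for illustration.
$error = '<html><body><table><tr><td>Error Code : </td><td>5000</td></tr></table></body></html>';
$table = extractTable($error);
```

Unlike the original snippet, this keeps the closing </table> in the result, which makes the slice easier to hand to a parser.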

Robert Gonzalez

Mar 16, 2012, 5:45:08 PM
to professi...@googlegroups.com
Can you parse it with something like simplexml? Then you can access the members of each node as you wish.



--
This group is managed and maintained by the development staff at 360 PSG. An enterprise application development company utilizing open-source technologies for todays small-to-medium size businesses.
 
For information or project assistance please visit :
http://www.360psg.com
 
You received this message because you are subscribed to the Google Groups "Professional PHP Developers" group.
To post to this group, send email to Professi...@googlegroups.com
To unsubscribe from this group, send email to Professional-P...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/Professional-PHP



--

Robert Gonzalez
   

Robert Gonzalez

Mar 16, 2012, 5:49:12 PM
to professi...@googlegroups.com
Actually, I meant to say the PHP DOM parser. But I suspect that simplexml would still handle this if the markup were well formed.

Robert Gonzalez

Mar 16, 2012, 6:07:16 PM
to professi...@googlegroups.com
Dude, I hate it when someone asks a question and it makes me suddenly want to solve it. I never, EVER, do this, but I just whipped up some code to get an array of data from that HTML chunk you posted. If it helps you, cool. If not, maybe someone can use it as a tutorial or something. Anyway, enjoy everyone:

<?php
$h = '<tr><td nowrap>Error Code : </td>
<td>5000</td>
<tr><td>Error Name : </td>
<td>Credit Card Number Invalid</td>
<tr><td>Error Message : </td>
<td>The Credit Card Number supplied in the authorization request appears to be invalid.</td>
</tr>';

$d = new DOMDocument;
$d->loadHTML($h);
$tds = $d->getElementsByTagName('td');

$errors = array();
$key = null;
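// The cells alternate between a label ("Error Code : ") and its value.
for ($i = 0; $i < $tds->length; $i++) {
    $td = $tds->item($i);
    preg_match_all('/Error (.+) :/', $td->nodeValue, $match);
    if (isset($match[1][0])) {
        // Label cell: remember the lowercased name as the pending key.
        $key = strtolower($match[1][0]);
    } else {
        if ($key) {
            // Value cell: store it under the pending key, then reset.
            $errors[$key] = $td->nodeValue;
            $key = null;
        }
    }
}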

/*
$errors will now contain the following structure:
array(3) {
  ["code"]=>
  string(4) "5000"
  ["name"]=>
  string(26) "Credit Card Number Invalid"
  ["message"]=>
  string(83) "The Credit Card Number supplied in the authorization request appears to be invalid."
}
*/
?>

Also, here's a paste bin of this snippet if you'd rather read it cleanly formatted.

Robert Gonzalez

Mar 16, 2012, 6:41:42 PM
to professi...@googlegroups.com
Dammit, why can I not leave this alone? lol

For less readable, but more concise, code, you can also do this in your for loop:

for ($i = 0, $j = 1; $i < $tds->length; $i += 2, $j += 2) {
    preg_match_all('/Error (.+) :/', $tds->item($i)->nodeValue, $match);
    if (isset($match[1][0])) {
        $errors[strtolower($match[1][0])] = $tds->item($j)->nodeValue;
    }
}

Ok, I really need to stop this now. Hope someone got something out of this.

Jimboidaho

Mar 16, 2012, 7:12:20 PM
to professi...@googlegroups.com
You the man.  I have it working.  I tried using all the html and I get these warnings.  Still seems to work.  I just wish Elavon would get their act together.  This is really an old and crappy interface.  It took me a half day just to figure out that their service wouldn't accept Multipart form data :(.

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 83 in D:\phpProjects\MLSworkspace2\MLS\pages\test_page.php on line 73
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 84 in D:\phpProjects\MLSworkspace2\MLS\pages\test_page.php on line 73
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 85 in D:\phpProjects\MLSworkspace2\MLS\pages\test_page.php on line 73
Warning: DOMDocument::loadHTML(): Unexpected end tag : form in Entity, line: 89 in D:\phpProjects\MLSworkspace2\MLS\pages\test_page.php on line 73

btw, this new google groups web interface really sucks.
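
For reference, those warnings come from libxml, which DOMDocument::loadHTML() uses to do the parsing. A minimal sketch of one way to keep them out of the output, assuming the parsed result is still good enough to use: libxml_use_internal_errors(true) makes libxml buffer parse errors instead of raising PHP warnings. The $html string below is a made-up stand-in, not the actual Elavon response.

```php
<?php
// Assumed stand-in for the full response; a bare "&" is the kind of input
// that can trigger htmlParseEntityRef warnings.
$html = '<table><tr><td>Error Name : </td><td>Card & Number Invalid</td></tr></table>';

// Buffer libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$d = new DOMDocument;
$d->loadHTML($html);

// Look at the buffered problems if wanted, then clear the buffer.
$problems = libxml_get_errors();
libxml_clear_errors();

// Restore whatever error handling was in effect before.
libxml_use_internal_errors($previous);

$tds = $d->getElementsByTagName('td');
$label = $tds->item(0)->nodeValue;
```

Whether any given input actually produces entries in $problems depends on the libxml version installed, so it is safer to log them than to rely on a specific count.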

zorro

Mar 16, 2012, 7:26:34 PM
to professi...@googlegroups.com
Here are a couple more ways, using explode() or preg_match_all():

$table = '<table>

<tr><td nowrap>Error Code : </td>
<td>5000</td>
<tr><td>Error Name : </td>
<td>Credit Card Number Invalid</td>
<tr><td>Error Message : </td>
<td>The Credit Card Number supplied in the authorization request appears to be invalid.</td>
</tr></table>';

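$errors = array();
$rows = explode('<tr>', $table);
unset($rows[0]); // drop everything before the first <tr>
foreach($rows as $row) {
  // $error becomes the text after the first </td>, e.g. "\n<td>5000"
  list($name, $error) = explode('</td>', $row);
  // keep only what follows the opening <td>
  list($null, $error) = explode('>', $error);
  $errors[] = $error;
}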
echo '<pre>'.print_r($errors, 1).'</pre>';    

preg_match_all('|<tr>\s*<td[^>]*>[^<]+</td>\s*<td>([^<]+)</td>\s*|Usi', $table, $matches);
$errors = $matches[1];
echo '<pre>'.print_r($errors, 1).'</pre>';
  
Produces
Array
(
    [0] => 5000
    [1] => Credit Card Number Invalid
    [2] => The Credit Card Number supplied in the authorization request appears to be invalid.
)
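
Both versions above return a numerically indexed array. If the labels are wanted as keys, a small variation on the preg_match_all() approach could capture them too. This is only a sketch under the assumption that every label cell has the form "Error Something : "; the pattern and the use of array_combine() are illustrative additions, not code from the posts above.

```php
<?php
$table = '<table>
<tr><td nowrap>Error Code : </td>
<td>5000</td>
<tr><td>Error Name : </td>
<td>Credit Card Number Invalid</td>
<tr><td>Error Message : </td>
<td>The Credit Card Number supplied in the authorization request appears to be invalid.</td>
</tr></table>';

// Group 1 captures the word(s) after "Error" in the label cell,
// group 2 captures the text of the value cell that follows it.
preg_match_all(
    '|<tr>\s*<td[^>]*>\s*Error\s+([^<:]+?)\s*:\s*</td>\s*<td[^>]*>([^<]+)</td>|i',
    $table,
    $matches
);

// Pair the lowercased labels with their values.
$errors = array_combine(array_map('strtolower', $matches[1]), $matches[2]);
```

Rows whose label does not match the assumed "Error ... :" shape are simply skipped, so the result may have fewer entries than the table has rows.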


Robert Gonzalez

Mar 16, 2012, 8:25:43 PM
to professi...@googlegroups.com
You can still get the contents of each tag using a regex then loop over those if you wanted to. Seems a little less dirty to me than using a bunch of string functions when you don't need to.

Here's one that does what I'm suggesting without the DOM stuff and without all the string functions:

<?php
$h = '<tr><td nowrap>Error Code : </td>
<td>5000</td>
<tr><td>Error Name : </td>
<td>Credit Card Number Invalid</td>
<tr><td>Error Message : </td>
<td>The Credit Card Number supplied in the authorization request appears to be invalid.</td>
</tr>';

// Capture the contents of every <td>, whatever attributes it carries.
preg_match_all('#<td[^>]*>(.*?)</td>#s', $h, $matches);
$cells = $matches[1];
$count = count($cells);
$errors = array();
// Cells come in label/value pairs, so only proceed on a non-empty, even count.
if ($count > 0 && $count % 2 == 0) {
    for ($i = 0, $j = 1; $i < $count; $i += 2, $j += 2) {
        preg_match_all('/Error (.+) :/', $cells[$i], $match);
        if (isset($match[1][0])) {
            $errors[strtolower($match[1][0])] = $cells[$j];
        }
    }
}
/*
$errors will now contain the following structure:
array(3) {
  ["code"]=>
  string(4) "5000"
  ["name"]=>
  string(26) "Credit Card Number Invalid"
  ["message"]=>
  string(83) "The Credit Card Number supplied in the authorization request appears to be invalid."
}
*/

This should alleviate the issues you are getting with the errors from before. Forgot that I have error reporting ramped down on my local server for the time being so I didn't see those errors. Apologies. I don't like posting code that contains errors. Pretty sure this one doesn't.

Hope it helps.


Jimboidaho

Mar 17, 2012, 11:14:58 AM
to professi...@googlegroups.com
Hi Robert.  The code I posted originally did not have errors.  I was trying to take out the first step of parsing the table out of the HTML.  Many thanks again for your help.