
Help/Advice with CFHTTP Parser


silstorm

Jan 6, 2008, 3:02:01 PM
Hi Folks,

I have built a very simple parser using CFHTTP to grab prices and stock
availability on various products. Using the same theory, it works fine with
some sites but not with others. I figured I'd pick out one of them as an
example to see where it might be falling down.

My question is two-fold - firstly, why does this example produce an error, and
secondly, is there a better way of doing this (I don't have access to anything
other than the page data).

So, on to the script:

<!--- cfhttp scraper --->
<!--- price --->
<cfset nstartcode='<div class="standard-price"> <span
class="label">Price </span> ?'>
<cfset nendcode='</div>
<div class="deliveryShortcut">'>
<!--- get the page --->
<cfhttp
url="http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
<cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
<cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<!--- stock --->
<cfset astartcode='class="stock">'>
<cfset aendcode='</div>'>
<!--- get the page --->
<cfhttp
url="http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(astartcode, cfhttp.FileContent) +14>
<cfset nEnd=Find(aendcode, cfhttp.FileContent, nStart+1)>
<cfset stock_status=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<cfset stock='#stock_status#'>
<!--- --->

The following error is returned when I run the above:

<!--- error message --->
The 3 parameter of the Mid function, which is now -14295, must be a
non-negative integer

The error occurred in **removed**\parse.cfm: line 11

9 : <cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
10 : <cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
11 : <cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
12 : <!--- stock --->
13 : <cfset astartcode='class="stock">'>
<!--- --->

Any help or advice would be greatly appreciated.
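
A note on the first question: the length passed to Mid is nEnd - nStart, and Find returns 0 when it finds no match. Because the search for nendcode starts after nStart, any match would give a positive length, so a value of -14295 implies that the start marker was found at position 14295 and the end marker was not found at all. A likely (but unconfirmed) cause is that the multi-line nendcode literal does not match the page's whitespace exactly. The sketch below guards each Find result before calling Mid. The names sampleHtml, startMarker and endMarker and the 299.99 value are made-up stand-ins, not the real page.

```cfml
<!--- Sketch only: sampleHtml stands in for cfhttp.FileContent (hypothetical data). --->
<cfset sampleHtml = '<div class="standard-price"><span class="label">Price </span> 299.99</div>'>
<!--- Short single-line markers are less fragile than multi-line literals. --->
<cfset startMarker = '<span class="label">Price </span>'>
<cfset endMarker = '</div>'>

<!--- Find returns 0 when the substring is absent, so check before using it. --->
<cfset startPos = Find(startMarker, sampleHtml)>
<cfset endPos = 0>
<cfif startPos GT 0>
    <!--- Move past the start marker so only the value itself is captured. --->
    <cfset startPos = startPos + Len(startMarker)>
    <cfset endPos = Find(endMarker, sampleHtml, startPos)>
</cfif>

<!--- Only call Mid when both markers were found and the length is positive. --->
<cfif startPos GT 0 AND endPos GT startPos>
    <cfset livePrice = Trim(Mid(sampleHtml, startPos, endPos - startPos))>
<cfelse>
    <cfset livePrice = "">
</cfif>

<cfoutput>#HTMLEditFormat(livePrice)#</cfoutput>
```

With this guard, a page whose markup has changed yields an empty livePrice instead of an exception, which makes it easier to log which sites need their markers updated.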

BKBK

Jan 7, 2008, 7:25:30 AM
I have examined the page's HTML source. The structure is too disorganised
for that kind of string matching. At that point it is no longer automated
mining, but mining with fingers and nails.

There are just no obvious patterns. In such cases it often helps to parse the
source as XML, and pick out the data from there.
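
As a rough illustration of the XML approach, here is a minimal sketch using XmlParse and XmlSearch. It assumes the markup is well-formed, which real-world HTML often is not. In that case XmlParse throws and the page would need tidying first. The sampleXhtml fragment, its values and the variable names are hypothetical, not taken from the actual page.

```cfml
<!--- Sketch only: sampleXhtml is a hypothetical well-formed fragment, not the real page. --->
<cfset sampleXhtml = '<html><body><div class="standard-price"><span class="label">Price</span> 299.99</div><div class="stock">In stock</div></body></html>'>

<cfset livePrice = "">
<cfset stockStatus = "">

<cftry>
    <!--- XmlParse throws if the markup is not well-formed XML. --->
    <cfset doc = XmlParse(sampleXhtml)>

    <!--- XPath selects elements by class attribute rather than by character position. --->
    <cfset priceNodes = XmlSearch(doc, "//div[@class='standard-price']")>
    <cfset stockNodes = XmlSearch(doc, "//div[@class='stock']")>

    <!--- XmlSearch returns an array, so check it is non-empty before indexing. --->
    <cfif ArrayLen(priceNodes) GT 0>
        <cfset livePrice = Trim(priceNodes[1].XmlText)>
    </cfif>
    <cfif ArrayLen(stockNodes) GT 0>
        <cfset stockStatus = Trim(stockNodes[1].XmlText)>
    </cfif>

    <cfcatch type="any">
        <!--- Parsing failed: leave both values empty rather than aborting the request. --->
    </cfcatch>
</cftry>

<cfoutput>#HTMLEditFormat(livePrice)# / #HTMLEditFormat(stockStatus)#</cfoutput>
```

Selecting by element and attribute means small whitespace changes in the page no longer break the extraction, though a change of class names still would.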

silstorm

Jan 7, 2008, 7:37:41 AM
Thanks BK, I was hoping that wouldn't be the case.

I appreciate your input.
