I have built a very simple parser using CFHTTP to grab prices and stock
availability for various products. The same approach works fine with
some sites but not with others. I figured I'd pick out one of the failing
sites as an example to see where it might be falling down.
My question is two-fold - firstly, why does this example produce an error, and
secondly, is there a better way of doing this (I don't have access to anything
other than the page data).
So, on to the script:
<!--- cfhttp scraper --->
<!--- price --->
<cfset nstartcode='<div class="standard-price"> <span
class="label">Price </span> ?'>
<cfset nendcode='</div>
<div class="deliveryShortcut">'>
<!--- get the page --->
<cfhttp
url="http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
<cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
<cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<!--- stock --->
<cfset astartcode='class="stock">'>
<cfset aendcode='</div>'>
<!--- get the page --->
<cfhttp
url="http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(astartcode, cfhttp.FileContent) +14>
<cfset nEnd=Find(aendcode, cfhttp.FileContent, nStart+1)>
<cfset stock_status=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<cfset stock='#stock_status#'>
<!--- --->
The following error is returned when I run the above:
<!--- error message --->
The 3 parameter of the Mid function, which is now -14295, must be a
non-negative integer
The error occurred in **removed**\parse.cfm: line 11
9 : <cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
10 : <cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
11 : <cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
12 : <!--- stock --->
13 : <cfset astartcode='class="stock">'>
<!--- --->
Any help or advice would be greatly appreciated.
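On the first part of the question: Find returns 0 when it cannot find the substring. The price block uses nStart with +0, and Mid is being given nEnd - nStart = -14295, so the start marker was found at position 14295 and the end marker was not found at all (nEnd is 0). A likely cause, though I can't confirm it without the page source, is that nendcode contains a literal line break and must match the page's whitespace exactly. Below is a minimal sketch of a guard that checks both Find results before calling Mid. The function name textBetween and its argument names are my own, not part of the original script.

```cfm
<!--- Hypothetical helper (not from the original script): returns the text
      between two markers, or an empty string when either marker is missing. --->
<cffunction name="textBetween" returntype="string" output="false">
    <cfargument name="content" type="string" required="true">
    <cfargument name="startMarker" type="string" required="true">
    <cfargument name="endMarker" type="string" required="true">

    <cfset var startPos = Find(arguments.startMarker, arguments.content)>
    <cfset var endPos = 0>

    <!--- Find returns 0 when the marker is absent --->
    <cfif startPos EQ 0>
        <cfreturn "">
    </cfif>

    <!--- Skip past the start marker itself instead of hard-coding an offset --->
    <cfset startPos = startPos + Len(arguments.startMarker)>
    <cfset endPos = Find(arguments.endMarker, arguments.content, startPos)>

    <!--- Without this check, endPos - startPos goes negative and Mid throws --->
    <cfif endPos EQ 0>
        <cfreturn "">
    </cfif>

    <cfreturn Trim(Mid(arguments.content, startPos, endPos - startPos))>
</cffunction>

<!--- Usage on an assumed sample string rather than the live page --->
<cfset sample = '<div class="stock">In stock</div>'>
<cfset stock_status = textBetween(sample, 'class="stock">', '</div>')>
<cfoutput>#stock_status#</cfoutput>
```

Using short, single-line markers (for example just '</div>' as the end marker) avoids depending on the page's exact line breaks. Since the price and the stock status come from the same URL, one cfhttp call should be enough for both lookups.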
The page source just has no obvious patterns to match against. In such cases it
often helps to parse the source as XML and pick out the data from there.
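As a rough illustration of the XML route, here is a sketch using XmlParse and XmlSearch. The markup in sampleXml is an assumption made up for the example, not the real page. XmlParse only accepts well-formed XML, and most live HTML pages are not well-formed, so the response would probably need cleaning up before this works on cfhttp.FileContent.

```cfm
<!--- Assumed, well-formed sample markup; a real page would need cleaning first --->
<cfset sampleXml = '<div class="product"><div class="standard-price"><span class="label">Price</span> 199.99</div><div class="stock">In stock</div></div>'>

<!--- Second argument keeps element and attribute names case-sensitive --->
<cfset doc = XmlParse(sampleXml, true)>

<!--- XmlSearch takes an XPath expression and returns an array of matching nodes --->
<cfset stockNodes = XmlSearch(doc, "//div[@class='stock']")>

<cfif ArrayLen(stockNodes) GT 0>
    <cfset stock_status = Trim(stockNodes[1].XmlText)>
<cfelse>
    <!--- No matching node: fall back to an empty value instead of erroring --->
    <cfset stock_status = "">
</cfif>

<cfoutput>#stock_status#</cfoutput>
```

The advantage over Find and Mid is that the XPath expression targets the element by its class attribute, so it does not care about whitespace or line breaks between tags.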
I appreciate your input.