Hi M,
Thank you for the response. Here is what I see when the response comes through the rule:
>>> response
TextResponse(url='http://www.somesite.com/asdf/asdf/asfd_3324_adsfa-1_0',
status=200, body='', headers={'Content-Length':
['0'], 'Content-Language': ['en-US'], 'Server': ['IBM_HTTP_Server'],
'Connection': ['close'], 'Date': ['Sun, 13 Mar 2011 16:47:56 GMT'],
'P3P': ['CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV
INT DEM PRE"'], 'Content-Type': ['text/plain']},
request=Request(url='http://www.some.com/site/here/fsproductdetail_10652_7dadf2_565_-1_0',
method='GET', body='',
headers={'Accept-Language': ['en'], 'Accept-Encoding':
['gzip,deflate'], 'Accept': ['text/html,application/xhtml
+xml,application/xml;q=0.9,*/*;q=0.8'], 'User-Agent': ['Mozilla/5.0
(X11; U; Linux i686; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.10
(maverick) Firefox/3.6.15'], 'Cookie':
['WC_GENERIC_ACTIVITYDATA=[299194823%3atrue%3afalse
%3a0%3aeNxMdWLIdOXPx15ZidBzh%2bnIBXI%3d]
[com.ibm.commerce.context.base.BaseContext|
10652%26%2d1002%26%2d1002%26%2d1]
[com.ibm.commerce.catalog.businesscontext.CatalogContext|29104%26null]
[com.ibm.commerce.context.globalization.GlobalizationContext|%2d1%26USD
%26%2d1%26USD][com.ibm.commerce.context.entitlement.EntitlementContext|
4000000000000000505%264000000000000000505%26null%26%2d2000]
[com.ibm.commerce.context.experiment.ExperimentContext|null]
[CTXSETNAME|Store][com.ibm.commerce.context.audit.AuditContext|null];
WC_ACTIVEPOINTER=%2d1%2c10652; WC_USERACTIVITY_-1002=
%2d1002%2c10652%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull%2cnull
%2c9hMW09Lr9qRot%2f%2bbZxrrGhP71ppokVUauZLuyY
%2bwhqIe4VYmSxCSCYCavpzQI3WnZ%2bWPO706u0WO%0a
%2fd0ZSu29X9jtsTUki1HWw8FB5W5YbVJmcyhif4iaQnWooqR4FLTeVYrJHvaHtMw%3d;
WC_SESSION_ESTABLISHED=true;
WCS_JSESSIONID=0000AulVYLoP5sLzhTto6hVkYRF:15lh8tfg4'], 'Referer':
['http://www.some.com/site/here/producthierarchy1_10dd652_-asdfa2']},
cookies={}, meta={'download_timeout': 180, 'depth': 1, 'link_text':
u'The Actual Valid Tag*Power Supplies asdfsdfadsf adsfasdf
asdfsadf'}), flags=[])
>>> hxs
>>> str(hxs)
'None'
Compare that with what I get when I crawl the page directly:
>>> hxs
<HtmlXPathSelector xpath=None data=u'<html><head><meta name="Keywords"
conten'>
From that selector I can run all the usual XPath queries you'd expect.
This seems very strange to me. Am I not passing the proper request from
the spider, or something along those lines?
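
One thing I notice in the dump above is body='' with a Content-Length of 0, so the response may simply be empty when it comes through the rule, which might be why hxs ends up as None. Here is a minimal sketch of the check I have in mind, in plain Python. FakeResponse and describe_response are hypothetical names of my own that only mimic the status, body and headers attributes shown in the dump; they are not part of Scrapy.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only attributes visible in the dump above.
FakeResponse = namedtuple("FakeResponse", ["status", "body", "headers"])


def describe_response(response):
    """Return a short note saying whether the response has a body to parse."""
    # Header values are lists in the dump, e.g. {'Content-Length': ['0']}.
    declared = response.headers.get("Content-Length", ["?"])[0]
    if not response.body:
        return "empty body (Content-Length: %s)" % declared
    return "body of %d bytes (Content-Length: %s)" % (len(response.body), declared)


# Same status, body and Content-Length as the response from the rule.
empty = FakeResponse(status=200, body="", headers={"Content-Length": ["0"]})
print(describe_response(empty))
```

If the rule-driven response really is empty, the next thing to compare would be the two requests themselves (URL, cookies, headers), since the server is answering them differently.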
Best,
Bobby