Today I am using SgmlLinkExtractor with process_value to transform relative paths into absolute paths.
rules = (
    Rule(
        SgmlLinkExtractor(tags="a", attrs="href", allow=r'#day=2013-12-24&Id=33', process_value=my_process_value),
        callback='my_parser',
        follow=False,
    ),
)

def my_process_value(value):
    print '---->' + value
    return
When I run the spider, I can see every link in the response. This is the output:
---->#Day=2013-12-24&Id=33
---->#Day=2013-12-24&Id=1269753
---->#Day=2013-12-24&Id=1269753
---->#Day=2013-12-24&Id=1269772
---->#Day=2013-12-24&Id=1269772
I want only the first relative link, but it looks like the allow parameter doesn't take effect. The output should be this:
---->#Day=2013-12-24&Id=33
Do you know the reason?
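To make the expected result concrete, here is a minimal standalone sketch of the filtering I am after. It does not call Scrapy at all; it only mimics handing each href value to a callback. The base URL `http://example.com/page` is a placeholder (the real page address is not shown above), and the case-insensitive match is an assumption, made because the allow pattern is written with `#day` while the printed values start with `#Day`. As far as I understand the Scrapy documentation, returning None from process_value tells the extractor to ignore that link, which is what this sketch relies on.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Placeholder base URL (assumption): the real page address is not given.
BASE_URL = "http://example.com/page"

# Assumption: match case-insensitively, because the allow pattern uses
# "#day" while the extracted values use "#Day". The "$" anchors the Id.
WANTED = re.compile(r"#day=2013-12-24&Id=33$", re.IGNORECASE)


def my_process_value(value):
    """Return an absolute URL for the wanted link, or None for any other."""
    if WANTED.search(value):
        return urljoin(BASE_URL, value)
    return None


# The href values printed by the spider run above.
values = [
    "#Day=2013-12-24&Id=33",
    "#Day=2013-12-24&Id=1269753",
    "#Day=2013-12-24&Id=1269753",
    "#Day=2013-12-24&Id=1269772",
    "#Day=2013-12-24&Id=1269772",
]

# Keep only the values the callback did not reject.
kept = [url for url in map(my_process_value, values) if url is not None]
print(kept)
```

With this callback only the first value survives, which is the behaviour I expected from allow on its own.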