Gaute
Ad removal is perhaps not exactly what I had in mind, come to think of it.
Rather, I am talking about selecting only particular parts of the page for
comparison. Extract the good bits, rather than remove the bad bits, you might
say.
But combining usability with power is the problem (as always, I guess :).
Personally I would prefer a CSS selector or XPath, since Firebug provides those
very neatly, but most users would maybe not know what to do with those.
Having some feedback that you selected the right parts is an issue too.
Then again, perhaps most users of Specto are more technically inclined
than the average...?
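To illustrate the "extract the good bits" idea, here is a rough Python sketch that keeps only the text inside the element with a given id. It is a simplified stand-in for a CSS `#id` selector using only the standard library; real CSS selector or XPath support would need something like lxml or BeautifulSoup. `IdExtractor` and `extract_by_id` are made-up names, not anything in Specto.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IdExtractor(HTMLParser):
    """Collect the text found inside the first element whose id matches."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0      # > 0 while we are inside the wanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_by_id(html, wanted_id):
    """Return the whitespace-normalised text of the element with wanted_id."""
    parser = IdExtractor(wanted_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


page = ('<div id="promo">Buy now!</div>'
        '<div id="news"><p>Release <b>0.3rc1</b> is out</p><br></div>'
        '<div>footer</div>')
print(extract_by_id(page, "news"))  # Release 0.3rc1 is out
```

Comparing only the extracted text between two fetches would ignore changes in the promo and footer parts. It does assume the page has a stable id to hook onto, which is not always the case.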
> The best logical "workaround" I could find at the time (years ago) was
> to implement an "error margin" (which, ideally, should be possible to
> "auto-calibrate", see issue #36 in our issue tracker), supposing that
> "real content updates" would have a significantly higher percentage of
> change than random ads or timestamps...
That's precisely my problem. The real changes are often smaller than the site's
"featured shit" section.
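To make the objection concrete, here is a small sketch of an error-margin check built on Python's `difflib`. This is only an illustration of the general idea, not how Specto actually computes its margin; the function names and the 5% default are assumptions.

```python
import difflib


def changed_fraction(old, new):
    """Return the fraction of the content that differs, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return 1.0 - matcher.ratio()


def is_real_update(old, new, margin=0.05):
    """Report an update only if more than `margin` of the content changed."""
    return changed_fraction(old, new) > margin


# A small but important change on a large page stays below the margin.
old_page = "x" * 100 + "price: 10"
new_page = "x" * 100 + "price: 11"
print(is_real_update(old_page, new_page))  # False
```

With a fixed margin, a one-character change in the part I care about is ignored, while a reshuffled promo block can easily exceed it.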
The simplest solution I can think of is the two fields:
skip everything before xx
skip everything after xx
perhaps with an option for xx to be a regexp.
That would work for me. Only question is whether it would work for enough
pages to be worth including for all users.
Perhaps I should just hack it. That would save me from installing bazaar :)
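For reference, a minimal sketch of the "skip before" / "skip after" idea in Python. This is not Specto code: `extract_between` and its parameters are made-up names, and falling back to the whole page when a marker is not found is just one possible choice.

```python
import re


def extract_between(text, start=None, end=None, use_regex=False):
    """Return the part of `text` after `start` and before `end`.

    Markers are plain strings unless use_regex is True. A marker that is
    None or not found leaves that side of the text untrimmed.
    """
    def compile_marker(marker):
        return re.compile(marker if use_regex else re.escape(marker), re.DOTALL)

    begin = 0
    if start:
        match = compile_marker(start).search(text)
        if match:
            begin = match.end()      # skip everything before (and including) start

    stop = len(text)
    if end:
        match = compile_marker(end).search(text, begin)
        if match:
            stop = match.start()     # skip everything from end onwards

    return text[begin:stop]


page = ("<div>Featured!</div><!-- news -->Real update"
        "<!-- /news --><div>more promo</div>")
print(extract_between(page, "<!-- news -->", "<!-- /news -->"))  # Real update
```

Plain strings go through `re.escape`, so users who never tick the regexp option do not have to care about special characters.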
Gaute
> and I think the typical specto
> userbase is not necessarily a fan of regular expressions (even I can't
> understand them!)
When talking about the "skip before" and "skip after" idea, I only mentioned
regexps because they are a sort of "unobtrusive power tool": you can _almost_
just pretend it's a simple string match. POSIX regexps even more so.
> Maybe an easier/more transparent solution would be to reuse filterset.g
> lists of adblock plugins to take out parts of the page using
> beautifulsoup before comparing, or something like that...
Uggh. me not like.
Depending on some arbitrary list of filters that may or may not get updated..
Avoiding that is the whole point of the "pick out the good parts" strategy.
And anyway, in my case, the often-changing crap is not ads, but the site's
self-promotion.
Perhaps Wout's idea to copy the "watch_web_static.py" is better.
Then one could have an "advanced web watch" for the brave :)
Why is it called "static" btw? Is there a "dynamic" in the works?
(I am using Intrepid's 0.2.2.)
Gaute
> You should really use version 0.3rc1...there is a deb and a tar.gz
> available on our homepage.
> The new version is almost a complete rewrite of the core and the plugins so
> be sure to make your changes to the new version!
>
Ah, good to know.
Thanks.
Gaute
I think it is called static because it is not able to log in to a webpage, but I am not sure.
You should really use version 0.3rc1...there is a deb and a tar.gz available on our homepage.