On Sat, 31 Mar 2012 09:28:23 -0700, markspace <-@.> wrote, quoted or
indirectly quoted someone who said :
>
>Or something else?
>
>In other words, in some detail, what's the actual anti-pattern here?
This will take a while.
The basic project is probing bookstores, online electronics stores and
DVD stores to find out which of the books, electronics, and DVDs I
mention on my website they have in stock. I generate links to where
you can find that item in their store, greyed out if I think they do
not have it. See http://mindprod.com/book/books.html to see what the
links look like.
To speed this up I decided to use 30 threads. It turns out that I
then had to slow it down again, because most stores stop listening to
you if you probe them too rapidly. Some stores, like O'Reilly and
Barnes and Noble, don't mind you hammering them.
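
One way to slow the threads back down is a per-store throttle that enforces a minimum gap between probes, however many threads share it. This is only a sketch: the class name Throttle and the gap values are my own assumptions, not the code the site actually uses.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical per-store throttle: at most one probe per minGapMillis.
public class Throttle {
    private final long minGapNanos;
    private long nextSlot = System.nanoTime();

    Throttle(long minGapMillis) {
        this.minGapNanos = TimeUnit.MILLISECONDS.toNanos(minGapMillis);
    }

    // Blocks the calling thread until its reserved time slot arrives.
    void acquire() {
        long slot;
        synchronized (this) {
            long now = System.nanoTime();
            // Reserve the later of "now" and the next free slot.
            slot = (nextSlot - now > 0) ? nextSlot : now;
            nextSlot = slot + minGapNanos;
        }
        // Sleep outside the lock so other threads can reserve their slots.
        long wait = slot - System.nanoTime();
        if (wait > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
    }
}
```

Each store would get its own Throttle instance, with a long gap for the touchy stores and a short one for those that tolerate hammering; every worker thread calls acquire() before it probes.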
The problem was I was doing this via screen scraping the same
interface that humans use. That interface is in constant flux. Trying
to decide whether a book is in or out of stock turns out to require
looking for dozens of strings, each of which has positive or negative
weight. I had to keep tweaking these magic strings whenever my logic
could not decide based on its current set of strings. Sometimes I
would look with my own eyes at the page for 20 minutes, sifting
through the hidden text, the JavaScript, ... and still could not
decide. I have been trying to talk stores into putting in/out of stock
icons on their pages to make it clear to humans.
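
The weighted-string idea can be sketched roughly as below. The phrases and weights are invented for illustration; the real list has dozens of entries and differs per store.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class StockGuess {
    enum Verdict { IN_STOCK, OUT_OF_STOCK, UNDECIDED }

    // Hypothetical magic strings: positive weight suggests in stock,
    // negative weight suggests out of stock.
    static Map<String, Integer> sampleWeights() {
        Map<String, Integer> w = new LinkedHashMap<>();
        w.put("add to cart", 2);
        w.put("in stock", 3);
        w.put("out of stock", -4);
        w.put("currently unavailable", -4);
        return w;
    }

    // Sum the weights of every phrase found in the page text.
    static Verdict judge(String pageText, Map<String, Integer> weights) {
        String haystack = pageText.toLowerCase(Locale.ROOT);
        int score = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (haystack.contains(e.getKey())) {
                score += e.getValue();
            }
        }
        if (score > 0) return Verdict.IN_STOCK;
        if (score < 0) return Verdict.OUT_OF_STOCK;
        return Verdict.UNDECIDED; // time to go hunting for a new magic string
    }
}
```

An UNDECIDED result is the case that forces another round of tweaking.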
It is all quite crazy. One store you must probe with both a 10 and 13
digit ISBN. They file some books one way and others the other, and
some both.
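
For a store like that, the 13-digit form can be derived from the 10-digit one with the standard conversion: drop the old check digit, prefix 978, and compute a new check digit with alternating weights 1 and 3. A small sketch follows; the class and method names are mine.

```java
public class IsbnConvert {
    // Convert a 10-digit ISBN (dashes allowed) to its 13-digit form.
    static String isbn10To13(String isbn10) {
        String digits = isbn10.replace("-", "");
        if (digits.length() != 10) {
            throw new IllegalArgumentException("expected 10 characters: " + isbn10);
        }
        // Keep the first nine digits; the old check digit is discarded.
        String body = "978" + digits.substring(0, 9);
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            char c = body.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("non-digit in ISBN: " + isbn10);
            }
            int d = c - '0';
            sum += (i % 2 == 0) ? d : 3 * d; // weights 1,3,1,3,...
        }
        int check = (10 - sum % 10) % 10;
        return body + check;
    }

    public static void main(String[] args) {
        System.out.println(isbn10To13("0-306-40615-2")); // prints 9780306406157
    }
}
```

The probe can then try both forms and accept whichever one the store recognises.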
Amazon advertised a SOAP-XML interface to get this information in a
stable, efficient way. It turned out all their docs are incorrect or
out of date. However, I eventually got it going (now documented at
http://mindprod.com/jgloss/amazon.html), only to discover that a probe
even once every 20 seconds is overwhelming to amazon.com, though the
other stores are more forgiving. (How on earth do they survive DoS
attacks?) However, the API works on the European Amazon stores too, and
in English, unlike the screen scraping, which is customised for each
language/store.
The important thing to understand is how this gradually grew in
complexity. The mistake I made was firing up the whole process of
probing a set of bookstores to refresh in/out-of-stock data with a
static init. This all worked fine until the straw that broke the
camel's back: in one of my threads I called a very vanilla method to
convert an ISBN to an ASIN via HashMap lookup, where the method just
happened to be in the same class as the static init fireup. Up until
that point, by some fluke, that had never happened. That delay in the
penalty for my error threw me off the scent.
What I do now is have a fireup method that gets called at the start,
and everything that needs to happen at the start of a run gets a
method call inserted there.
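
A stripped-down sketch of that arrangement is below. All the names (Prober, fireup, isbnToAsin) and the sample table entry are made up for illustration. The point is that the worker threads start from an ordinary method, after the class has finished initialising. In Java, a thread that calls a static method of a class whose static initialiser is still running has to wait for that initialiser to finish, so a static init that starts threads and waits for them can hang as soon as one of them calls back into the same class. Presumably that is what bit here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Prober {
    // Hypothetical lookup table with one made-up sample entry.
    private static final Map<String, String> ISBN_TO_ASIN = new ConcurrentHashMap<>();
    static final AtomicInteger lookupsDone = new AtomicInteger();

    // Deliberately NO "static { fireup(3); }" block here.

    // The "very vanilla" method that worker threads call.
    static String isbnToAsin(String isbn) {
        return ISBN_TO_ASIN.get(isbn);
    }

    // Explicit startup: called once from main, not from a static init.
    static void fireup(int threadCount) {
        ISBN_TO_ASIN.put("9780306406157", "0306406152");
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                // Safe: Prober is fully initialised before any worker runs.
                if (isbnToAsin("9780306406157") != null) {
                    lookupsDone.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        fireup(3);
        System.out.println(lookupsDone.get() + " lookups completed");
    }
}
```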
In the FORTH days I was able to build both fireup and shutdown chains
as a side effect of compilation, which kept the code more modular.
Today I can build shutdown chains dynamically at class load but not
fireup chains.
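
A shutdown chain built at class load might look something like the sketch below, using Runtime.addShutdownHook. The ShutdownChain and Cache classes are hypothetical; each class that needs cleanup registers a task in its own static init, and a single JVM hook runs them in reverse order of registration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ShutdownChain {
    private static final Deque<Runnable> TASKS = new ArrayDeque<>();
    private static boolean hookInstalled = false;

    // Called from the static init of any class that needs cleanup.
    static synchronized void register(Runnable task) {
        TASKS.push(task); // most recently registered task runs first
        if (!hookInstalled) {
            Runtime.getRuntime().addShutdownHook(new Thread(ShutdownChain::runAll));
            hookInstalled = true;
        }
    }

    // Runs and removes every registered task; a second call is a no-op.
    static synchronized void runAll() {
        while (!TASKS.isEmpty()) {
            TASKS.pop().run();
        }
    }

    public static void main(String[] args) {
        Cache.touch(); // loading Cache links it into the chain
        System.out.println("exiting");
    }
}

// Hypothetical client class: joins the chain as a side effect of loading.
class Cache {
    static {
        ShutdownChain.register(() -> System.out.println("flushing cache"));
    }

    static void touch() { }
}
```

Registering a task is harmless in a static init because it starts nothing and waits for nothing. A fireup chain cannot be built the same way, since a class that has not been loaded yet has had no chance to register itself before the run begins.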