You find the most interesting things to work on. :-) Sounds very cool.
Michael -- What a nice thing to say. I've been a little bored with my work. Your comment is a helpful reminder. -- Ward
p.s. a little more about what I am doing ...
1. We scrape websites but have very little insight into what we've scraped. I asked myself: how can I look at 30,000 pages, for some reasonable notion of "look"? I started exploring our data with regular expressions in Perl scripts. 30,000 pages was within Perl's limits: enough for statistical properties to emerge without runs being too slow.
2. When I discussed my exploration at Open Source Bridge this summer, several colleagues mentioned PEG parsers as an alternative to regular expressions. I looked into Treetop, a PEG parser in Ruby, but it was way too slow. I tried Ian Piumarta's peg/leg parser, which was blindingly fast but required that I write in C, which is hardly exploratory.
3. I needed an organizing/simplifying principle that could guide my C programming. I allowed myself to compare my quest to read a zillion pages to that of the climate scientists studying the history of weather on Earth with core samples. This step is vague in my memory, but I ended up in Wikipedia here:
http://en.wikipedia.org/wiki/Exner_equation
4. I wrote C subroutines called Aggrade and Degrade, inspired by the Exner equation. These adjusted to the various "flow rates" through the production rules of whatever grammar I happened to be running at the moment. We wrapped this all into a webapp that I could run on EC2, where my big data was stored. Thus was born "Exploratory Parsing", an agile data mining methodology.
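The original Aggrade and Degrade code has not been released, so the following is only a guess at the idea, not the actual implementation. It assumes the Exner-style bookkeeping that a bed rises where more material flows in than flows out, and applies it to toy production rules: every input that reaches a rule is counted as inflow, every match as outflow, and the difference is the "sediment" the rule left behind. The `Rule` struct, the function names, and the toy grammar are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch only: the struct, the function names and the
   bookkeeping rule are assumptions, not the unreleased original code. */
typedef struct {
    const char *name;  /* production rule name */
    long inflow;       /* inputs that reached this rule */
    long outflow;      /* inputs the rule matched and passed downstream */
} Rule;

/* Aggrade: material arrives at the rule and settles there. */
static void aggrade(Rule *r) { assert(r != NULL); r->inflow++; }

/* Degrade: a successful match carries the material away again. */
static void degrade(Rule *r) { assert(r != NULL); r->outflow++; }

/* Bed height: what arrived but never left, i.e. inputs the rule rejected. */
static long bed(const Rule *r) { return r->inflow - r->outflow; }

/* Toy production rule: number <- [0-9]+ */
static int match_number(Rule *r, const char *s) {
    aggrade(r);
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    degrade(r);
    return 1;
}

/* Toy production rule: word <- [A-Za-z]+ */
static int match_word(Rule *r, const char *s) {
    aggrade(r);
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!isalpha((unsigned char)*s)) return 0;
    degrade(r);
    return 1;
}

/* Toy production rule: token <- number / word (PEG ordered choice) */
static int match_token(Rule *token, Rule *number, Rule *word, const char *s) {
    aggrade(token);
    if (match_number(number, s) || match_word(word, s)) {
        degrade(token);
        return 1;
    }
    return 0;
}
```

Feeding the four strings "42", "sediment", "a1" and "2010" through `match_token` leaves a bed of 1 at `token` (the unmatched "a1"), 2 at `number`, and 1 at `word`. Rules with a high bed are the places where the grammar is not yet carrying the data through, which is one plausible reading of how "flow rates" could guide what to look at next.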
I mention this here on this list because of the central role that the Sediment Metaphor took in guiding me when I had no other plan.
I wasn't looking for a way to explain what I had already done; I was looking for a way forward with only a vague notion of what I wanted to do (look at pages). Lakoff and Johnson say that a metaphor will be sustained in the culture when it works together in a cognitive system that delivers value. Sediment delivered where other ways to think about my problem failed me. The metaphor has compounded upon itself to produce further unexpected bounty. To quote some numbers: I can now read all 30 gigabytes of Wikipedia in 6 seconds, for a defensible definition of read.
Ward, this sounds awesome... are you able to share code snippets that
better illustrate how this metaphor influenced the code you
ended up with?
On Dec 7, 6:14 pm, Ward Cunningham <w...@c2.com> wrote:
On Dec 6, 2010, at 9:07 PM, Michael Feathers wrote:
On Mon, Dec 6, 2010 at 11:06 PM, Ward Cunningham <w...@c2.com> wrote:
Yes, I have instrumented peg/leg this way and have found it very useful.
This code has not been released into open source.
You find the most interesting things to work on. :-) Sounds very cool.