Hey Chauncey!
Sorry, I'm just now getting back to you. I've been out this
afternoon. I'm looking at what you have committed so far and I
certainly admire your initiative! I had not anticipated someone so
quickly swooping in with another language to support, but I suppose
that's the way it is with projects like these! :-) I have to admit,
though, that I'm a bit torn, because PHP isn't really a language I
know well, and it's going to be difficult for me to support it as the
platform evolves. I'm not saying that I don't want to roll your code
in (the last thing I want is for effort to go to waste), but I do need
to really think about how I'm going to structure things so that I can
support multiple languages (Ruby here we come?) without surprising
myself with a lot of breaking changes. That said, I don't want to
leave you hanging, so here is what I have in mind:
1) I will rewrite the scheduler (run.py) to support executing
scripts from multiple languages. (Don't worry about a PHP-specific
scheduler, as we need to be able to execute all of them from a single
entry point.)
2) I will pull the script metadata out into a configuration
file, as Fifty States does with the STATUS file, so that this
information can be retrieved in a language-agnostic way.
3) If you can bring your VotersDaily_Abstract PHP class up to
feature parity with the Python EventScraper class, then I will merge
your PHP utilities into the tree under scripts/phputils.
4) Then we can merge the rest of your scrapers into their
respective directories.
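
A minimal sketch of how steps 1) and 2) could fit together, assuming each script directory carries a small JSON metadata file. Nothing here is committed code: the file name metadata.json, its "name", "language" and "script" keys, and the interpreter table are all hypothetical, and the real format may end up looking more like the Fifty States STATUS file.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical mapping from the "language" value in a metadata file to
# the interpreter command used to launch that script.
INTERPRETERS = {
    "python": ["python"],
    "php": ["php"],
    "ruby": ["ruby"],
}


def load_metadata(path):
    """Read one script's metadata file (assumed here to be JSON)."""
    with open(path) as f:
        return json.load(f)


def build_command(script_dir, metadata):
    """Return the argv list that would run the script described by metadata."""
    language = metadata["language"]
    if language not in INTERPRETERS:
        raise ValueError("unsupported language: %s" % language)
    script_path = Path(script_dir) / metadata["script"]
    return INTERPRETERS[language] + [str(script_path)]


def run_all(scripts_root):
    """Run every script under scripts_root that has a metadata.json file.

    Returns a dict mapping each script's name to its exit code.
    """
    results = {}
    for meta_path in sorted(Path(scripts_root).glob("*/metadata.json")):
        metadata = load_metadata(meta_path)
        command = build_command(meta_path.parent, metadata)
        results[metadata["name"]] = subprocess.run(command).returncode
    return results
```

The point of the sketch is that the scheduler never imports a scraper; it only reads metadata and launches a process, so a PHP or Ruby scraper is scheduled the same way as a Python one.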
I'm not sure what facilities PHP has for interacting with CouchDB,
but I really want to avoid temporarily storing the data on the
filesystem (as CSV, JSON, etc.). Because many of the potential uses for
this data are semi-real-time, it doesn't really make sense to
preprocess all the data and then do a bulk import of those flat files.
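
For what it's worth, CouchDB is driven entirely over HTTP with JSON bodies, so any language that can make an HTTP request can save a document directly. Here is a rough sketch in Python 3 (standard library only) of writing one event straight to the database instead of to a flat file. The server address, the database name "votersdaily", and the event fields are assumptions for illustration, not the project's actual schema.

```python
import json
import urllib.request

# Assumed values: a local CouchDB on its default port and a database
# named "votersdaily". Neither is specified in this thread.
COUCHDB_URL = "http://localhost:5984"
DATABASE = "votersdaily"


def build_save_request(event, base_url=COUCHDB_URL, database=DATABASE):
    """Build the HTTP request that asks CouchDB to store one document.

    POSTing a JSON body to /<database> creates a new document and lets
    CouchDB assign its id.
    """
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s" % (base_url, database),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_event(event):
    """Send the event to CouchDB and return the decoded JSON response."""
    with urllib.request.urlopen(build_save_request(event)) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scraper would call save_event() once per parsed event, e.g. save_event({"title": "Roll call vote", "source": "house"}) with hypothetical field names, so each record is available as soon as it is scraped.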
Let me know what you think of this plan. I'm confident we can find
a way to get these codebases interacting happily. :-)
Thanks for your efforts,
Chris
On Aug 15, 11:44 am, Chauncey Thorn <chaunc...@gmail.com> wrote:
> Can I get you to check that the .csv files are as you expect re: data
> required?
>
> I just pushed my initial parse of http://clerk.house.gov/evs/2009/ROLL_000.asp
>
> At the moment I'm focusing on the parsing; I will clean up the code ASAP.
>
> Thanks
>
> > On Sat, Aug 15, 2009 at 12:27 PM, bouvard <staringmon...@gmail.com> wrote:
>
> >> Hey everyone, thanks for registering with this group! :-)
>
> >> This morning I committed code for a scheduler to handle periodically
> >> running the scrapers and also an example/tutorial scraper for people
> >> to use as a model for their own. For those of you who have
> >> contributed to the Fifty States Project, this is all going to look _very_
> >> familiar. If anyone would like to take a crack at one of the
> >> calendars listed on the sources page, I have marked the one that I
> >> have completed so far:
>
> >> http://wiki.github.com/bouvard/votersdaily/sources
>
> >> If anyone decides to look at the code or contribute I would really
> >> like to hear any feedback on what has been done so far: the database
> >> schema, scheduler, duplicate checking methodology, or anything else.
> >> At this early point I can not guarantee that the base EventScraper
> >> won't change, but hopefully we can identify anything that needs to be
> >> tweaked sooner rather than later.
>
> >> I'm hoping to write at least one more scraper this weekend, but I
> >> haven't decided which to work on yet. I'll mark it "In Progress" on
> >> the Sources page as soon as I decide.
>
> >> Thanks again for everyone's interest!
> >> Chris
>
> > --
> > Chauncey Thorn
> > PHP Developer/Systems Administrator
> > email: chaunc...@gmail.com
> > url: http://www.cthorn.com/