The latest redesign of GC.com changed the format of the "print view"
(cdpf.aspx) in a way that breaks the parser. In particular, some fields
are now broken up across multiple lines (e.g. LatLong), and some field
names have changed (e.g. Additional Hints).
I tried to cope with line fragmentation using the following patch:
--- details.rb.save 2010-10-04 15:42:10.000000000 +0200
+++ details.rb 2010-10-07 09:04:07.000000000 +0200
@@ -113,6 +113,9 @@
wid = nil
cache = nil
+ # condense <p class>
+ data.gsub!(/(\<p class=[^>]*\>)\s*/m, '\1')
+ # now analyze
data.split("\n").each { |line|
# <title id="pageTitle">(GC1145) Lake Crabtree computer software store by darylb</title>
if line =~ /\<title.*\((GC\w+)\) (.*?) by (.*?)\</
This patch does let geotoad read the location, but geotoad then throws an
error somewhere else
(relevant command-line parameters: -y 1 "N53 39.177 E011 21.837"):
D: ====+ Fetch URL: http://www.geocaching.com/seek/cdpf.aspx?lc=10&guid=7b986011-d12a-4076-823c-1cb9fb43646f
D: ====+ Fetch File: ~/.geotoad/cache/www.geocaching.com/seek/cdpf.aspx_lc_10_guid_7b986011-d12a-4076-823c-1cb9fb43646f
D: local cache is only 43048 old (518400), using local file.
D: 36222 bytes retrieved from local cache
D: wid = GC2FFCZ name=Lütt Schwerin creator=TeamStralenwurm
D: stype=multicache full_type=
D: parsing date: [10/06/2010]
D: Looks like a date: year=2010 month=10, date=06
D: Timestamp parsed as Wed Oct 06 00:00:00 +0200 2010
D: ctime=Wed Oct 06 00:00:00 +0200 2010 cdays=1
D: got written lat/lon
D: found size: Micro
/home/steffen/src/geotoad/lib/details.rb:269:in `parseCache': undefined method `[]' for nil:NilClass (NoMethodError)
from ~/src/geotoad/lib/details.rb:65:in `fetch'
from ~/src/geotoad/geotoad.rb:405:in `fetchGeocaches'
from ~/src/geotoad/geotoad.rb:401:in `each_key'
from ~/src/geotoad/geotoad.rb:401:in `fetchGeocaches'
from ~/src/geotoad/geotoad.rb:612
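The backtrace does not show what line 269 of details.rb actually does, so this is only a guess at the mechanism: in Ruby, "undefined method `[]' for nil:NilClass" is what you get when a regex match fails and its nil result is then indexed. A minimal sketch with a hypothetical extract_field helper (not part of geotoad) that guards against that:

```ruby
# Hypothetical helper, not geotoad code: line 269 of details.rb is not shown,
# so this only illustrates the usual cause of this NoMethodError.
def extract_field(html, pattern)
  md = html.match(pattern)   # MatchData, or nil if the pattern did not match
  md ? md[1] : nil           # return nil instead of calling [] on nil
end

page = "<p class=\"x\">Difficulty:\n3.5</p>"
extract_field(page, /Difficulty:\s*([\d.]+)/)  # => "3.5"
extract_field(page, /Terrain:\s*([\d.]+)/)     # => nil
# page.match(/Terrain:\s*([\d.]+)/)[1] would raise NoMethodError instead
```

With a guard like this, a field whose layout changed simply stays empty instead of aborting the whole run.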
Another page that is already in the local cache yields:
D: ====+ Fetch File: ~/.geotoad/cache/www.geocaching.com/seek/cdpf.aspx_lc_10_guid_72651f68-6d46-4e1b-8982-c615615dc78a
D: local cache is only 422873 old (518400), using local file.
D: 51246 bytes retrieved from local cache
D: wid = GC1WAN3 name=Thorge - Mecklenburgische Seenplatte/lakedistrict creator=jeromedax
D: stype=earthcache full_type=
D: parsing date: [07/24/2009]
D: Looks like a date: year=2009 month=07, date=24
D: Timestamp parsed as Fri Jul 24 00:00:00 +0200 2009
D: ctime=Fri Jul 24 00:00:00 +0200 2009 cdays=440
D: got written lat/lon
D: found size: Not chosen
D: difficulty: 3.5
D: terrain: 1.5
D: found short desc: [Ni ju san
[...]
As one can see, terrain, difficulty and descriptions are missing in
the first run (note: GC2FFCZ has no short description anyway).
The terrain and difficulty values are apparently missing because they
are split over _three_ lines; my patch only removes the single line
feed after the <p> tag.
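To illustrate the limitation, here is the substitution from the patch (replacement written as '\1') applied to a made-up fragment. The markup is an assumption modelled on the three-line split, not copied from cdpf.aspx:

```ruby
# Made-up fragment: label and value on separate lines inside one <p>.
fragment = "<p class=\"hypothetical\">\n  Difficulty:\n  3.5\n</p>"

# Same substitution as the patch: only the whitespace right after <p ...> goes.
patched = fragment.gsub(/(<p class=[^>]*>)\s*/m, '\1')

patched.split("\n")
# => ["<p class=\"hypothetical\">Difficulty:", "  3.5", "</p>"]
```

The label now shares a line with the tag, but the value is still on the next line, so a per-line regex that expects both together cannot match.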
It might help to condense _everything_ between <p class...> and </p>
onto a single line before running the data.split.each loop. But I
don't know how to do that :(
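One possible way to do that, sketched with a hypothetical condense_paragraphs helper: match each <p class=...>...</p> block non-greedily across newlines (Ruby's /m lets . match a newline) and collapse every line break inside it, plus the surrounding whitespace, into a single space. It assumes every <p class...> has its own closing </p> and that paragraphs are not nested; it has not been tried on real cdpf.aspx pages.

```ruby
# Hypothetical helper: put everything between <p class=...> and </p> on one line.
# Assumes each <p class...> is closed by its own </p> (no nesting).
def condense_paragraphs(data)
  data.gsub(/<p class=[^>]*>.*?<\/p>/m) do |para|
    para.gsub(/\s*\n\s*/, ' ')   # each line break (with indentation) -> one space
  end
end

sample = "<p class=\"x\">\n  Difficulty:\n  3.5\n</p>\nnext"
condense_paragraphs(sample)  # => "<p class=\"x\"> Difficulty: 3.5 </p>\nnext"
```

In details.rb this would replace the patch's gsub! with something like `data = condense_paragraphs(data)` right before the `data.split("\n").each` loop; text outside the paragraphs keeps its line structure.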
And issue 152 points at the logs, whose format is also still different.
Of course, another approach would be to abandon line-based parsing
completely, as is already done for the hints.
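A rough sketch of what whole-page parsing could look like for the ratings, using a hypothetical parse_ratings function. The "Difficulty:"/"Terrain:" labels and the optional tags between label and number are assumptions about the markup; since \s and [^>] both match newlines, the patterns work no matter how the fields are split across lines:

```ruby
# Hypothetical sketch: scan the whole page instead of line by line.
# Assumes a numeric value follows the label, possibly after some tags.
def parse_ratings(data)
  patterns = {
    'difficulty' => /Difficulty:\s*(?:<[^>]*>\s*)*([\d.]+)/,
    'terrain'    => /Terrain:\s*(?:<[^>]*>\s*)*([\d.]+)/
  }
  ratings = {}
  patterns.each do |key, re|
    md = data.match(re)
    ratings[key] = md[1].to_f if md   # skip fields that are not found
  end
  ratings
end

page = "<p class=\"x\">\n  Difficulty:\n  <strong>3.5</strong>\n</p>\n" \
       "<p class=\"x\">Terrain:\n1.5</p>"
parse_ratings(page)  # difficulty 3.5, terrain 1.5
```

The trade-off is that each field needs its own pattern, but a change in line breaks alone no longer breaks anything.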
Any ideas?