We will need a "dictionary" to map WIDs to GUIDs

Steve8x8

Sep 9, 2013, 4:58:57 AM
to geo...@googlegroups.com
Apparently, GC has reduced access to GUIDs (which are their internal "numbers" for almost everything, caches, logs, users, etc.).
GUIDs are no longer present in most types of search result pages, but they are - by GeoToad's design - essential to get direct access to "print with logs" pages (which are also, by design, the basic source of information).
While there are still means to get a mapping of WIDs to GUIDs, most of them are file operations accessing the "cache_details.aspx" which in a typical case is about 120~150 kB in size - just to get a single 36-character string (the GUID of the cache). This stresses the servers in an unnecessary way, and since we're operating at (or beyond) some limit already, has to be avoided if at all possible.
While I'm still working on a "cheap" way to get this mapping done by the server itself, there is another way to obtain this information for free, as it's already there.
What am I talking about? It's your local file cache(s). All the cdpf.aspx files that are already there (and overwritten on a regular basis) contain the GUID (in their names) and the WID (right at the beginning).
To be prepared for the transition to be made, I'd like to ask you to run something like the extractor script here:

--- snip ---
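#!/bin/bash
# Build a WID -> GUID dictionary from the cached "print with logs" pages.
cd ~/.geotoad || exit 1
(
echo '---'
find ./cache/ -name 'cdpf.aspx_guid_*' \
| while IFS= read -r file
do
    # the GUID is part of the file name
    guid=$(echo "$file" | sed 's~^.*guid_\([0-9a-f-]*\).*~\1~')
    # the WID is in parentheses in the first pageTitle line
    wid=$(grep pageTitle "$file" | head -n1 | sed -n 's~^.*(\(GC[0-9A-Z]*\)).*~\1~p')
    # skip files without a recognizable WID
    [ -n "$wid" ] && echo "$wid: $guid"
done \
| sort -u
) \
> mapping.yaml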
--- snap ---

You may check your result by running 'irb', the interactive Ruby interpreter, from within ~/.geotoad, with the following lines:
--- snip ---
  require 'yaml'
  mapping = YAML.load_file('mapping.yaml')
  mapping.length
--- snap ---
This should give you the number of different caches you have "mapped". (For me, as I have done lots of testing all over the world, it's a number close to 10,000. YMMV.)
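
If you want a slightly stricter check than the entry count, something like the following sketch could list entries that don't look right. It assumes that every WID starts with "GC" followed by digits and capital letters (as in the extractor above), and that every GUID is a 36-character string in the usual 8-4-4-4-12 layout of lowercase hex digits. The method name invalid_entries is made up for this example and is not part of GeoToad.

```ruby
require 'yaml'

# Assumed shapes: WIDs like "GC1ABC", GUIDs as 36 characters in the
# 8-4-4-4-12 layout of lowercase hex digits.
WID_RE  = /\AGC[0-9A-Z]+\z/
GUID_RE = /\A[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\z/

# Hypothetical helper (not part of GeoToad): return the subset of the
# mapping whose key or value does not match the assumed shapes.
def invalid_entries(mapping)
  mapping.reject do |wid, guid|
    WID_RE.match?(wid.to_s) && GUID_RE.match?(guid.to_s)
  end
end

# Possible use from irb, inside ~/.geotoad:
#   mapping = YAML.load_file('mapping.yaml')
#   invalid_entries(mapping).each { |wid, guid| puts "#{wid.inspect}: #{guid.inspect}" }
```

An empty result means all entries have the expected shape; it does not prove that a given GUID really belongs to its WID.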

What's this all good for?
An upcoming commit (or rather, a series of commits over the next few days) will back out the changes made to work around the lack of GUIDs in search results, and instead try to get the mappings from the dictionary just created. Later additions will query GC for individual mappings, and add them to the dictionary, if the WID isn't known yet.
This will - for a while at least - only happen in the SVN branch (so it's not going to be as useless as expected now that trunk and branch have converged in the preparation of 3.18.0). Please watch that space, and give me feedback.
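
To illustrate the idea of "look in the dictionary first, ask the server only on a miss", here is a minimal sketch. It is not the code that will be committed: the class name GuidDictionary and its methods are invented for this example, and the block passed to lookup is only a stand-in for whatever online query ends up being used.

```ruby
require 'yaml'

# Hypothetical class, not GeoToad's actual code: a WID -> GUID dictionary
# backed by a mapping.yaml file.
class GuidDictionary
  def initialize(file)
    @file = file
    data = File.exist?(file) ? YAML.load_file(file) : nil
    # A missing file, or one holding only "---", yields no Hash.
    @map = data.is_a?(Hash) ? data : {}
  end

  # Return the GUID for wid. On a miss, call the block (a stand-in for an
  # online query), store a non-nil result and rewrite the file.
  def lookup(wid)
    return @map[wid] if @map.key?(wid)
    guid = yield(wid) if block_given?
    return nil unless guid
    @map[wid] = guid
    save
    guid
  end

  def size
    @map.size
  end

  private

  # Write "---" followed by one "wid: guid" line per entry.
  def save
    File.open(@file, 'w') do |f|
      f.puts '---'
      @map.sort.each { |wid, guid| f.puts "#{wid}: #{guid}" }
    end
  end
end
```

With this shape, a known WID never touches the network, and each unknown WID costs at most one query because the result is written back to the file.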

(If someone is willing to write a Windows tool to perform the extraction - you're more than welcome.)
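
Since GeoToad needs Ruby anyway, one possible starting point for such a tool is a Ruby version of the extractor, sketched below. It assumes that the cache on Windows uses the same file names (cdpf.aspx_guid_...) and the same page contents as on Linux, which I have not verified. The method names extract_mapping and write_mapping are made up for this example, and the location of the GeoToad directory is left for the caller to supply.

```ruby
# Hypothetical helper (not part of GeoToad): scan cache_dir recursively for
# cached cdpf.aspx files and return a Hash of WID => GUID.
def extract_mapping(cache_dir)
  mapping = {}
  pattern = File.join(cache_dir, '**', 'cdpf.aspx_guid_*')
  Dir.glob(pattern).each do |path|
    # The GUID is part of the file name.
    guid = File.basename(path)[/guid_([0-9a-f-]+)/, 1]
    next unless guid
    File.foreach(path) do |line|
      line = line.scrub  # tolerate invalid byte sequences
      next unless line.include?('pageTitle')
      # The WID is in parentheses on the first pageTitle line.
      wid = line[/\((GC[0-9A-Z]+)\)/, 1]
      mapping[wid] = guid if wid
      break
    end
  end
  mapping
end

# Write "---" followed by one sorted "wid: guid" line per entry.
def write_mapping(mapping, out_file)
  File.open(out_file, 'w') do |f|
    f.puts '---'
    mapping.sort.each { |wid, guid| f.puts "#{wid}: #{guid}" }
  end
end

# Possible use, with geotoad_dir set to wherever your GeoToad files live:
#   mapping = extract_mapping(File.join(geotoad_dir, 'cache'))
#   write_mapping(mapping, File.join(geotoad_dir, 'mapping.yaml'))
```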

As said before, hope isn't completely lost yet.
Take care,
 S

Steve8x8

Sep 9, 2013, 11:25:41 AM
to geo...@googlegroups.com
Just a short remark: with the current code and a non-wid/guid query, pre-filtering fails, and the -X option doesn't get you any further either. Obviously, some values aren't properly parsed from the search page :(
Adding some conversions (.to_f, .to_i, ...) to lib/filter.rb (to be committed) at least removes the crashes, but changes to lib/search.rb are still necessary - the workaround just ignored everything that came from the search itself and used the values from the cache_details page instead...
You've been warned!

Ben Mathews

Sep 10, 2013, 6:44:49 PM
to geo...@googlegroups.com
Where is the mapping.yaml file to be placed?

Steve Sixty-Four

Sep 11, 2013, 4:37:28 AM
to geo...@googlegroups.com

The file will go next to config.yaml, cookies.yaml, and history.yaml, and consists of:
- a single line with three dashes
- a line "wid: guid" for each known wid-guid mapping
