configuring flip_in_mem_edsm_server

Jon Leighton

Apr 6, 2010, 9:22:47 AM
to spdy-dev
I've built the flip_in_mem_edsm_server on Linux, and I currently have
FLAGS_use_ssl=false and FLAGS_need_to_encode_url=false. I'm trying to
use the flip server to serve a simple index.html file and I'm having
trouble. When I start up the server, it seems to ignore index.html
and complains about any other files I have in out/Debug/GET_/, and
when I connect to the server with Chrome on port 16002 it says the
link appears to be broken. How do I serve a simple index.html file?
Thanks.

Mike Belshe

Apr 6, 2010, 11:35:00 AM
to spdy...@googlegroups.com
Hi, Jon,

This can be a little tricky.

First, the file format is:
    HttpHeaders\r\n
    Content

It is designed to use data directly as you might have recorded it off of a web server.
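
As a rough illustration of that layout, here is a minimal Python sketch that builds one such file body: a header block, a blank line, then the content. It assumes the header block looks like a raw HTTP response (status line first) and is terminated by an empty line; `build_recorded_response` is a made-up helper name, not something from the flip server sources.

```python
def build_recorded_response(status_line, headers, body):
    """Return the bytes of one recorded response: headers, blank line, content.

    Assumption: the header block is a raw HTTP response head, i.e. a status
    line followed by "Name: value" lines, each ending in CRLF.
    """
    head = [status_line] + ["%s: %s" % (name, value) for name, value in headers]
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body


body = b"<html><body>hello</body></html>"
record = build_recorded_response(
    "HTTP/1.1 200 OK",
    [("Content-Type", "text/html"), ("Content-Length", str(len(body)))],
    body,
)
# 'record' is what would be written, in binary mode, to the file under GET_/.
```

Files saved with a tool that keeps the response headers should already have this shape, so a helper like this is only useful for hand-made test pages.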

The file system hierarchy is:

    GET_/<domain-of-url>/<encoded-url>

So, to find "http://www.google.com/index.html", you'll need the file at


The filename encoding is complicated; basically it is annoying to get these files mirrored on Windows clients, and the encoding is overly conservative to support Windows.  (If anyone at Microsoft is listening, it would be nice to have >260 character filenames that can be manipulated through the shell, thanks!)

But all this sounds like a pain, right?  So what are your options?

The encoding of filenames is located in url_to_filename_encoder.h.  You can modify that or use it to figure out how to encode filenames.
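
For a feel of what the encoder does, here is a small Python sketch that reproduces the two encoded names that show up in this thread ("index.html" and "sat-imgs-4*4/11.jpeg"). The rule it uses (keep letters, digits, "-", "_" and "/", and replace everything else with "x" plus two hex digits) is only inferred from those examples. The real rules in url_to_filename_encoder.h may differ, for instance for very long names or for a literal "x", so treat `encode_path` as a hypothetical approximation.

```python
import string

# Characters assumed to pass through unchanged (inferred, not from the header).
SAFE = set(string.ascii_letters + string.digits + "-_/")


def encode_path(path):
    """Escape every character outside SAFE as 'x' + two uppercase hex digits."""
    out = []
    for ch in path:
        if ch in SAFE:
            out.append(ch)
        else:
            out.append("x%02X" % ord(ch))
    return "".join(out)


print(encode_path("index.html"))            # indexx2Ehtml
print(encode_path("sat-imgs-4*4/11.jpeg"))  # sat-imgs-4x2A4/11x2Ejpeg
```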

Or, you can record content through Chromium (it records all JS/CSS/HTML/etc, and creates a pretty good simulation of any given page), and then dump the Chromium cache directly into files of this format.

To do this:
    a) Get the Chromium source for Windows (sorry about the Windows requirement, it is only needed to record & dump)
    b) Modify the file chrome\common\chrome_constants.cc - change the line which reads "kRecordModeEnabled = false" to "kRecordModeEnabled = true".
    c) Build Chrome.
    d) Run Chrome with chrome.exe --record-mode --no-js-randomness --user-data-dir=<path to some empty directory where we'll store the browser cache>
    e) Maximize the browser window as large as it can go
    f) Browse through a set of pages you'd like to record & playback.  Do this slowly so that all content loads fully.  Avoid visiting the same page twice if you can.
    g) Close the browser
    h) In your Chromium build, you'll have a tool called dump_cache.exe.  This tool can dump the cache into the files suitable for the in-memory-server
    i) Run dump_cache.exe --input=<your user-data-dir>\Default\Cache --output=<path to where you want these files to go> --dump-to-files
    j) Copy these files to the server

I know, it is complicated. 

Mike






binoykunhi

Apr 6, 2010, 5:33:42 PM
to spdy-dev
I am trying to do something similar too. You can't simply request the
index.html file. You need to create a "GET_" folder where the binary is
present and store the files in HTTP format (saving headers) using wget
or Chrome in record mode.

Jon Leighton

Apr 26, 2010, 6:21:07 PM
to spdy-dev
Thanks very much for the detailed instructions Mike. I've built the
Windows version of Chrome and run it in record mode as described, but
I'm running into a problem with dump_cache.exe. Chrome is writing
data into my user-data-dir, but when I run dump_cache.exe as described
below it complains about name conflicts and skips all the entries for
the sites I've visited, and my output directory remains empty. Do you
have any idea what I'm doing wrong?

>     g) In your chromium build, you'll have a tool called dump_cache.exe.
>  This tool can dump the cache into the files suitable for the
> in-memory-server
>     h) Run dump_cache.exe --input=<your user-data-dir>\Default\Cache
> --output=<path to where you want these files to go> --dump-to-files
>     g) Copy these files to the server



Mike Belshe

Apr 26, 2010, 6:41:16 PM
to spdy...@googlegroups.com
I just checked in a fix for this which I had pending for about a week.

If you can grab the latest sources in net/tools, I hope the problem disappears.

Sorry about that.

Mike

Jon Leighton

Apr 27, 2010, 7:45:22 PM
to spdy-dev
It looked like your fix was just a couple of new lines of code in
cache_dumper.cc so I manually made the changes and rebuilt chrome, but
I still got the name_conflict error. I then tried a gclient sync and
rebuilt chrome again, but I'm still getting the same error. Is there
something else I need to do? Thanks.

- Jon

Mike Belshe

Apr 27, 2010, 8:00:38 PM
to spdy...@googlegroups.com
Can you send the exact cmdline you are using?

thanks,
Mike

Jon Leighton

Apr 27, 2010, 9:34:07 PM
to spdy-dev
I'm running chrome from the src\chrome\Debug directory with

> chrome.exe --record-mode --no-js-randomness --user-data-dir=cached_files

where cached_files is a directory I created in src\chrome\Debug. I'm
running dump_cache from the src\chrome\Debug directory with

> dump_cache.exe --input=cached_files\Default\Cache --output=encoded_cached_files --dump-to-files

where encoded_cached_files is also a directory I created in
src\chrome\Debug. The errors I get look like this:

Skipping entry "0GEThttp://www.cis.udel.edu/~leighton/" (name conflict!)

- Jon

Mike Belshe

Apr 27, 2010, 10:01:59 PM
to spdy...@googlegroups.com
Sorry - use an absolute pathname to the output directory for --output=<your absolute path here>  and then I think it will work.

Cause:
Microsoft long-file-names are the worst piece of code ever created.  It's so broken that Microsoft's own Explorer doesn't support them (create a long filename and try to delete it using the shell - you can't!)  When I wrote this code, I jumped through so many hoops just to create the long-filenames (prepending the "\\?\", switching APIs, and other absolute garbage) that I never made dump_cache support relative filenames (is there even a way to do that?).  Yeah yeah - call me a lazy programmer...  then go look at this API and tell me if you still think so!  (Can you believe it is 2010 and Windows 7 still can't open a file with >260 characters of length?)

Ok.  I feel better.  After that public rant, I hope my diagnosis is the same problem you're seeing! :-)
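
The absolute-path advice above can be scripted so it is hard to get wrong. This is only a sketch: it assembles the dump_cache.exe command line using the three flags already shown in this thread and resolves the directories to absolute paths first. `dump_cache_command` is a hypothetical helper, and making the input path absolute as well is an extra precaution on my part; only the output path was said to need it.

```python
import os


def dump_cache_command(user_data_dir, output_dir, exe="dump_cache.exe"):
    """Build the dump_cache argument list with absolute directory paths."""
    cache_dir = os.path.join(os.path.abspath(user_data_dir), "Default", "Cache")
    return [
        exe,
        "--input=" + cache_dir,
        "--output=" + os.path.abspath(output_dir),  # must be absolute
        "--dump-to-files",
    ]


cmd = dump_cache_command("cached_files", "encoded_cached_files")
print(" ".join(cmd))
# On the Windows build machine this list could then be passed to
# subprocess.run(cmd, check=True).
```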

Mike

Jon Leighton

Apr 27, 2010, 10:58:20 PM
to spdy-dev
Thanks very much Mike - that did the trick. I was able to create
cache files and serve them with the flip-in-mem-edsm-server.

- Jon

Jon Leighton

Jun 10, 2010, 2:18:46 PM
to spdy-dev
I'm running into a problem with the flip server not being able to find
images. For example, I've gone through the process of creating the
cache files for the site buzet.pc.cis. The resulting
buzet.pc.cis/indexx2Ehtml file contains a reference to the file
sat-imgs-4*4/11.jpeg, and in the buzet.pc.cis/sat-imgs-4x2A4 directory
there is a file called 11x2Ejpeg, but the flip server reports the
error:

[0610/140503:ERROR:net/tools/flip_server/flip_in_mem_edsm_server.cc(675)] Could not find file data for GET_/buzet.pc.cis/sat-imgs-4*4/11.jpeg

Does this indicate a problem with the encoded cache files, or do I
perhaps have something configured incorrectly? Thanks for any help.

- Jon

Roberto Peon

Jun 10, 2010, 2:39:10 PM
to spdy...@googlegroups.com
Can it find other files? Is it just not able to see that one?
-=R

Jon Leighton

Jun 10, 2010, 3:07:59 PM
to spdy-dev
In this case there are 16 jpegs in the buzet.pc.cis/sat-imgs-4x2A4
directory, and a faviconx2Eico image in the buzet.pc.cis directory.
The flip server complains that it can't find any of them. There are
no other resources for this page other than the indexx2Ehtml file,
which it does find. I should mention that the cache files were
created using HTTP rather than SPDY, and I'm pointing chrome to
127.0.0.1:16002/buzet.pc.cis/indexx2Ehtml to retrieve them from the
flip server's GET_ directory.
- Jon

Mike Belshe

Jun 10, 2010, 3:22:04 PM
to spdy...@googlegroups.com
This might work, but it doesn't sound right on the surface.

The point of the url-rewriter is so that we can save files on disk which can be looked up by original URL.  So, the browser should only know the original URL, which would be something like:
In this way, the protocol is simulating *exactly* what we'd send in the real world.

But the flip-in-memory-server will take that URL, encode it using the rules, and realize that is a file on disk called "GET_/buzet.pc.cis/indexx2Ehtml"

Mike

Roberto Peon

Jun 10, 2010, 3:23:39 PM
to spdy...@googlegroups.com
Something seems strange with the encoding.
You have 16 images here:
  buzet.pc.cis/sat-imgs-4x2A4
But the server is telling you it is trying to look up stuff here:
  buzet.pc.cis/sat-imgs-4*4/

Is the server compiled with filename encoding turned off?

-=R

Roberto Peon

Jun 10, 2010, 3:25:34 PM
to spdy...@googlegroups.com
I think you must have. Once you've recompiled with encoding enabled, run the server again.
Next time, hit it with:
  buzet.pc.cis/index.html
and the server will encode that to 
  buzet.pc.cis/indexx2Ehtml
before it attempts to look up that resource from the cache/disk, etc.
-=R

Jon Leighton

Jun 10, 2010, 8:11:41 PM
to spdy-dev
Thanks Roberto, as you expected I did have flip compiled with URL
encoding off - sorry about that! Now, when I point Chrome to
127.0.0.1:16002/buzet.pc.cis/ flip looks for
GET_//buzetx2Epcx2Ecis/indexx2Ehtml.

For some reason dump_cache.exe is not encoding the top level directory
names for me, though it is encoding everything under the top level. I
renamed the buzet.pc.cis directory to buzetx2Epcx2Ehtml and restarted
the flip server, but flip still couldn't find it. Flip is looking for
GET_//buzetx2Epcx2Ecis rather than GET_/buzetx2Epcx2Ecis and I'm
guessing the extra "/" is a problem - but I don't know why it's there.

I restarted Chrome with the --host-resolver-rules option and that
solved (worked around?) the problem. I think I'm all set except for
dump_cache.exe not encoding top level directory names - should it be
doing that?

- Jon

Roberto Peon

Jun 10, 2010, 8:18:10 PM
to spdy...@googlegroups.com
Yeah, the hostname is part of the encoded file name. That means that when you access the file via Chrome (or whatever), you need to have the GET sent with the correct host.
If you use --host-resolver-rules, that tells Chrome to ignore the hostname in the URL for the purpose of DNS resolution for your connection, which allows you to easily preserve the hostname in your requests.
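
One way to read the paths reported earlier in this thread is that the server builds its lookup key from the request method, the host it believes the request is for, and the encoded path. The Python sketch below is inferred only from those reported paths, not from the flip server source: `lookup_key`, `encode_path`, the escaping rule, and the idea that a trailing "/" maps to index.html are all assumptions made for illustration.

```python
import string

# Characters assumed to pass through the encoder unchanged (inferred).
SAFE = set(string.ascii_letters + string.digits + "-_/")


def encode_path(path):
    """Escape characters outside SAFE as 'x' + two uppercase hex digits."""
    return "".join(c if c in SAFE else "x%02X" % ord(c) for c in path)


def lookup_key(method, host, path):
    """Guess at the on-disk key for a request: METHOD_/<host><encoded path>."""
    if path.endswith("/"):
        path += "index.html"  # assumption: directory requests map to index.html
    return method + "_/" + host + encode_path(path)


# Request sent with the original hostname.
print(lookup_key("GET", "buzet.pc.cis", "/index.html"))
# GET_/buzet.pc.cis/indexx2Ehtml

# Request to 127.0.0.1:16002/buzet.pc.cis/ where no host reaches the lookup.
print(lookup_key("GET", "", "/buzet.pc.cis/"))
# GET_//buzetx2Epcx2Ecis/indexx2Ehtml
```

If that reading is right, a request whose host comes through empty lands under GET_// instead of GET_/<host>/, and the site name ends up encoded as part of the path.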

Does that make sense (or was it redundant? :) )
-=R

Jon Leighton

Jun 10, 2010, 9:16:27 PM
to spdy-dev
Sorry Roberto - I don't think I'm following you. Isn't the hostname
only seen by chrome and not flip? I was assuming that the hostname
was simply used by Chrome to establish a connection over which to send
the GETs, so I don't understand in what way it is part of the encoded
filenames used by flip. When I use the --host-resolver-rules option I think
Chrome is just substituting the host:port pair with whatever I
specified in the rule, prior to establishing a connection, but I still
don't see where encoding the hostname would come in. Can you explain
further? Thanks.
- Jon

Robert Massa

Jun 13, 2010, 10:12:48 AM
to spdy...@googlegroups.com
I think I'm running into a similar issue here. I am using the
--host-resolver-rules flag, and this works perfectly for SPDY requests.
For normal HTTP requests I get the following though:

Could not find file data for GET_//indexx2Ehtml

So it looks like flip isn't using the hostname in the path. This is my
client command for HTTP:

chrome-win32\chrome.exe --host-resolver-rules="MAP * 192.168.1.116" http://news.google.com:16002

This is my command against the same flip server, but using SPDY:

chrome-win32\chrome.exe --use-spdy=no-ssl --host-resolver-rules="MAP * 192.168.1.116" http://news.google.com:10040

I tried this against a server compiled with and without SSL support,
and both give me the same result: SPDY works, HTTP ends up on the wrong path.

Like Jon, I'm also not sure how the flip server should know the correct
path when using HTTP. I "trick" it into getting the right files by
encoding the host and appending this to the request, but that doesn't
seem right.

What is the right way to connect to flip over HTTP?

Roberto Peon

Jun 13, 2010, 1:48:20 PM
to spdy...@googlegroups.com
Both HTTP and SPDY would send both the path and the hostname.
When you type the resource name into chrome, you need to use the proper hostname, e.g.

This would result in the following request being sent (at minimum) for HTTP (SPDY is very similar):
  GET /index.html HTTP/1.1

Just typing:
  /index.html
results in a different query


So, in other words, after setting up --host-resolver-rules, you need to type in the proper hostname in order to access the content properly. If you think about it, it makes sense: a single IP typically serves several hostnames and the content for each is different, so the hostname must be included in the request to specify which content it should show.

If you do that and it doesn't work, there is probably a bug.
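
To make the point about the hostname concrete, here is a minimal Python sketch of the request bytes an HTTP/1.1 client puts on the wire. The host and port are borrowed from the commands earlier in this thread purely as an example, and `build_get_request` is a made-up helper; a real browser sends many more headers.

```python
def build_get_request(host, path):
    """Return a bare-bones HTTP/1.1 GET request with a Host header."""
    lines = ["GET " + path + " HTTP/1.1", "Host: " + host, "", ""]
    return "\r\n".join(lines).encode("ascii")


# With a non-default port, the Host value carries the port as well.
request = build_get_request("news.google.com:16002", "/index.html")
print(request.decode("ascii"))
```

The request line carries only the path; the server learns which site is meant from the Host header alone, so it has to split the host from the port correctly.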
-=R

Robert Massa

Jun 13, 2010, 4:19:45 PM
to spdy...@googlegroups.com
I dug a little deeper and I think it's a bug in the url_utilities
file. Change line 29 to:

return std::string(url, b, next_colon - b);

I'm not sure if this is right in all cases; you might want to unit test this.

This makes GetUrlHost actually return the host instead of just the
port number.
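
For anyone who wants to check the intended behaviour before touching the C++, here is a rough Python analogue of that substring logic. The names `b` and `next_colon` mirror the line above; how they are computed here (start after "://", stop at the first ":" or "/") is my own guess and not copied from url_utilities, so treat it as a sketch of the idea.

```python
def get_url_host(url):
    """Return the host part of a URL, without scheme, port, or path."""
    b = url.find("://")
    b = 0 if b == -1 else b + 3           # start of the host
    next_slash = url.find("/", b)
    if next_slash == -1:
        next_slash = len(url)             # no path present
    next_colon = url.find(":", b, next_slash)
    if next_colon == -1:
        next_colon = next_slash           # no port present
    # Python equivalent of std::string(url, b, next_colon - b)
    return url[b:next_colon]


print(get_url_host("http://news.google.com:16002/"))   # news.google.com
print(get_url_host("http://buzet.pc.cis/index.html"))  # buzet.pc.cis
```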

Hope this helps...
