Caching HTTP proxy CGI module

Gertjan van Oosten

Feb 9, 1994, 10:39:24 AM

With the advent of Mosaic 2.2, I have hacked up a perl script that is a
basic HTTP caching proxy CGI module.

As http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/proxy-gateways.html
says:

"Now that we have CGI, hopefully someone will write a proxy gateway CGI
module, and then you can turn your favorite HTTP server into a proxy
gateway."

Well, here's one. Code below.

Usage is simple: set the application default Mosaic*httpProxy or the
environment variable http_proxy to
"http://yourserver:yourport//cgi-bin/htcache?" (the double slash in
//cgi-bin is necessary because of a glitch in Mosaic), then start Mosaic
and off you go!

The approach is bare-bones (the cache just grows, trimming is not yet part
of the script), and I welcome any improvements or additions.
[E.g. the only date formats it recognises in the 'Last-modified:'-header are
"Thu Feb 3 17:03:55 1994 GMT" and "Tuesday, 08-Feb-94 14:15:29 GMT".]
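
To make the two recognised date formats concrete, here is a minimal
standalone sketch of the parsing step. It is not the code from the script
below: the name parse_last_modified is made up for illustration, it uses the
Time::Local module rather than timelocal.pl, and it assumes that a two-digit
year always means 19xx.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTHS = (JAN => 0, FEB => 1, MAR => 2, APR => 3, MAY => 4,  JUN => 5,
              JUL => 6, AUG => 7, SEP => 8, OCT => 9, NOV => 10, DEC => 11);

# Hypothetical helper: convert a Last-modified value to seconds since the
# Epoch. Returns undef if the string matches neither recognised format.
sub parse_last_modified {
    my ($s) = @_;
    my ($day, $mon, $yr, $hr, $min, $sec);

    if ($s =~ /^\w+,\s+(\d{1,2})-([A-Za-z]{3})[A-Za-z]*-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})\s+GMT\s*$/) {
        # Format: Tuesday, 08-Feb-94 14:15:29 GMT
        # Assumption: a two-digit year is 19xx.
        ($day, $mon, $yr, $hr, $min, $sec) = ($1, $2, 1900 + $3, $4, $5, $6);
    }
    elsif ($s =~ /^\w{3}\s+([A-Za-z]{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\d{4})\s+GMT\s*$/) {
        # Format: Thu Feb 3 17:03:55 1994 GMT
        ($mon, $day, $hr, $min, $sec, $yr) = ($1, $2, $3, $4, $5, $6);
    }
    else {
        return undef;
    }

    my $m = $MONTHS{uc $mon};
    return undef unless defined $m;
    return timegm($sec, $min, $hr, $day, $m, $yr);
}

for my $date ("Thu Feb 3 17:03:55 1994 GMT",
              "Tuesday, 08-Feb-94 14:15:29 GMT") {
    printf "%-35s -> %d\n", $date, parse_last_modified($date);
}
```

Returning undef for an unrecognised string lets the caller decide what to do
when a server sends some other date format, rather than comparing against a
meaningless time.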

Cheers,
--
-- Gertjan van Oosten, West Consulting bv
-- Estec, gvoo...@isosa1.estec.esa.nl, +31-1719-85668

#!/usr/local/bin/perl
#
# proxy script to retrieve files via HTTP 1.0 and store them in a cache
#
# Usage: htcache <httpurl>
#
# By Gertjan van Oosten, 940209
# Based on Martijn Koster's httpget, based on a script from Marc VanHeyningen

require "timelocal.pl";

# !!! Modify this to reflect correct location !!!
$cachedir = '/usr/local/etc/httpd/htcache';

$_ = shift;

s!^http://!!;
$urlish = $_;
($hp, $path) = split('/', $_, 2);
($host, $port) = split(':', $hp, 2);
$path =~ s/^/\//;
$port = 80 unless ($port);

$TIMEOUT = 10;
$SIG{ALRM} = Timer;

$out = "GET $path HTTP/1.0\r\n\r\n";
$result = &get($host, $port, $out);
print STDOUT $result;

sub get
{
    local($host, $port, $out) = @_;

    $AF_INET = 2;
    $SOCK_STREAM = 1;

    $sockaddr = 'S n a4 x8';

    chop($thishost = `hostname`);
    ($name, $aliases, $proto) = getprotobyname("tcp");
    ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($thishost);

    ($fqdn, $aliases, $type, $len, $thataddr) = gethostbyname($host);

    die "htcache: failed to resolve $host\n" if (!$thataddr);

    $this = pack($sockaddr, $AF_INET, 0, $thisaddr);
    $that = pack($sockaddr, $AF_INET, $port, $thataddr);

    socket(FS, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
    bind(FS, $this) || die "bind: $!";

    alarm($TIMEOUT);
    connect(FS, $that) || die "connect: $!";
    alarm(0);

    # Send the request; unbuffer the socket so it goes out immediately
    select(FS); $| = 1; alarm(360);
    print $out;

    # Read the response headers, up to the first blank line
    while(<FS>)
    {
        last if (/^\s*$/);
        # Get Last-modified:-header
        $lastmod = $_ if (/^Last-modified: /);
        $header .= $_;
    }
    # From here on, read the rest of the input in one go
    undef($/);

    # Translate URL to filename in cache
    $cachename = "$urlish";
    $cachename =~ s,/,#,g;
    $cachename = "$cachedir/$cachename";

    # Look up dates for remote document and document in cache
    chop($lastmod);
    $lastmod =~ s/^Last-modified: //;
    $mtime = &datetomtime($lastmod);
    $cachemtime = (stat($cachename))[9];

    # The cached copy is only usable if it exists and is up to date
    $fresh = (-f $cachename && $cachemtime >= $mtime);

    # If cached document up to date, use that instead
    if ($fresh)
    {
        close(FS) || die "close: $!";
        open(FS, "$cachename") || die "open: $!";
    }

    # Get the actual document
    $result = <FS>;
    close(FS) || die "close: $!";

    # Update cache if necessary
    if (!$fresh)
    {
        open(FS, ">$cachename") || die "open: $!";
        print FS $result;
        close(FS);
    }

    return $header . "\r\n" . $result;
}

sub Timer { die "timeout\n" }

sub datetomtime
# Usage: $mtime = &datetomtime("Tuesday, 08-Feb-94 14:15:29 GMT")
#    or: $mtime = &datetomtime("Thu Feb 3 17:03:55 1994 GMT")
{
    local($_) = @_;
    local(%months) = ('JAN',0,'FEB',1,'MAR',2,'APR',3,'MAY',4,'JUN',5,
                      'JUL',6,'AUG',7,'SEP',8,'OCT',9,'NOV',10,'DEC',11);
    local($day, $mn, $yr, $hr, $min, $sec, $mon, $rest);

    # Split date string
    local($wday, $date, $time, $tz, $rest) = split;

    # Check which format
    if ($rest !~ /^$/)
    {
        # Format: Thu Feb 3 17:03:55 1994 GMT
        $day = $time;
        $mn = $date;
        $yr = $rest - 1900;
        $time = $tz;
    }
    else
    {
        # Format: Tuesday, 08-Feb-94 14:15:29 GMT
        ($day, $mn, $yr) = split(/-/, $date);
    }
    ($hr, $min, $sec) = split(/:/, $time);

    # Translate month name to number
    $mn =~ tr/a-z/A-Z/;
    $mn = substr($mn,0,3);
    $mon = $months{$mn};

    # Translate to seconds since Epoch
    return &timegm($sec,$min,$hr,$day,$mon,$yr);
}

Gertjan van Oosten

Feb 11, 1994, 9:44:01 AM

I wrote:

> With the advent of Mosaic 2.2, I have hacked up a perl script that is a
> basic HTTP caching proxy CGI module.

and included a very basic perl script. Unexpectedly, I have had some time
to improve it a little, and put it up for ftp; you can get it at

ftp://ftp.estec.esa.nl/pub/iso/http/htcache.pl

The header of the file says:

#!/usr/local/bin/perl
#
# Usage: htcache <httpurl>
#
# Description:
# Proxy script to get an HTTP URL and cache it (if appropriate).
# To be installed in the HTTP server's cgi-bin; set the client
# property 'Mosaic*http_proxy' or the environment variable 'http_proxy'
# to 'http://wwwhost.domain:port//cgi-bin/htcache?'.
# [...//cgi-bin... -- double slash because of glitch in Mosaic]
#
# Processing:
# Gets the HEAD of the requested URL, extracts the 'Last-modified:'-
# header and compares its time to the time of the same document in
# the cache. If the cached document is out-of-date, reads the rest
# and stores it in the cache. It returns the cached document.
# Local documents and script output (method 'POST' or URL-paths
# beginning with '/cgi-bin/' or '/htbin/') aren't cached.
#
# Dependencies:
# NCSA Mosaic 2.2
# NCSA httpd 1.1
# timelocal.pl
#
# To do:
# - trim LRU from cache when defined threshold is exceeded
# - implement some sort of configuration file specifying which
# documents should never be cached, which documents always, etc.
#
# Author:
# Gertjan van Oosten, gvoo...@isosa1.estec.esa.nl or ger...@west.nl
#
# Based on Martijn Koster's httpget, which in turn is based on a script
# from Marc VanHeyningen.
#
# Revision history:
# $Log: htcache.pl,v $
# Revision 1.2 1994/02/11 12:02:52 gvoosten
# Added support for method 'POST'.
# No longer caches local documents or script output.
#
# Revision 1.1 1994/02/11 11:58:53 gvoosten
# Initial revision
#
# Version:
# @(#)$Id: htcache.pl,v 1.2 1994/02/11 12:02:52 gvoosten Exp $
#
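
The caching rules described under "Processing:" could be sketched roughly as
follows. This is not taken from htcache.pl itself; the helper names
is_cacheable and is_stale are made up for illustration, and "local document"
is assumed here to mean "the URL's host is the machine the proxy runs on".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: may the response to this request be cached?
#   $method    - request method ('GET', 'POST', ...)
#   $host      - host part of the requested URL
#   $path      - path part of the requested URL
#   $localhost - name of the machine the proxy runs on (assumption)
sub is_cacheable {
    my ($method, $host, $path, $localhost) = @_;
    return 0 if uc($method) eq 'POST';              # script input
    return 0 if lc($host) eq lc($localhost);        # local document
    return 0 if $path =~ m{^/(?:cgi-bin|htbin)/};   # script output
    return 1;
}

# Hypothetical helper: is the cached copy missing, or older than the
# remote document's Last-modified time (seconds since the Epoch)?
sub is_stale {
    my ($remote_mtime, $cachefile) = @_;
    my $cached_mtime = (stat($cachefile))[9];
    return 1 unless defined $cached_mtime;          # not in cache yet
    return $cached_mtime < $remote_mtime ? 1 : 0;
}

print is_cacheable('GET', 'www.ncsa.uiuc.edu',
                   '/SDG/Software/Mosaic/Docs/proxy-gateways.html',
                   'wwwhost.domain') ? "cache\n" : "do not cache\n";
```

Treating a missing cache file as stale keeps the "fetch and store" path and
the "refresh" path the same.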

As always, additions, improvements or just comments welcome.
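
For the first to-do item, one possible approach is sketched below. It is only
an illustration under assumptions, not part of htcache.pl: the name
trim_cache and the byte threshold are made up, and "least recently used" is
approximated by file access time, which is unreliable on filesystems that do
not update access times.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete the least recently accessed files in $dir
# until the total size of the remaining files is at most $max_bytes.
# Returns the list of paths that were removed.
sub trim_cache {
    my ($dir, $max_bytes) = @_;

    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @files;
    my $total = 0;
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;            # skip '.', '..' and subdirectories
        my @st = stat($path);
        push @files, { path => $path, size => $st[7], atime => $st[8] };
        $total += $st[7];
    }
    closedir $dh;

    my @removed;
    # Oldest access time first
    for my $f (sort { $a->{atime} <=> $b->{atime} } @files) {
        last if $total <= $max_bytes;
        unlink($f->{path}) or next;      # leave files we cannot remove
        $total -= $f->{size};
        push @removed, $f->{path};
    }
    return @removed;
}
```

It could be called after each cache update, for example as
trim_cache($cachedir, 10 * 1024 * 1024), where the 10 MB limit is an
arbitrary value chosen for the example.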
