Making Wrong Code Type Wrong

Yuval Kogman

Oct 18, 2005, 2:38:47 PM

to perl6-l...@perl.org

Joel on Software has an article that I recently saw linked on PerlMonks:

http://www.joelonsoftware.com/articles/Wrong.html

The article discusses writing robust software, specifically by
dealing with data separation.

In my interpretation the article introduces a type system. This type
system helps write robust software, but has some limitations:

* Type information is checked by the programmer
* Full annotations must be supplied by the programmer
* Lack of annotation is hard to detect

The system helps you keep data that has not been massaged for a
certain piece of code from touching that code. The only way to let
that data reach the code is through a filter that sanitizes it.

Joel uses 'Request("Foo")' to mean something akin to
$q->param("Foo") in CGI.pm land, and 'Write' to mean something like
'print' (assuming HTML output).

His example shows how cross site scripting can arise, and how to use
the type system to avoid this problem.

The type system is implemented using coding standards: you tag
variable names, much like a tagged union. In his example, the union
type describes data safety, and has two subtypes: safe and unsafe.

This relates very closely to tainting, but differs in one respect:
it's a static analysis. Tainting does the same thing at runtime, with
no user annotation, but only in very specific situations.

Perl 6 will need support for this kind of tainting, and I raised it
before, but now I would like to propose something else.

Let's look at Joel's code for a second:

    us = UsRequest("name")
    usName = us
    recordset("usName") = usName
    sName = SFromUs(recordset("usName"))
    WriteS sName

At the top, the 'us' annotations denote that Request will return an
unsafe value, and that 'us' is an unsafe value. Then 'usName' is
assigned from it (in a faraway piece of code, btw). The programmer
knows that 'usName' cannot be named 'sName' because it gets its value
from a variable that is also tagged with 'us'.

Later, the value is stored in a DB. When extracted from the DB, we
know the value is unsafe, because it is tagged as such. SFromUs then
acts like a complex casting operator that makes something unsafe into
something safe. The naming convention is supposed to help the
programmer *see* when things go wrong.
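
As an illustration only, here is the same flow with the 'us' and 's' prefixes turned into types, so that the check is done by code rather than by the reader's eye. This is a runtime sketch in Python, not Joel's code and not Perl 6; the names Us, S, us_request, s_from_us and write_s are assumptions chosen to mirror his identifiers.

```python
import html

class Us(str):
    """Unsafe string: raw user input (the 'us' prefix as a type)."""

class S(str):
    """Safe string: already HTML-encoded (the 's' prefix as a type)."""

def us_request(form, field):
    # Stand-in for UsRequest: whatever comes from the user is unsafe.
    return Us(form[field])

def s_from_us(value):
    # The one sanctioned conversion from unsafe to safe.
    return S(html.escape(value))

def write_s(out, value):
    # Stand-in for WriteS: refuses anything that is not marked safe.
    if not isinstance(value, S):
        raise TypeError("write_s needs an S value, got " + type(value).__name__)
    out.append(value)

form = {"name": "<script>alert(1)</script>"}
recordset = {}
page = []

us_name = us_request(form, "name")
recordset["usName"] = us_name                    # still unsafe while stored
write_s(page, s_from_us(recordset["usName"]))    # allowed
```

Calling write_s(page, us_name) raises TypeError. Ordinary string operations on a Us value return a plain str, which write_s also rejects, so losing the tag fails closed.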

In Perl 6 ideally this would look like this, IMHO:

    my $str = $q.param("name");
    ...
    my $name = $str;
    $storage.store("name", $name);
    ...
    my $name = $storage.get("name");
    print encode($name);

because type annotation sucks. Superficially, this code does not
have the property that both Joel and I want it to have, namely
safety, but I think this can be resolved.

Perl 6 has the notion of roles.

Let's say we were to decorate the param method of the http request
object, asking for a symbolic role to be attached to all the values
it returns.

What we want to get out of it is that in the scope of our code (the
lexical scope, the current class and its subclasses, the consumers
of this module, etc.), any retrieval of a param will tag the data
as unsafe, without param even knowing about this.

Then the view is also tagged: no data carrying this tag may enter the
Template namespace, or, even more strictly, for the scope in which we
use Template, the only data we allow ourselves to put into it is
data that is explicitly tagged as safe.

The implementation of this system is trivial with Perl 6's tools:
roles and compile time type inference allow the user to build, by
wrapping interfaces, a system that gives the exact same features as
Joel's system does.

However, what I'm more interested in is decorating existing
interfaces, in a limited scope.

The reason we want a limited scope is that it is not our concern
how other pieces of code use $q.param, safely or unsafely, by our
definition of safety or by someone else's.

What I'd like to be able to do is declare something that applies to
all code in my system (application, module, script, whatever) that
does this:

    my $str = $q.param("name");
    ...
    my $name = $str;
    $storage.store("name", $name);
    ...
    my $name = $storage.get("name");
    print encode($name);

and enables me to say that

    print $name;

is disallowed using the following rules:

* everything from $q.param is also of the type Unsafe
* everything going into $storage.store needs to get a callback
  triggered if it is unsafe (and more data about it will be stored
  in the DB)
* everything coming out of $storage.get must also trigger a
  callback, which will retag it as necessary
* everything going into print must be of the type Safe
* the function encode has the type Unsafe -> Safe

Using these 5 rules I can then gain control over much larger bits of
code. The only questions left unanswered are how I say which code
they apply to, and what the syntax for these decorations is.
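
To make the five rules concrete, here is a rough runtime sketch in Python. It is not the proposed Perl 6 mechanism: the checks happen at runtime instead of compile time, and the "limited scope" is approximated by wrapping only the objects our own code uses. Query and Storage stand in for existing interfaces; TaggedQuery, TaggedStorage and safe_print are hypothetical names.

```python
import html

class Unsafe(str):
    """Tag for data that came from the request."""

class Safe(str):
    """Tag for data that may be printed."""

class Query:                      # existing interface, unaware of tagging
    def __init__(self, params):
        self._params = params
    def param(self, name):
        return self._params[name]

class Storage:                    # existing interface, unaware of tagging
    def __init__(self):
        self._rows = {}
    def store(self, key, value):
        self._rows[key] = value
    def get(self, key):
        return self._rows[key]

class TaggedQuery:
    """Rule 1: everything from param is Unsafe."""
    def __init__(self, query):
        self._query = query
    def param(self, name):
        return Unsafe(self._query.param(name))

class TaggedStorage:
    """Rules 2 and 3: callbacks on the way in and out preserve the tag."""
    _tags = {"Unsafe": Unsafe, "Safe": Safe}
    def __init__(self, storage):
        self._storage = storage
    def store(self, key, value):
        # Extra data about the value is stored next to it.
        self._storage.store(key, (type(value).__name__, str(value)))
    def get(self, key):
        tag, raw = self._storage.get(key)
        # Unknown or missing tags are retagged as Unsafe.
        return self._tags.get(tag, Unsafe)(raw)

def safe_print(out, value):
    """Rule 4: only Safe values reach the output."""
    if not isinstance(value, Safe):
        raise TypeError("refusing to print " + type(value).__name__)
    out.append(str(value))

def encode(value):
    """Rule 5: Unsafe -> Safe."""
    return Safe(html.escape(value))

q = TaggedQuery(Query({"name": "<b>Yuval</b>"}))
storage = TaggedStorage(Storage())
out = []

name = q.param("name")
storage.store("name", name)
name = storage.get("name")
safe_print(out, encode(name))     # allowed; safe_print(out, name) raises
```

Code that uses a bare Query or Storage elsewhere is unaffected, which is the point of limiting the scope of the decoration.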

This tagging gets very interesting with his examples later on.
Here's an excerpt of Joel's article:

In Excel's source code you see a lot of rw and col and when you see those
you know that they refer to rows and columns. Yep, they're both integers,
but it never makes sense to assign between them.

There is a real benefit to be gained here, but the usability of e.g. int
formatting functions should not be hindered by overzealous typing.
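
A small Python sketch of that trade-off, under the assumption that rows and columns are modelled as subclasses of int: the subclasses keep them distinguishable where it matters, while anything that merely formats an int keeps working. Row, Col and cell_name are illustrative names, not anything from Excel or the article.

```python
class Row(int):
    """A row index; still an int for arithmetic and formatting."""

class Col(int):
    """A column index; still an int for arithmetic and formatting."""

def cell_name(row, col):
    # Reject swapped or untagged arguments.
    if not isinstance(row, Row) or not isinstance(col, Col):
        raise TypeError("cell_name expects (Row, Col)")
    return "R%dC%d" % (row, col)

rw = Row(3)
col = Col(7)

label = cell_name(rw, col)        # fine
padded = format(rw, "04d")        # int formatting is not hindered
```

cell_name(col, rw) raises TypeError, which is the rw/col mix-up the naming convention is meant to make visible.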

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me has realultimatepower.net: neeyah!!!!!!!!!!!!

Juerd

Oct 18, 2005, 3:04:02 PM

to perl6-l...@perl.org

Yuval Kogman skribis 2005-10-18 20:38 (+0200):

> the function encode has the type Unsafe -> Safe

I read the article before. What occurred to me then did so again now.
What exactly do Unsafe and Safe mean? Safe for *what*?

Something that is safe to put in HTML may be unsafe to put in an rfc822
header, and what may be safe there is likely to be unsafe in a shell
command line.

Instead of Safe and Unsafe, I suggest using safe::html, safe::rfc822,
safe::bash, etcetera in place of Safe, and nothing in place of Unsafe.
If it's not safe::($usage), then it's unsafe. Just like how something
that isn't defined() is undef, without there being any need for an
undefined() test.

One problem still is that once something is encoded, quoted or escaped
it can't always be easily re-encoded. Encoding functions should therefore
check if a variable does safe::(none()) and warn or fail if so.

I used lc class names, because they're empty roles, used only for
decoration and does-testing, and have no methods. I've thought about
suggesting such a convention, and this, I guess, is as good a time as
any.
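
A rough Python sketch of this scheme, assuming the tags are just names attached to a string and that "unsafe" is simply the absence of the relevant tag. The guard reflects one reading of the re-encoding concern above: an encoder fails if its input already carries any safe:: tag. Tagged, does, encode_html and encode_shell are hypothetical names.

```python
import html
import shlex

class Tagged(str):
    """A string plus the set of contexts it has been made safe for."""
    def __new__(cls, text, tags=()):
        obj = super().__new__(cls, text)
        obj.tags = frozenset(tags)
        return obj

def does(value, tag):
    # No tag means unsafe; there is no separate "unsafe" marker.
    return tag in getattr(value, "tags", frozenset())

def _refuse_if_encoded(value):
    # Already encoded for some context: encoding again would double-escape.
    tags = getattr(value, "tags", frozenset())
    if tags:
        raise ValueError("already encoded for: " + ", ".join(sorted(tags)))

def encode_html(value):
    _refuse_if_encoded(value)
    return Tagged(html.escape(value), {"safe::html"})

def encode_shell(value):
    _refuse_if_encoded(value)
    return Tagged(shlex.quote(value), {"safe::bash"})

raw = "Tom & Jerry"
for_html = encode_html(raw)       # safe::html only
for_shell = encode_shell(raw)     # safe::bash only
```

Here does(for_html, "safe::bash") is false, so something safe for HTML is still treated as unsafe for a shell command line, and encode_shell(for_html) raises ValueError instead of quoting already-escaped text.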

Another possibility is to use Str types, and coercion for encoding. In
that case I suggest the "lit" operator that provides Str::Literal
context and coercion, which coerces to any other string type without
encoding.

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Yuval Kogman

Oct 18, 2005, 3:22:37 PM

to Juerd, perl6-l...@perl.org

On Tue, Oct 18, 2005 at 21:04:02 +0200, Juerd wrote:
> Yuval Kogman skribis 2005-10-18 20:38 (+0200):
> > the function encode has the type Unsafe -> Safe
>
> I read the article before. What occurred to me then did so again now.
> What exactly do Unsafe and Safe mean? Safe for *what*?

That was just a naive example - the words "Unsafe" and "Safe" are
user defined, and are chosen on a case by case basis in their app.

> One problem still is that once something is encoded, quoted or escaped
> it can't always be easily re-encoded. Encoding functions should therefore
> check if a variable does safe::(none()) and warn or fail if so.

I don't see how this relates to the OP, or why encoding functions
should implement it like this.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me sneaks up from another MIME part: neeyah!!!!!

Juerd

Oct 18, 2005, 3:43:57 PM

to perl6-l...@perl.org

Yuval Kogman skribis 2005-10-18 21:22 (+0200):

> > I read the article before. What occurred to me then did so again now.
> > What exactly do Unsafe and Safe mean? Safe for *what*?
> That was just a naive example - the words "Unsafe" and "Safe" are
> user defined, and are chosen on a case by case basis in their app.

I think there's a lot to be gained by implementing something like this
globally, consistently. CPAN is part of Perl, as far as I'm concerned.

> > One problem still is that once something is encoded, quoted or escaped
> > it can't always be easily re-encoded. Encoding functions should therefore
> > check if a variable does safe::(none()) and warn or fail if so.
> I don't see how this relates to the OP, or why encoding functions
> should implement it like this.

The "should" is not to be taken literally, and applies only to the
described hypothetical universe.

Rob Kinyon

Oct 18, 2005, 4:50:08 PM

to Yuval Kogman, perl6-l...@perl.org

[snip]

Let me rephrase to see if I understand you - you like the fact that
boxed types + roles applied to those types + compile-time type
checking/inference allow you to tag a piece of information (int,
char, string, obj, whatever) with arbitrary metadata. Add to that the
fact that you can lexically mark certain function signatures as
checking against said arbitrary metadata, and you can provide
taint-checking of arbitrary complexity.

Yeah, that's cool. :-)

Rob

Yuval Kogman

Oct 18, 2005, 8:48:05 PM

to Juerd, perl6-l...@perl.org

On Tue, Oct 18, 2005 at 21:43:57 +0200, Juerd wrote:
> > That was just a naive example - the words "Unsafe" and "Safe" are
> > user defined, and are chosen on a case by case basis in their app.
>
> I think there's a lot to be gained by implementing something like this
> globally, consistently. CPAN is part of Perl, as far as I'm concerned.

While I agree that there is something to be gained from
semi-standard roles that allow modules to share compatible
interfaces (for example, imagine that Storable and Data::Dumper both
do the Serializable role, which is an interface spec jointly
maintained by their authors), I think that the power of the paradigm
I proposed is actually in non-shared code - things that apply to your
app, and are hard to reuse except for similar deployments.

The reason for my opinion is that while an HTML sanitizer knows that
it takes any arbitrary string and returns a string that has no
dangerous tags and will not mess with the structure of the document,
it doesn't know the origin of your data, or the destination of its
output.

This amendment to the type system is supposed to help you make sure
your glue code is gluing the right parts together, and while
components are generally reusable, composed components are scarcely
so.

> > I don't see how this relates to the OP, or why encoding functions
> > should implement it like this.
>
> The "should" is not to be taken literally, and applies only to the
> described hypothetical universe.

Huh?

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me does a karate-chop-flip: neeyah!!!!!!!!!!!!!!

Yuval Kogman

Oct 18, 2005, 8:55:39 PM

to Juerd, perl6-l...@perl.org

On Wed, Oct 19, 2005 at 02:48:05 +0200, Yuval Kogman wrote:

> the Serializable role, which is an interface spec jointly maintained

Err, I meant the Serializer role... The Serializable role is a role
that takes a delegate that does Serializer, and lets the object that
does it be frozen and thawed.
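
The distinction can be sketched in Python, with a Protocol standing in for the Serializer role and a mixin standing in for the Serializable role. This is an illustration only: the method names dump, load, freeze and thaw are assumptions, and json and pickle merely play the part that Storable and Data::Dumper play in the example.

```python
import json
import pickle
from typing import Any, Protocol

class Serializer(Protocol):
    """What a serializer module promises: bytes out, data back in."""
    def dump(self, data: Any) -> bytes: ...
    def load(self, blob: bytes) -> Any: ...

class JsonSerializer:
    def dump(self, data):
        return json.dumps(data, sort_keys=True).encode("utf-8")
    def load(self, blob):
        return json.loads(blob.decode("utf-8"))

class PickleSerializer:
    def dump(self, data):
        return pickle.dumps(data)
    def load(self, blob):
        return pickle.loads(blob)

class Serializable:
    """Mixin: freezes and thaws an object through a Serializer delegate."""
    serializer: Serializer = JsonSerializer()

    def freeze(self) -> bytes:
        # Only instance attributes are frozen, not the delegate itself.
        return self.serializer.dump(vars(self))

    @classmethod
    def thaw(cls, blob: bytes):
        obj = cls.__new__(cls)
        obj.__dict__.update(cls.serializer.load(blob))
        return obj

class Point(Serializable):
    def __init__(self, x, y):
        self.x, self.y = x, y

class PickledPoint(Point):
    serializer = PickleSerializer()   # same role, different delegate

frozen = Point(1, 2).freeze()
thawed = Point.thaw(frozen)
```

Swapping the delegate changes the wire format without touching Point, which is the benefit of the two implementations agreeing on one interface spec.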

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me tips over a cow: neeyah!!!!!!!!!!!!!!!!!!!!!!