
A_Modest_1_bit_Proposal_about_Quotification_-_making_the_Default_Easy


Andy "Krazy" Glew

Dec 8, 2011, 10:26:38 AM
Listening to an old "Security Now" podcast while doing my morning stretches.

Leo Laporte's TWIT website was hacked, and Steve Gibson, the Security
Guy, says "Any time you are soliciting user input, there is a risk of
malicious input somehow tricking the backend and executing that input,
when it is meant to be, you know, benign [input data, like] user name
and password."

This is typical of the classic SQL injection hack, and, indeed, of any
hack where the attacker is able to inject scripting code and fool the
backend into executing it. Typically by embedding quotes or the like in
the input string.

(For that matter, Steve's description also applies to binary injection
via buffer overflow. But we won't go there; this page will talk only
about non-buffer-overflow attacks, since we have elsewhere described
our agenda for preventing buffer overflow attacks.)

Say that you are taking user input like NAME, and are somehow using it
to create an SQL or other language command, like "SELECT FIELDLIST FROM
TABLE WHERE NAME = '$NAME' ". But now the attacker, instead of
providing a nicely formed string like "John Doe", provides instead
something like "anything' OR 'x' = 'x ". (I added spaces between the
single and double quotes for readability.) I.e. the user provides a
string that contains quotes in the target language - not the language
where the query string is composed, but a language further along. So
the query string becomes "SELECT FIELDLIST FROM TABLE WHERE NAME =
'anything' OR 'x' = 'x' ". And now the query matches any row in the
table. (http://www.unixwiz.net/techtips/sql-injection.html provides
examples, as does Wikipedia.)
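
A minimal sketch of the attack just described, in Python with the standard sqlite3 module. The table name `people` and the function name `find_by_name_unsafe` are made up for illustration; the hostile string is the one from the text above.

```python
import sqlite3


def find_by_name_unsafe(conn, name):
    # UNSAFE: the user input is pasted into the SQL text, so any quote
    # inside `name` becomes part of the query syntax.
    query = "SELECT name FROM people WHERE name = '%s'" % name
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("John Doe",), ("Jane Roe",)])

# A well-formed name matches only its own row.
print(find_by_name_unsafe(conn, "John Doe"))

# The hostile string turns the WHERE clause into a condition that is
# always true, so every row in the table comes back.
print(find_by_name_unsafe(conn, "anything' OR 'x' = 'x"))
```

Note that nothing in the composing language (Python here) is violated; the damage happens one language further along, in SQL.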

The general solution to this is "quotification": take the user input,
and either delete or quote anything that looks like a quote in the
target language. E.g. transform the attacker's string "anything' OR 'x'
= 'x " into either "anything OR x = x " or "anything\' OR \'x\' = \'x ".

The problem with deleting stuff from the user string is that sometimes
the user is supposed to have quotelike things. Consider names like
"O'Toole". Or consider providing, e.g. via cut and paste, Chinese
Unicode names in an application whose original programmer was English,
but where the system is otherwise capable of displaying Chinese. It is
a pity if the barrier to internationalization is the "security" code
scattered throughout your application that sanitizes user input. Worse,
that is the sort of code that might get fixed by somebody who is fixing
internationalization problems but who doesn't understand the security issues.

The problem with quotifying stuff is that it is hard. It is not just a
case, for you Perl aficionados, of doing s/'/\\'/g - what about input
strings that already have \' inside them? And so on.
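
To make the "already has a backslash" case concrete, here is a small Python sketch. `naive_quotify` is the one-line substitution; `read_quoted` is a made-up toy reader standing in for some target language in which a backslash escapes the next character. Neither is taken from any real library.

```python
def naive_quotify(text):
    # The one-line substitution: put a backslash before every single quote.
    return text.replace("'", "\\'")


def read_quoted(source):
    """Toy target-language reader: parse one '...' literal in which a
    backslash escapes the next character. Returns (value, rest)."""
    if not source.startswith("'"):
        raise ValueError("expected an opening quote")
    value = []
    i = 1
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            value.append(source[i + 1])   # escaped character, taken literally
            i += 2
        elif ch == "'":
            return "".join(value), source[i + 1:]
        else:
            value.append(ch)
            i += 1
    raise ValueError("unterminated string literal")


# Benign input survives the round trip.
print(read_quoted("'" + naive_quotify("O'Toole") + "'"))

# Hostile input: backslash, quote, then attacker text. The substitution
# adds a second backslash, the two backslashes pair up, and the quote
# that follows is bare again, so the literal ends early.
hostile = "\\' OR 1=1 --"
value, rest = read_quoted("'" + naive_quotify(hostile) + "'")
print(repr(value), repr(rest))
```

The text left over in `rest` is what the target language would go on to parse as code. Escaping backslashes first fixes this particular toy, but only because we know its one escape rule.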

But the real problem, applicable to both deleting and quotification
strategies, is that the code doing the user input sanitization does not
necessarily know the syntax of all of the languages downstream. It may
know that there is SQL along the way - but it may not know that somebody
has just added a special filter that looks for French quotes, <<
and >>. Etc. Not just special symbols: I have defined
sublanguages where QuOtE and EnDqUoTe were the quotes.

The security code may know the syntax at the time the sanitization code
was written. But the downstream processing may have changed. The
syntax of the language may have been extended, in a new revision of the
SQL or Perl or ... . (I found a bug like that last year.)

The problem is that the user input sanitization code is trying to
transform user input from strings that may be unsafe, to strings that
are guaranteed to be safe forever and ever, no matter what revisions are
made to the language, etc. The problem is that the default for
character strings is that ANY CHARACTER MAY BE PART OF A COMMAND unless
specially quoted.

We need to change this default. Here is my modest proposal:

Let us define a character set whereby there is a special bit free in all
characters. And whereby, if that special bit is set, it is guaranteed
by ANY REASONABLE LANGUAGE that no character with that special bit set
will be part of any command or language syntax like a quote symbol.

We should strongly suggest that the visual display for the characters
with and without the special bit set be the same. Or at least, the same
in most situations - in others you may want to distinguish them, e.g.,
by shading.

If you are using something like BNF to describe your language, then it
might be:

ORDINARY_CHARACTER ::= 'A' | 'B' | ...
TAINTED_CHARACTER ::= 1'A' | 1'B' | ...
POSSIBLY_TAINTED_CHARACTER ::= ORDINARY_CHARACTER | TAINTED_CHARACTER

where I am using the syntax 1'A' to describe a single character literal
with the special bit set.

STRING_LITERAL ::= QUOTED_STRING | TAINTED_STRING
TAINTED_STRING ::= TAINTED_CHARACTER+
QUOTED_STRING ::= " ORDINARY_CHARACTER* "

(Actually, I am not sure whether a quoted string should be the above, or
QUOTED_STRING ::= " POSSIBLY_TAINTED_CHARACTER* "
)


And we require that characters with the tainted bit set are permitted
ONLY in strings. Nowhere else in the language. Not in keywords,
symbols, operators....



Then we just have to ensure that all of our input routines set the
special bit. If you really need to form operators from input, you can
untaint the data explicitly. Better to have to untaint explicitly in a
few places, than to have to quotify correctly in all places.
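
A rough Python sketch of how a lexer might honour the special bit. This is only one reading of the proposal, not a definitive design: the names `Ch`, `trusted`, `from_input`, `untaint` and `tokenize` are invented here, the toy language uses single quotes, and it takes the POSSIBLY_TAINTED_CHARACTER variant of QUOTED_STRING.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ch:
    """One character plus the proposal's special (taint) bit."""
    char: str
    tainted: bool = False


def trusted(text):
    # Program text written by the programmer: special bit clear.
    return [Ch(c, False) for c in text]


def from_input(text):
    # Every input routine sets the special bit on what it reads.
    return [Ch(c, True) for c in text]


def untaint(chars):
    # The explicit, rare escape hatch.
    return [Ch(c.char, False) for c in chars]


def text_of(chars):
    return "".join(c.char for c in chars)


def tokenize(chars):
    """Toy lexer returning ('word', text) and ('string', text) tokens."""
    tokens = []
    i, n = 0, len(chars)
    while i < n:
        c = chars[i]
        if c.tainted:
            # TAINTED_STRING ::= TAINTED_CHARACTER+
            j = i
            while j < n and chars[j].tainted:
                j += 1
            tokens.append(("string", text_of(chars[i:j])))
            i = j
        elif c.char.isspace():
            i += 1
        elif c.char == "'":
            # QUOTED_STRING: only an untainted quote can close it.
            j = i + 1
            while j < n and (chars[j].tainted or chars[j].char != "'"):
                j += 1
            if j == n:
                raise ValueError("unterminated quoted string")
            tokens.append(("string", text_of(chars[i + 1:j])))
            i = j + 1
        else:
            j = i
            while (j < n and not chars[j].tainted
                   and not chars[j].char.isspace()
                   and chars[j].char != "'"):
                j += 1
            tokens.append(("word", text_of(chars[i:j])))
            i = j
    return tokens


query = (trusted("SELECT * FROM t WHERE name = ")
         + from_input("anything' OR 'x' = 'x"))
tokens = tokenize(query)
print(tokens)
```

The whole hostile string arrives as a single string token, quotes and all, with no quotification step anywhere. The cost is that every layer has to carry the bit along with each character.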



Perhaps better to make tainting the default. To flip the polarity of
the special bit. And to require that language syntax, keywords, etc.,
be recognized only if the special bit is set.




This is just the well known taint or poison propagation strategy.
Exposed to programming language syntax definitions.

I have elsewhere espoused taking advantage of extensible XML syntax for
programming languages. This is similar, although orthogonal.



wiki'ed at
http://wiki.andy.glew.ca/wiki/A_Modest_1_bit_Proposal_about_Quotification_-_making_the_Default_Easy
as well as posted on my blog

Robert Wessel

Dec 8, 2011, 2:09:25 PM
Quotification is almost always the wrong way to deal with the problem.
It is, far too commonly, a hack done to address the issue. It's far
too easy to leave holes as you describe, or generate false positives
(a site I know had a filter that looked for things *looking* like
fragments of SQL - which caused occasional ordinary English text to
get filtered if it met the right patterns).

Fixing the code to use proper parameters, which are not subject to
quoting weirdness from maliciously constructed input strings, rather
than trying to filter the input for malicious SQL, is the correct
solution.

See:

http://www.owasp.org/index.php/SQL_Injection

The syntax varies from language to language, and the following Java
example is modified from the OWASP site:


// Unsafe
String query = "SELECT account_balance FROM user_data WHERE user_name = '"
    + custname + "'";
ResultSet results = statement.executeQuery( query );

The above is unsafe because the values are being (badly) cooked into
string constants. Consider what happens if custname contains
(approximately) "x'; DROP TABLE xyz;". That turns the command string
passed to the SQL engine into:

"SELECT account_balance FROM user_data WHERE user_name = 'x'; D
which is then executed. The problem with that is obvious.


// Safe:
String query = "SELECT account_balance FROM user_data WHERE user_name = ? ";
PreparedStatement pstmt = connection.prepareStatement( query );
pstmt.setString( 1, custname);
ResultSet results = pstmt.executeQuery( );

This passes the parameters directly to the SQL engine, where they are
treated as whole parameters, not literals in the query string. This
eliminates the problem.

There are other solutions as well, but proper parameterization is
generally the best solution.
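
The same idea as a self-contained, runnable sketch in Python with the standard sqlite3 module, which also uses ? placeholders. The table and column names follow the Java example above; the function name `balance_for` and the sample rows are made up.

```python
import sqlite3


def balance_for(conn, custname):
    # The SQL text is a constant. custname is bound separately and is
    # never parsed as SQL, whatever quotes or semicolons it contains.
    rows = conn.execute(
        "SELECT account_balance FROM user_data WHERE user_name = ?",
        (custname,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_data (user_name TEXT, account_balance INTEGER)")
conn.executemany("INSERT INTO user_data VALUES (?, ?)",
                 [("O'Toole", 100), ("x", 5)])

# A legitimate name containing a quote needs no special handling.
print(balance_for(conn, "O'Toole"))

# The hostile string is just a name that matches no row.
print(balance_for(conn, "x'; DROP TABLE user_data; --"))
```

No filtering or escaping appears anywhere in the application code, which is the point: there is nothing to get wrong.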

For buffer overflows, the correct solution is to fix the program or
language to not have buffer overflows.

In some cases it may be a bit difficult to get all the parts to
cooperate in a clean way, so most web servers do a serious job
sanitizing paths to prevent directory traversals. This is a
particular issue with web servers, since a chunk of security is
handled by the web servers (this is more a comment on the security
architecture of the underlying systems or lack of understanding of such
by the authors of the web servers), but proper permissions on the
directories goes a long way. Also, there's often an OS function to
cook the name down (GetFullPath(), in Windows, for example), and that
file name should really be used for any additional validation. Or
other mechanisms that let you separate the authorization of different
parts of the web server (in Windows you can query the security system
explicitly before accessing objects, or use impersonation to
temporarily change the thread's security context). That being said,
directory name syntax is usually simpler than SQL syntax, but clearly
easy to get wrong anyway.
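
A sketch of the "cook the name down, then validate the cooked name" approach, using Python's os.path.realpath as the canonicalizing function. The function name `resolve_under` and the document root /srv/www are hypothetical, and this complements rather than replaces proper directory permissions.

```python
import os


def resolve_under(root, requested):
    """Canonicalize `requested` relative to `root`; refuse anything outside."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, requested))
    # Validate the canonical path, not the raw request string.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError("outside document root: %r" % requested)
    return candidate


for request in ("index.html", "../../etc/passwd", "/etc/passwd"):
    try:
        print(request, "->", resolve_under("/srv/www", request))
    except PermissionError as error:
        print(request, "-> refused:", error)
```

The check never looks for ".." or slashes in the request itself, so it does not depend on knowing every spelling of a traversal.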

And at this point in time, how in the heck are we going to get
everyone to agree on an explicit quoting scheme?

Terje Mathisen

Dec 8, 2011, 4:40:19 PM
You had me until this point Andy, that's a pretty good explanation of
SQL injection.
>
> The general solution to this is "quotification": take the user input,

And here is where you go wrong:

The general solution is to totally separate parsing from user input,
i.e. in your example above you would first parse the SELECT statement,
using question marks as placeholders for where you expect input.

Later on you execute that prepared (i.e. parsed) statement, substituting
the actual user input for the placeholders:

I.e. in perl this looks like this:

# Let the DB parser see only static strings like this:
my $sth =
$dbh->prepare("SELECT FIELDLIST FROM TABLE WHERE NAME = ?");

# Get the possibly poisonous user input
my $user_input = param('name');
$sth->execute($user_input);

[snip]
> Perhaps better to make tainting the default. To flip the polarity of the
> special bit. And to require that language syntax, keywords, etc., be
> recognized only if the special bit is set.

Perl actually has 'taint' as a builtin feature. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Andy "Krazy" Glew

Dec 9, 2011, 11:49:06 AM

>>
>> The general solution to this is "quotification": take the user input,
>
> And here is where you go wrong:
>
> The general solution is to totally separate parsing from user input,
> i.e. in your example above you would first parse the SELECT statement,
> using question marks as placeholders for where you expect input.
>
> Later on you execute that prepared (i.e. parsed) statement, substituting
> the actual user input for the placeholders:
>
> I.e. in perl this looks like this:
>
> # Let the DB parser see only static strings like this:
> my $sth =
> $dbh->prepare("SELECT FIELDLIST FROM TABLE WHERE NAME = ?");
>
> # Get the possibly poisonous user input
> my $user_input = param('name');
> $sth->execute($user_input);


Sure.

But let me point out

a) SQL parameterization is just a form of quotification. It is a form
of quotification that has very complicated quote marks, with a level of
indirection.

b) SQL parameterization is only available in SQL. The example you give,
Terje, is not Perl - it is Perl being used to prepare an SQL statement.

How does this help me if I am using some other language, that does not
have that facility?

SQL injection is not the only form of bad quotification attack.

Yesterday I was writing some scripts - Perl driving Expect driving csh
driving ... heck, I don't know half of what it was driving, since
hardware CAD systems are scripts wrapped around scripts wrapped around...

I did something as simple as, in Perl

system("grep $pattern $filename | $user_provided_filter_command")

Many opportunities for shell script injection here.

Yes, I am not even sure how the tainting approach would work here. I
think that $user_provided_filter_command might be restricted to standard
commands available on the target system, default path, not able to
invoke arbitrary user code. And constrained not to have shell syntax
like < or > or ... inside it. (Might want to have capabilities
attached to string inlining.)

Yes, perhaps I should be using exec and not system around strings. I
suppose that exec, which is not subject to shell parsing, is the
UNIX/shell equivalent of SQL parameterization.
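
A small Python sketch of the difference, assuming nothing beyond the standard subprocess module. The child process here is just the Python interpreter echoing its first argument, chosen so the example runs anywhere; `run_without_shell` is a made-up helper name. (Perl's list form of system, with more than one argument, likewise bypasses the shell.)

```python
import subprocess
import sys


def run_without_shell(argv):
    # argv is a list: each element reaches the child as exactly one
    # argument. No shell parses it, so ; | > ` and quotes are just bytes.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


hostile = "x; echo INJECTED > /tmp/owned"
child_program = "import sys; print(sys.argv[1])"

echoed = run_without_shell([sys.executable, "-c", child_program, hostile])
print(repr(echoed))
```

The hostile string comes back unchanged as a single argument. Pipelines are the awkward part: without a shell you have to connect the processes yourself, which is exactly the extra work that tempts people back to system("...").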

And, yes, supposedly what I am writing is internal only, never supposed
to be exposed to anyone untrusted.

But these are the attitudes that get us in trouble. When it is more
work to be secure than it is to be insecure. When we are willing to
write insecure code because it is easier, and because we fool ourselves
into thinking we will never be exposed.

Sure, use SQL parameters.

But more and more I reject any "solution" that says

a) Every programmer should use feature F in language L.

b) Every programmer should be careful.

I look for techniques that apply across languages, and across programmers
of varying levels of skill.


===


By the way: a few years ago one of my favorite wikis, Twiki, was
compromised because it had a search facility that basically did

system("find wiki -type f | grep $user_provided_regexp")

The fix? Seems to be rewriting grep in Perl.

I object to security fixes that get in the way of code reuse.

Another possible fix, the moral equivalent of SQL ? parameterization:

$file->write($user_provided_regexp);
system("find wiki -type f"
. "| grep -regexp_file $file->{name}");

Basically, to provide the moral equivalent of SQL ?,
you need to provide -file arguments to all scripts that you want to
interpolate user data into. Lots of code to change.
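
For grep specifically, the file-argument route already exists: POSIX grep reads patterns from a file with -f. A hedged Python sketch follows; `grep_with_pattern_file` is a made-up name, and it assumes a grep binary is on the PATH.

```python
import os
import shutil
import subprocess
import tempfile


def grep_with_pattern_file(pattern, paths):
    """Return the names of the files in `paths` that match `pattern`."""
    if not paths:
        return []
    # The pattern travels in a file, never on a command line.
    with tempfile.NamedTemporaryFile("w", suffix=".pat",
                                     delete=False) as handle:
        handle.write(pattern + "\n")
        pattern_file = handle.name
    try:
        result = subprocess.run(
            ["grep", "-l", "-f", pattern_file, "--", *paths],
            capture_output=True, text=True, stdin=subprocess.DEVNULL)
    finally:
        os.unlink(pattern_file)
    if result.returncode > 1:      # 0 = match, 1 = no match, >1 = error
        raise RuntimeError(result.stderr)
    return result.stdout.splitlines()


if shutil.which("grep"):
    with tempfile.TemporaryDirectory() as wiki:
        page = os.path.join(wiki, "page.txt")
        with open(page, "w") as out:
            out.write("a needle in a haystack\n")
        print(grep_with_pattern_file("need.e", [page]))
        print(grep_with_pattern_file("x; rm -rf ~", [page]))
```

One caveat: grep treats each line of the pattern file as a separate pattern, so a user string containing a newline, or an empty string, changes what is matched even though it cannot inject a command.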

I am using a tool a friend of mine wrote that *could* have such a -file
argument,
but which he says "just do `cat config-file`"

As in
foobar `cat config-file-containing-command-line-args`

How do you make that safe?

Oh, yeah, use Perl, do it yourself.





Q: what would a "securely composable shell scripting language" look like?

Quadibloc

Dec 9, 2011, 1:55:59 PM
On Dec 8, 12:09 pm, Robert Wessel <robertwess...@yahoo.com> wrote:

> Quotification is almost always the wrong way to deal with the problem.
> It is, far too commonly, a hack done to address the issue. It's far
> too easy to leave holes as you describe, or generate false positives
> (a site I know had a filter that looked for things *looking* like
> fragments of SQL - which caused occasional ordinary English text to
> get filtered if it met the right patterns).
>
> Fixing the code to use proper parameters, which are not subject to
> quoting weirdness from maliciously constructed input strings, rather
> than trying to filter the input for malicious SQL, is the correct
> solution.

> There are other solutions as well, but proper parameterization is
> generally the best solution.
>
> For buffer overflows, the correct solution is to fix the program or
> language to not have buffer overflows.

You are absolutely right.

The trouble, of course, is that altogether too many languages and
systems don't have the features that are needed, and so the workaround
of passing a parameter by generating code with that parameter embedded
- which creates the need to "sanitize" input with the unpredictability
involved - is unavoidable.

Old-style mainframe systems, of course, *were* designed so that many
of the security problems we have now simply did not arise. We should
learn from them.

John Savard

Tim McCaffrey

Dec 9, 2011, 3:19:12 PM
In article <4EE23C02...@SPAM.comp-arch.net>, an...@SPAM.comp-arch.net
says...

>
>Q: what would a "securely composable shell scripting language" look like?
>

Instead of using quotes, allow parameters to be passed with explicit lengths:

Select where name=<length of name>:<name>

(SQL is not something I use, so just take that as a generic example)

example:

I input "find me" for name (without the quotes).

The script determines the length of "find me" (7) and passes:

Select where name=7:find me

No new character sets, no way to confuse the parser, etc.

(Pascal really did get it right the first time)
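
A sketch of the length-prefix idea in Python. The helper names `encode_field` and `decode_field` are invented; the one design choice added here is that the length counts UTF-8 bytes rather than characters, so the reader can slice without decoding first.

```python
def encode_field(value):
    # <length>:<bytes>, with the length counted in UTF-8 bytes.
    data = value.encode("utf-8")
    return str(len(data)).encode("ascii") + b":" + data


def decode_field(buffer, start=0):
    """Read one length-prefixed field; return (value, next_offset)."""
    colon = buffer.index(b":", start)
    digits = buffer[start:colon]
    if not digits.isdigit():
        raise ValueError("bad length prefix")
    begin = colon + 1
    end = begin + int(digits)
    if end > len(buffer):
        raise ValueError("field is truncated")
    # The payload is sliced by length and never scanned for delimiters.
    return buffer[begin:end].decode("utf-8"), end


prefix = b"Select where name="
command = prefix + encode_field("find me")
print(command)

hostile = "x' OR 'x'='x; 3:abc"
packed = prefix + encode_field(hostile)
value, end = decode_field(packed, len(prefix))
print(value == hostile, end == len(packed))
```

Whatever the payload contains, including text that looks like another length prefix, the reader takes exactly the stated number of bytes and moves on.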

- Tim

Terje Mathisen

Dec 9, 2011, 4:20:14 PM
Andy "Krazy" Glew wrote:
>
>>>
>>> The general solution to this is "quotification": take the user input,
>>
>> And here is where you go wrong:
>>
>> The general solution is to totally separate parsing from user input,
>> i.e. in your example above you would first parse the SELECT statement,
>> using question marks as placeholders for where you expect input.
>>
>> Later on you execute that prepared (i.e. parsed) statement, substituting
>> the actual user input for the placeholders:
>>
>> I.e. in perl this looks like this:
>>
>> # Let the DB parser see only static strings like this:
>> my $sth =
>> $dbh->prepare("SELECT FIELDLIST FROM TABLE WHERE NAME = ?");
>>
>> # Get the possibly poisonous user input
>> my $user_input = param('name');
>> $sth->execute($user_input);
>
>
> Sure.
>
> But let me point out
>
> a) SQL parameterization is just a form of quotification. It is a form of
> quotification that has very complicated quote marks, with a level of
> indirection.

I disagree:

The key isn't quotification, rather the opposite: By separating out all
parsing, doing it up front, before ever reading in any user input, you
don't need any quoting at all:

The SQL will simply fail to find any matching rows if you try to
inject any funky code.

I.e. allowing execution of user-supplied data is morally equivalent to
the macro facility in Microsoft Office, something which has been
responsible for a large fraction of all virus/trojan/zero-day attacks.
>
> b) SQL parameterization is only available in SQL. The example you give,
> Terje, is not Perl - it is Perl being used to prepare an SQL statement.

I agree.
>
> How does this help me if I am using some other language, that does not
> have that facility?
>
> SQL injection is not the only form of bad quotification attack.
>
> Yesterday I was writing some scripts - Perl driving Expect driving csh
> driving ... heck, I don't know half of what it was driving, since
> hardware CAD systems are scripts wrapped around scripts wrapped around...
>
> I did something as simple as, in Perl
>
> system("grep $pattern $filename | $user_provided_filter_command")

Wow!
>
> Many opportunities for shell script injection here.
>
> Yes, I am not even sure how the tainting approach would work here. I
> think that $user_provided_filter_command might be restricted to standard
> commands available on the target system, default path, not able to
> invoke arbitrary user code. And constrained not to have shell syntax
> like < or > or ... inside it. (Might want to have capabilities
> attached to string inlining.)
>
> Yes, perhaps I should be using exec and not system around strings. I
> suppose that exec, which is not subject to shell parsing, is the
> UNIX/shell equivalent of SQL parameterization.

Right.
>
> And, yes, supposedly what I am writing is internal only, never supposed
> to be exposed to anyone untrusted.
>
> But these are the attitudes that get us in trouble. When it is more work
> to be secure than it is to be insecure. When we are willing to write
> insecure code because it is easier, and because we fool ourselves into
> thinking we will never be exposed.
>
> Sure, use SQL parameters.
>
> But more and more I reject any "solution" that says
>
> a) Every programmer should use feature F in language L.
>
> b) Every programmer should be careful.
>
> I look for techniques that apply across languages, and across programmers
> of varying levels of skill.

Very worthwhile goal!

I really don't think changing the character set used for programming has
a snowball in hell's chance of surviving though. :-)

Robert Wessel

Dec 10, 2011, 1:27:00 AM
On Fri, 9 Dec 2011 10:55:59 -0800 (PST), Quadibloc <jsa...@ecn.ab.ca>
wrote:

>On Dec 8, 12:09 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
>
>> Quotification is almost always the wrong way to deal with the problem.
>> It is, far too commonly, a hack done to address the issue. It's far
>> too easy to leave holes as you describe, or generate false positives
>> (a site I know had a filter that looked for things *looking* like
>> fragments of SQL - which caused occasional ordinary English text to
>> get filtered if it met the right patterns).
>>
>> Fixing the code to use proper parameters, which are not subject to
>> quoting weirdness from maliciously constructed input strings, rather
>> than trying to filter the input for malicious SQL, is the correct
>> solution.
>
>> There are other solutions as well, but proper parameterization is
>> generally the best solution.
>>
>> For buffer overflows, the correct solution is to fix the program or
>> language to not have buffer overflows.
>
>You are absolutely right.
>
>The trouble, of course, is that altogether too many languages and
>systems don't have the features that are needed, and so the workaround
>of passing a parameter by generating code with that parameter embedded
>- which creates the need to "sanitize" input with the unpredictability
>involved - is unavoidable.


Most unfortunately true. And many vastly useful bits and pieces don't
offer ways to easily use them safely in the presence of potentially
malicious input. Andy's example of trying to run a command line grep
to do a search from user input is an excellent illustration.


>Old-style mainframe systems, of course, *were* designed so that many
>of the security problems we have now simply did not arise. We should
>learn from them.


To an extent, some of that is luck. Many mainframe applications never
had to deal with unbounded inputs (fixed length records in and out
being the rule), thus you would naturally never do anything like using
gets() to overflow a buffer. And some languages, like Cobol, do make
it a bit harder, but at considerable cost - until more recent times,
there was no dynamic allocation at all in Cobol.

That and the rather lesser attack surface - far fewer malicious users
had access - helped as well.

But if you write a program using dynamic SQL in Cobol, without proper
parameterization, it's just as easy to leave a possible SQL injection
vulnerability as it is on smaller systems.

Nor are buffer overflows impossible, just usually a bit harder, at
least when using the traditional languages. Production Cobol programs
usually don't have subscript range checking turned on, and overflowing
a table (array) is eminently possible. OTOH, Cobol (especially older
Cobol) typically doesn't use a call stack like C does, so overflowing
a table is unlikely to clobber a return address.

In other areas certain mainframe environments have had worse
exposures. For example, CICS, the transaction monitor, runs all of
the transactions in a single address space. In the early days, one
rogue transaction could overwrite any storage in the address space,
including that of any other transaction. As you might expect that was
often a fun debug, often manifesting an error very much later than the
actual bug - with luck, you get a "storage violation" (basically a
heap error), rather than just a random error (or silent failure) in
another transaction. More recent versions of CICS provide
considerably better isolation between transactions.

While mainframe applications as a group are undoubtedly more resistant
to that sort of error than applications on more common platforms, only
a part of that is inherent to the platform, other factors contribute
as well, not least lesser access, a higher institutional regard for
testing and Q/A in general, and a rather slower rate of change.

But yes, there are lessons to be learned, about both what to do and
what not to do.