This is a proposal to add basic Perl/Ruby-like tainting support to
PHP: an option that is turned off by default, and that programmers
may turn on at runtime to alert them when they make the common
mistake of using uncleansed input with include, echo, system, open,
etc. This would work with unmodified third-party extensions.
Taint support is first of all an education tool. When an alert is
raised that data needs cleansing, the programmer needs to make a
conscious decision. It's their job to choose the right method. I'll
discuss below why I think PHP shouldn't make the decision for them.
Taint support is not a sandbox; a malicious PHP script can still
open a pipe to a shell process and feed uncleansed commands to it.
Taint support can be an ingredient to build a sandbox, but that
involves lots more. See for example the Ruby reference at the end.
Of course when overhead is low enough, people might want to turn
on taint checks in production, to implement a multi-layer defense.
Wise people know that no single layer provides perfect protection.
People already do this with other scripting languages.
Last month I did a proof of concept implementation to raise an alert
when raw network data is passed to a few PHP primitives (echo/print,
system/exec, eval/include, open stream, and manipulate file/directory).
If this proposal survives review, then I could spend a chunk of
time in 2007 working out more details and doing a full production
quality implementation.
Before I discuss my taint proposal details I first want to explain
why I see problems with other approaches.
Why not have PHP automatically detect and cleanse malicious data?
=================================================================
In preparation for this I studied several years' worth of literature
about making PHP etc. applications "secure" (see references at the
end). I found that many researchers have worked on systems that
try to secure PHP etc. applications without changing PHP script
source code.
These systems explicitly permit the use of uncleansed data in
html/shell/sql commands. When an html/shell/sql request contains
"forbidden" characters in certain positions, these systems either
"neuter" the request before execution, or they abort the request.
In the normal case where requests contain benign inputs only, no-one
will ever notice that these protections exist, except perhaps by
their lack of responsiveness.
I see many problems with systems that do automatic data cleansing.
The main problems are not technical but psychological:
- Education: automatic cleansing systems don't make programmers
aware that network data is inherently untrustworthy. Instead,
they teach the exact opposite: don't worry about data hygiene.
This of course means they will get bitten elsewhere anyway.
- Expectation: automatic cleansing systems have to be perfect. If
the safety net catches some but not all cross-site scripting or
SQL injection attacks, then the system has a security hole and
people lose confidence. This gives security a bad reputation.
These two problems are inherent to all solutions that automatically
fix security problems for the programmer. They encourage programmers
to write sloppy code. I want to help them write better code instead.
But wait, there is more :-) There are technical problems, too:
- Overhead: as strings are sliced, diced, and tossed around, the
automatic cleansing safety net has to keep track of exactly which
characters in a substring are derived from untrusted input, and
which characters are not, so that the safety net can later recognize
malicious content in the middle of html/shell/sql/etc. commands.
- More overhead: special-purpose code is needed in all functions
and all primitives that execute html/shell/sql/etc. commands.
This code is needed because each context has a different definition
of what is "malicious" content in the middle of a request.
- Collateral damage: providers of PHP extensions need to implement
their own special-purpose code to detect or neuter untrusted
substrings in inputs, or to mark untrusted substrings in return
values, otherwise the safety net is incomplete, and the PHP
extension introduces a security hole.
Compared to this, the run-time overhead of maintaining and testing
taint bits in PHP is minuscule, if my experiences with the prototype
are meaningful.
A first look at PHP taint support
=================================
The general idea is to mark certain external inputs as tainted (ex:
network, file), and to disallow the use of tainted data with certain
operations that change PHP's own state (ex: include, eval), or that
access or modify external state (ex: create/open/remove file; connect
to server; generate HTML; execute shell command; execute database
command).
The exact definitions of what inputs are tainted and what operations
are sensitive will need to be made more precise later (lots of
opportunity for splitting fine hairs). For now I want to focus on
the general mechanism.
The following is a high-level view of what would happen when taint
checking is turned on at run-time:
- Each ZVAL is marked tainted or not tainted (i.e. we don't taint
individual characters within substrings). Black and white is all.
In some future, someone may want to explore the possibility of
more than two shades. But not now.
- Primitives and functions such as echo, eval, or mysql_query are
not allowed to receive tainted input. When this happens the script
terminates with a run-time error. It is a bad idea for software
to continue after a security violation.
- PHP propagates taintedness across expressions. If an input to
an expression is tainted, then the result of that expression is
tainted too. There are exceptions to this rule: these are called
sanitizers, as discussed next.
- The PHP application programmer untaints data by explicit assignment
with an untainted value. For example, the result from htmlentities()
or mysql_real_escape_string() is not tainted. People could apply
the wrong sanitizer if they really want to. Remember, the purpose
is to help programmers by telling what data needs cleansing. It
is up to them to make the right decision. If we wanted to force
the use of the "right" sanitizer then we would need multiple
shades of untaintedness. This would not be practical.
That's the high-level view. At a lower level we make trade-offs
between usability, implementation cost, maintenance cost, run-time
cost, and more. In particular, my goal is to support taint with
third-party extensions, but without changes to the source code of
those extensions.
Thus, compromises need to be found, and some perfection needs to
be sacrificed. I'm aiming for "useful" instead of "impossible".
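The high-level rules above can be sketched in a few lines of C. This
is an illustration under stated assumptions, not the prototype's code:
the tval struct and the tv_* helpers are hypothetical stand-ins for a
ZVAL and for engine primitives, the escaping is a toy that handles
only '<', and tv_echo returns an error code where the proposal would
terminate the script.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a ZVAL: a short string plus one taint bit. */
struct tval {
    char     buf[128];
    unsigned tainted : 1;
};

/* Build a value; external input would be created with tainted = 1. */
struct tval tv_make(const char *s, unsigned tainted)
{
    struct tval v;
    snprintf(v.buf, sizeof v.buf, "%s", s);
    v.tainted = tainted ? 1u : 0u;
    return v;
}

/* Propagation: the result of an expression is tainted if any input is. */
struct tval tv_concat(struct tval a, struct tval b)
{
    struct tval r = a;                          /* start with a's text */
    size_t room = sizeof r.buf - 1 - strlen(r.buf);
    strncat(r.buf, b.buf, room);                /* append, truncating if needed */
    r.tainted = a.tainted | b.tainted;
    return r;
}

/* Sanitizer: always returns an untainted result (toy escaping of '<'). */
struct tval tv_escape_html(struct tval in)
{
    struct tval r = tv_make("", 0);
    size_t n = 0;
    for (const char *p = in.buf; *p != '\0' && n + 5 < sizeof r.buf; p++) {
        if (*p == '<') { memcpy(r.buf + n, "&lt;", 4); n += 4; }
        else           { r.buf[n++] = *p; }
    }
    r.buf[n] = '\0';
    return r;
}

/* Sensitive primitive: refuses tainted input.
 * Returns 0 on success, -1 on a taint violation. */
int tv_echo(struct tval v)
{
    if (v.tainted) {
        fprintf(stderr, "taint violation: echo received tainted data\n");
        return -1;
    }
    fputs(v.buf, stdout);
    return 0;
}
```

With these helpers, concatenating a clean string with tainted input
yields a tainted value that tv_echo rejects, while the same input
passed through tv_escape_html first is accepted.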
Categorizing the functions and primitives
=========================================
Before we get into the nitty-gritty of tainting and untainting we
first need to categorize functions and primitives according to their
inputs and results.
- Some functions or primitives are not allowed to receive tainted
input; the script is terminated instead. These functions or
primitives are called sensitive.
- Some functions or primitives always return an untainted result.
These functions or primitives are called sanitizers.
- Some functions or primitives return a tainted result only when
their input is tainted. For example, if $x is tainted, then the
return value of substr($x, whatever) is tainted too. I'm still
looking for a good name for this function or primitive category
("permeable"? The term should appeal to non-English speakers,
including myself).
- Some functions or primitives always return tainted results. For
example, results from mysql_query must be sanitized before use
in a sensitive context such as echo or print, or in an SQL query(!);
this prevents "stored cross-site scripting" and SQL injection
problems. I'm still looking for a good name for this function or
primitive category (tainter?).
mysql_query is an example of a function that is at the same time
sensitive and a tainter: it can receive only untainted input, and
its results are always tainted.
If we want to support taint checking with third-party PHP extensions,
but without changes to their source code, then all we need at this
point is a table that says which functions are sensitive and/or
tainters, and so on. If no information is available about a specific
function we could assume the worst: when taint checking is turned
on at runtime, an uncategorized function would not be allowed to
receive tainted inputs, and its results would always be tainted.
All this categorization of functions makes no difference to how
applications run, until someone actually turns on taint checking
at run-time. It is therefore completely backwards compatible, even
when the function category information is incorrect or incomplete.
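A category table of this kind could look roughly like the C sketch
below. The flag names, the table layout and the fn_* helpers are
assumptions made for illustration; only the example classifications
(echo, eval, mysql_query, htmlentities, substr) and the worst-case
default for unknown functions come from the text above.

```c
#include <string.h>

/* Category flags; a function may carry more than one (hypothetical names). */
enum {
    FN_SENSITIVE = 1 << 0,  /* must not receive tainted input */
    FN_SANITIZER = 1 << 1,  /* result is always untainted     */
    FN_TAINTER   = 1 << 2   /* result is always tainted       */
};

struct fn_category {
    const char *name;
    unsigned    flags;      /* 0: result is tainted only if an input is */
};

/* Illustrative entries only; the real classification is still open. */
static const struct fn_category fn_table[] = {
    { "echo",         FN_SENSITIVE },
    { "eval",         FN_SENSITIVE },
    { "mysql_query",  FN_SENSITIVE | FN_TAINTER },
    { "htmlentities", FN_SANITIZER },
    { "substr",       0 },
};

/* Look up a function's category; uncategorized functions get the worst case. */
unsigned fn_flags(const char *name)
{
    for (size_t i = 0; i < sizeof fn_table / sizeof fn_table[0]; i++)
        if (strcmp(fn_table[i].name, name) == 0)
            return fn_table[i].flags;
    return FN_SENSITIVE | FN_TAINTER;
}

/* May the call proceed, given whether any argument is tainted? */
int fn_call_allowed(const char *name, int any_arg_tainted)
{
    return !(any_arg_tainted && (fn_flags(name) & FN_SENSITIVE));
}

/* Taint status of the return value. */
int fn_result_tainted(const char *name, int any_arg_tainted)
{
    unsigned f = fn_flags(name);
    if (f & FN_SANITIZER) return 0;
    if (f & FN_TAINTER)   return 1;
    return any_arg_tainted;
}
```

Because the extension itself is never consulted, a wrong or missing
entry changes only what the taint checker reports, not how the
function behaves when taint checking is off.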
Setting taint
=============
Obviously, data from the network and from databases needs to be
marked as tainted. Less obvious is whether we want to go as far as Perl,
and also taint the current directory and a bunch of other things.
More study is needed. Although these decisions will greatly impact
usability, they can be made later, because they don't really affect
the overall design or implementation of the system.
Propagating taint
=================
At this level things start to become interesting. Within the PHP
core we have a large amount of control over how taint is handled,
so we can spend a lot of time splitting fine hairs. With third-party
extensions we are very much limited by the requirement that taint
checking must not require source code changes to those extensions.
- Within the PHP core, taint propagation can be fine-grained. For
example, we could decide that substr(string, start, length) returns
a tainted result only if the string argument is tainted; a tainted
start or length argument would not affect the taintedness of the
result. For comparison, Perl even taints the result when the start
argument is tainted; with Ruby, numbers can't be tainted, so the
issue of tainted start or length values does not even arise.
This is just one example of splitting fine hairs.
- With PHP extensions, taint detection and propagation can happen
only at the function interface (the functions themselves aren't
taint-aware, unless their implementor went through the effort).
If an extension function is allowed to receive tainted input,
then its return value is also tainted unless the function is
classified as a sanitizer. With extensions that receive complex
arguments such as arrays, there is an increased cost as the PHP
core inspects each sub-element for taintedness before making the
actual function call. A similar cost exists with extensions that
return complex tainted results, as the PHP core needs to mark
each sub-element as tainted. These issues may disappear over time
as implementors adopt taint support into their extensions.
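The cost for complex arguments and results comes from having to walk
every sub-element. A minimal C sketch of the two walks follows; the
tnode struct is a hypothetical stand-in for a ZVAL that can hold an
array, not the engine's actual data structure.

```c
#include <stddef.h>

/* Hypothetical stand-in for a ZVAL that is either a scalar or an array. */
struct tnode {
    unsigned      tainted : 1;
    size_t        nchildren;   /* 0 for scalars */
    struct tnode *children;    /* array elements, if any */
};

/* Before the call: is the argument, or anything nested in it, tainted? */
int tnode_contains_taint(const struct tnode *v)
{
    if (v->tainted)
        return 1;
    for (size_t i = 0; i < v->nchildren; i++)
        if (tnode_contains_taint(&v->children[i]))
            return 1;
    return 0;
}

/* After the call: mark a complex result and every sub-element tainted. */
void tnode_mark_tainted(struct tnode *v)
{
    v->tainted = 1;
    for (size_t i = 0; i < v->nchildren; i++)
        tnode_mark_tainted(&v->children[i]);
}
```

Both walks visit each sub-element at most once, so the extra work
grows with the size of the argument or result rather than with the
work the extension function itself performs.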
Removing taint
==============
This last section gives just a few examples; there are a lot of
detail issues that will need to be considered before we have a
complete system.
- Removing taintedness requires an explicit assignment. For example,
assigning the result from a sanitizer produces an untainted result.
- Some implementations (Perl) propagate taint across, for example,
string<->number conversions; other implementations consider the
result not tainted (Ruby). More fine hair splitting.
- Testing a variable does not untaint it. For example, if $x is
tainted, the following does not change its taint status:
if (some test involving $x) { $x is still tainted here }
The reason for this is entirely practical: we can't reliably
determine if the intention of the test is to sanitize input.
- Testing a tainted variable does not taint the result of expressions
that follow the test. For example, if $y is tainted, the following
code does not taint $x:
if ($y == 0) { $x = 0; } else { $x = 1; }
There is one well-known example where such a strategy breaks down,
and that is the degenerate case where a long chain of if/else if/...
statements is used instead of a lookup table:
if ($y == 0) { $x = 0; } else if ($y == 1) { $x = 1; } else ...
Remember that my purpose is to help the programmer; my purpose
is not fighting programmers who insist on writing bad code. After
all, they can always open a pipe to a shell process and execute
uncleansed commands there.
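The difference between the two kinds of data flow can be shown with a
small C sketch. The tbyte struct and both function names are
hypothetical, chosen only to make the if/else-chain case concrete.

```c
/* Hypothetical stand-in for a one-byte ZVAL with a taint bit. */
struct tbyte {
    unsigned char value;
    unsigned      tainted : 1;
};

/* Explicit flow: the result is computed from y, so taint propagates. */
struct tbyte copy_by_assignment(struct tbyte y)
{
    struct tbyte x = { y.value, y.tainted };
    return x;
}

/* Implicit flow: y is only tested, and x is only ever assigned
 * untainted values. Under the proposed rules the tests neither untaint
 * y nor taint x, yet x ends up equal to y. This is the if/else chain. */
struct tbyte copy_by_comparison(struct tbyte y)
{
    struct tbyte x = { 0, 0 };
    for (int c = 0; c < 256; c++) {
        if (y.value == c)
            x.value = (unsigned char)c;
    }
    return x;
}
```

Calling copy_by_comparison on a tainted byte returns the same byte
value with the taint bit clear, which is exactly the deliberate
laundering that the proposal chooses not to fight.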
This is just the beginning of a long list of things that need to
be looked into, but I will stop here for now.
Conclusion
==========
This is a proposal to add run-time taint checking to PHP. It is not
a sandbox for the execution of hostile code. It is just a tool to
help programmers find places where they need to sanitize data. It
avoids changes to third-party extensions, and is turned off by
default. It is therefore completely backwards compatible.
Last month I did a proof of concept implementation that protects a
few PHP primitives. If this proposal survives review, then I could
spend a chunk of time in 2007 working out the details and doing a full
production quality implementation.
References
==========
The Perl taint feature has been an example for many other efforts.
http://perldoc.perl.org/perlsec.html
Ruby implements multiple levels of taint checking. The lowest level
is like Perl. The highest level is a sandbox where code can neither
create nor modify untainted objects, read/write files or sockets,
etc. At this level, my claim that "the programmer can always open
a pipe to a shell and execute uncleansed commands there" is no
longer true. http://www.rubycentral.com/book/taint.html
Wei Xu, Sandeep Bhatkar, R. Sekar: "Taint-Enhanced Policy Enforcement:
A Practical Approach to Defeat a Wide Range of Attacks". 15th USENIX
Security Symposium Vancouver, BC, Canada, August 2006.
http://seclab.cs.sunysb.edu/sandeep/pubs/papers/taint_usenixsec06.pdf
Source-to-source transformation to instrument source code with
taint tracking. They use a rule-based policy to disallow tainted
metacharacters in shell/sql commands, format strings, html tags etc.
Modest overhead.
Alex Ho, Michael Fetterman, Christopher Clark, Andrew Warfield, and
Steven Hand: "Practical Taint-Based Protection using Demand Emulation".
EuroSys 2006, Leuven, Belgium, April 2006.
http://www.cl.cam.ac.uk/~akw27/papers/taint-eurosys06.pdf
http://www.cs.kuleuven.ac.be/conference/EuroSys2006/papers/p29-ho.pdf
This work uses Xen virtualisation for code that handles untainted
data, and switches to Qemu emulation for code that touches
tainted data. Unlike anyone else they also taint individual
disk blocks, as the process writes tainted data to file.
J. Newsome and D. Song: "Dynamic Taint Analysis: Automatic Detection,
Analysis, and Signature Generation of Exploit Attacks on Commodity
Software". Network and Distributed Systems Security Symposium,
February 2005. http://www.ece.cmu.edu/~dawnsong/papers/taintcheck.pdf
Binary-to-binary translation with valgrind. Significant overhead.
A. Nguyen-Tuong, S. Guarnieri, D. Greene, J. Shirley, and D. Evans.
"Automatically hardening web applications using precise tainting."
20th IFIP International Information Security Conference, 2005.
http://www.cs.virginia.edu/evans/pubs/infosec05.pdf
Precise taint tracking using a modified PHP engine. Tainted
data is forbidden (ex: SQL queries, system functions) or sanitized
(ex: HTML output). Data from SQL database is considered tainted.
Careful parsing of SQL, HTML to prevent command injection and
cross-site scripting via tainted operators or tags. Low overhead.
Tadeusz Pietraszek, Chris Vanden Berghe: "Defending Against Injection
Attacks Through Context-Sensitive String Evaluation". Recent
Advances in Intrusion Detection (RAID), 2005.
http://chris.vandenberghe.org/publications/csse_raid2005.pdf
Precise taint tracking using a modified PHP engine. Taint-awareness
requires modifications to built-ins and to extensions. Careful
parsing of SQL etc. to prevent command injection. Modest overhead.
Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, D. T. Lee,
Sy-Yen Kuo: "Securing Web Application Code by Static Analysis and
Runtime Protection". Proceedings of the 13th international conference
on World Wide Web (May 2004).
http://www2004.org/proceedings/docs/1p40.pdf
Hybrid system: a static source analyzer identifies code that
may be vulnerable, then inserts sanitizer code that appears to
be missing. Very low overhead.
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
> This is a proposal to add basic Perl/Ruby like tainting support to
> PHP: an option that is turned off by default, and that programmers
> may turn on at runtime to alert them when they make the common
> mistake of using uncleansed input with include, echo, system, open,
> etc. This would work with unmodified third-party extensions.
I doubt it is plausible to make this work entirely without touching
external extensions, since those extensions may change the state of
data from tainted to untainted and vice versa.
> Taint support is not a sandbox; a malicious PHP script can still
> open a pipe to a shell process and feed uncleansed commands to it.
> Taint support can be an ingredient to build a sandbox, but that
> involves lots more. See for example the Ruby reference at the end.
Sounds awfully like yet another safe_mode: something that proclaims
security, yet is unable to provide it.
> Of course when overhead is low enough, people might want to turn
> on taint checks in production, to implement a multi-layer defense.
> Wise people know that no single layer provides perfect protection.
> People already do this with other scripting languages.
Unlikely to ever be the case; the overhead of taint modes is
generally quite significant.
> - Education: automatic cleansing systems don't make programmers
> aware that network data is inherently untrustworthy. Instead,
> they teach the exact opposite: don't worry about data hygiene.
> This of course means they will get bitten elsewhere anyway.
Most people program not to learn how, but to solve problems. Which is
why automatic filtering has been the holy grail of security as it
allows developers to avoid thinking about input validation beyond the
initial setup and move on with their lives.
> - Expectation: automatic cleansing systems have to be perfect. If
> the safety net catches some but not all cross-site scripting or
> SQL injection attacks, then the system has a security hole and
> people lose confidence. This gives security a bad reputation.
The same argument can be made about taint mode: judging by Perl and
Ruby, where there are tricks to bypass it, it applies there as well.
> - Overhead: as strings are sliced, diced, and tossed around, the
> automatic cleansing safety net has to keep track of exactly which
> characters in a substring are derived from untrusted input, and
> which characters are not, so that the safety net can later recognize
> malicious content in the middle of html/shell/sql/etc. commands.
If you look at filter, there is no tracking of malicious chars; the
data is simply cleansed of them or rejected altogether. This is a
one-time event.
> - More overhead: special-purpose code is needed in all functions
> and all primitives that execute html/shell/sql/etc. commands.
> This code is needed because each context has a different definition
> of what is "malicious" content in the middle of a request.
That's why you can use RAW mode and filter the data when necessary.
> Compared to this, the run-time overhead of maintaining and testing
> taint bits in PHP is minuscule, if my experiences with the prototype
> are meaningful.
I am highly skeptical regarding this claim.
> - Each ZVAL is marked tainted or not tainted (i.e. we don't taint
> individual characters within substrings). Black and white is all.
> In some future, someone may want to explore the possibility of
> more than two shades. But not now.
That means an additional element in a struct that has thousands of
instances in most scripts. This will be the first overhead, caused by
the increase in memory footprint.
> - Primitives and functions such as echo, eval, or mysql_query are
> not allowed to receive tainted input. When this happens the script
> terminates with a run-time error. It is a bad idea for software
> to continue after a security violation.
You would need to go through some 5,000+ functions that PHP offers
and determine which ones can and cannot receive tainted data,
something that virtually guarantees things will be missed, bringing us
back to the safe_mode/open_basedir problem.
> - PHP propagates taintedness across expressions. If an input to
> an expression is tainted, then the result of that expression is
> tainted too. There are exceptions to this rule: these are called
> sanitisers, as discussed next.
That goes counter to your original point that extensions do not need
to be taint aware, what you propose would require adjustment of
nearly every single extension. The additional tainted, not-tainted
checks will add further overhead.
> - The PHP application programmer untaints data by explicit assignment
> with an untainted value. For example, the result from htmlentities()
> or mysql_real_escape_string() is not tainted. People could apply
> the wrong sanitizer if they really want to. Remember, the purpose
> is to help programmers by telling what data needs cleansing. It
> is up to them to make the right decision. If we wanted to force
> the use of the "right" sanitizer then we would need multiple
> shades of untaintedness. This would not be practical.
Again, many functions have different behaviors etc. Let's take an
example: htmlspecialchars() is great against XSS but does nothing for
exec(). If you htmlspecialchars() a string and then pass it to exec(),
the engine thinks the data is not tainted and executes it, resulting in
command injection.
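To make the concern concrete, here is a minimal C sketch. It is an
illustration only, not code from the prototype or from PHP: the
struct and function names are hypothetical, and the second half shows
the "multiple shades" alternative that the proposal mentions and
rejects as impractical.

```c
/* Contexts in which a value has been made safe (hypothetical names). */
enum {
    SAFE_HTML  = 1 << 0,
    SAFE_SHELL = 1 << 1
};

/* One-bit model: any sanitizer clears the single taint bit. */
struct onebit { unsigned tainted : 1; };

struct onebit onebit_htmlspecialchars(struct onebit v)
{
    v.tainted = 0;
    return v;
}

int onebit_exec_allowed(struct onebit v)
{
    return !v.tainted;
}

/* Multi-shade model: a sanitizer vouches for one context only. */
struct shaded { unsigned safe_for; };   /* bitmask of SAFE_* flags */

struct shaded shaded_htmlspecialchars(struct shaded v)
{
    v.safe_for = SAFE_HTML;
    return v;
}

struct shaded shaded_escapeshellarg(struct shaded v)
{
    v.safe_for = SAFE_SHELL;
    return v;
}

int shaded_exec_allowed(struct shaded v)
{
    return (v.safe_for & SAFE_SHELL) != 0;
}
```

In the one-bit model the HTML-escaped value passes the exec check; in
the multi-shade model it does not, at the price of tracking a context
set per value and classifying every sanitizer by context.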
Overall, as it stands I do not believe that this is a good idea and
as is my vote would be -0.5 on its inclusion into PHP.
Ilia Alshanetsky
Repeating my comments on that, I think that it can be done unlike
safe_mode, if we take a different approach. Namely, not "mark unsafe,
accept otherwise" but "mark safe, deny otherwise". Meaning, first we
separate functions into 3 categories:
1. Cleaners - can take tainted data, produce untainted data
2. Regulars - can take tainted data, produce tainted data
3. Protected - can take only untainted data
Then we mark all "unknown" functions protected, then - the hard part, lot
of work alert - go over the code base and extensions and put "regular" flags
on some basic functions (like string manipulation) and "cleaner" flags on
some like filters. This can be done at the engine level - meaning, except
for some special cases, most functions would never need any code change but a
bit in the function table description. My opinion is that besides cleaners,
most functions actually would be fine never receiving tainted data
without prior cleaning.
Of course, it would break a lot of code, but it would be also a powerful
way to drive people to write more secure code. E.g. if your app passes
taint-check before deployment, it means most probably you didn't make
the top 80% of PHP security mistakes. Of course, it won't be bulletproof
- but no security enhancement is, and ensuring 100% secure code with
automatic tools is quite a dream anyway.
> Unlikely to ever be the case, the overhead of taint modes is generally
> quite significant.
This is an important consideration. However, if one uses it as a
pre-deployment test step and not in production, it might be less important
- especially if combined with some kind of unit-testing and coverage
solution.
This of course can be - and should be - combined with a runtime solution
like filtering. It is my impression that some view tainting as something
opposed to filtering, while the correct approach would be that tainting
complements filtering, ensuring that no data route is left uncovered.
> You would need to go through some 5,000+ functions that PHP offers and
> determine which one can and cannot receive tainted data, something that
> virtually guarantees things will be missed, bring us back to the
> safe_mode/open_basedir problem.
That's why - see above - I would recommend the reverse approach.
> Again, many functions have different behaviors etc... Let's take an
> example htmlspecialchars() is great against XSS but does nothing for
> exec(), so if you htmlspecialchars a string then pass it to exec, it
> thinks that the data is non-tainted and executes it resulting in command
> injection.
Yep, this is a problem, thus I'm not sure htmlspecialchars should
always be given the cleaner attribute. There are a number of ways to deal with it:
1. Let the user be smart - if he thought of using htmlspecialchars he
is probably aware of filtering, so we hope he can do it right. Most
problems are due to a lack of any filtering at all because nobody even
thought of doing it.
2. Let the user tell us the function is really cleaning - e.g. add
some parameter that would say "untaint" - supposing that if one bothered
to put in this parameter one probably gave some thought to it.
3. Let the user use specific cleaner functions - like filters - so that
we are reasonably sure he knows what he's doing.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
>> Sounds awfully like yet another safe_mode, something that
>> proclaims security, yet being unable to provide it.
>
> Repeating my comments on that, I think that it can be done not like
> safe_mode, if we take different approach. Namely, not "mark unsafe,
> accept otherwise" but "mark safe, deny otherwise".
Ok this is better, but it will break every single application out
there. I for one think that this is unacceptable.
> Meaning, first we separate functions into 3 categories:
> 1. Cleaners - can take tainted data, produce untainted data
> 2. Regulars - can take tainted data, produce tainted data
> 3. Protected - can take only untainted data
> Then we mark all "unknown" function protected, then - the hard
> part, lot of work alert - go over code base and extensions and put
> "regular" flags on some basic functions (like string manipulation)
> and "cleaner" flag on some like filters. This can be done on engine
> level - meaning, except some special cases, most function would
> never need any code change but a bit in function table description.
> My opinion is that besides cleaners, most functions actually would
> be fine never receiving tainted data without prior cleaning.
>
> Of course, it would break a lot of code, but it would be also a
> powerful way to drive people to write more secure code. E.g. if
> your app passes taint-check before deployment, it means most
> probably you didn't make the top 80% of PHP security mistakes. Of
> course, it won't be bulletproof - but no security enhancement is,
> and ensuring 100% secure code with automatic tools is quite a dream
> anyway.
When it comes to security, things like "most probably" and "likely"
are usually synonymous with insecure. The biggest issue with
safe_mode (and I suspect tainting as well) is the faulty assumption
that if you enable it, you get instant security and you can quit
being paranoid. This leads to a false sense of security and eventually
to getting hacked.
>> You would need to go through some 5,000+ functions that PHP offers
>> and determine which one can and cannot receive tainted data,
>> something that virtually guarantees things will be missed, bring
>> us back to the safe_mode/open_basedir problem.
>
> That's why - see above - I would recommend reverse approach.
So you reverse the problem, you'd still need to examine every
function and determine which of its parameters can be tainted and
which cannot.
>
>> Again, many functions have different behaviors etc... Let's take
>> an example htmlspecialchars() is great against XSS but does
>> nothing for exec(), so if you htmlspecialchars a string then pass
>> it to exec, it thinks that the data is non-tainted and executes it
>> resulting in command injection.
>
> Yep, this is a problem, thus I'm not sure htmlspecialchars should
> be always given cleaner attribute. There are a number of ways to
> deal with it:
> 1. Let the user be smart - if he thought using htmlspecialchars he
> probably is aware of filtering, so we hope he can do it right. Most
> of problems are due to lack of any filtering at all because nobody
> even thought of doing it.
> 2. Let the user tell us the function is really cleaning - e.g.
> add some parameter that would say "untaint" - supposing that if one
> bothered to put this parameter one probably gave some thought to it.
> 3. Let the user use specific cleaner functions - like filters - so
> that we reasonably sure he knows what he's doing.
I think we should ensure that we can provide users with reliable
means to validate data via things like filter and database escaping
functions and let people decide what and where things should be used.
Doing automated validation is prone to error and we are almost
guaranteed to never get it right.
Ilia Alshanetsky
There is no need for that.
      zvalue_value value;     /* value */
      zend_uint refcount;
      zend_uchar type;        /* active type */
  !   zend_uchar is_ref;
    };

  --- 289,296 ----
      zvalue_value value;     /* value */
      zend_uint refcount;
      zend_uchar type;        /* active type */
  !   zend_uchar is_ref:7;
  !   zend_uchar taint_flag:1;
    };
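The effect of this change on memory footprint can be checked by
comparing struct sizes. The sketch below is a stand-in under stated
assumptions, not the real zend.h: the typedefs and the reduced
zvalue_value union are simplified for illustration, and the names
zval_before, zval_after and zval_size_growth are hypothetical.

```c
#include <stddef.h>

typedef unsigned int  zend_uint;
typedef unsigned char zend_uchar;

/* Simplified stand-in for zvalue_value; the real union has more members. */
typedef union {
    long   lval;
    double dval;
    struct { char *val; int len; } str;
} zvalue_value;

/* Layout before the patch. */
struct zval_before {
    zvalue_value value;
    zend_uint    refcount;
    zend_uchar   type;
    zend_uchar   is_ref;
};

/* Layout after the patch. Bit-fields of type unsigned char are
 * implementation-defined in standard C; GCC and Clang accept them. */
struct zval_after {
    zvalue_value value;
    zend_uint    refcount;
    zend_uchar   type;
    zend_uchar   is_ref : 7;
    zend_uchar   taint_flag : 1;
};

/* Size difference in bytes; 0 means the taint bit costs no extra memory. */
long zval_size_growth(void)
{
    return (long)sizeof(struct zval_after) - (long)sizeof(struct zval_before);
}
```

On compilers that pack both bit-fields into the byte formerly used by
is_ref, zval_size_growth() should return 0.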
Perhaps a working implementation will be convincing. I offer to
do the work, other people lose nothing except the possibility that
they will be proven right.
Wietse
Remember, taint checks are turned off by default. Nothing
breaks.
As for precision, we can have a fail-closed system with the default
"no function/primitive accepts tainted data" policy.
Over time we can "open up" functions/primitives, once the framework
is in place. After that, taint support can be extended incrementally.
Even if some taint check is too restrictive at some point in time,
the programmer can always overcome it with an explicit action.
Well, initially - yes. With some tweaking, like using automatic filters,
taint-safeguarding should not be too hard though (this would be one of
the milestones - if we can't make it easy, the whole idea is doomed).
And yes, it definitely makes it unusable as the default production mode -
but that's not the intended mode - at least not until people make
writing taint-safe code routine :)
> usually are synonymous with insecure. The biggest issue with safe_mode
> (and I suspect tainting as well) is the faulty assumption that if you
> enable it, you get instant security and you can quit being paranoid.
> Which leads to false sense of security and eventually getting hacked.
I think the issue with safe mode is the "mark unsafe" approach - which,
when applied to a very rich and diverse environment such as PHP, is bound
to fail to provide comprehensive security, thus leading to the problems
you describe.
> So you reverse the problem, you'd still need to examine every function
> and determine which of its parameters can be tainted and which cannot.
Not necessarily. If I have some obscure libWhateverFoo extension, I can
just say "OK, I don't know what this does, but please filter/untaint
data before feeding them to it" and for that I don't even need to touch
the extension itself. This would not be 100% since the user may just not
know that passing string "pleasekillme" to this extension would blow up
his data center, but at least we would make user consider the fact that
he passes external data to the extension and make a decision. :)
Also, if you talk about standard functions like strlen, demanding to
untaint all data before calling strlen would be rather harsh, so yes, we
would have to go over many functions. It is definitely not
half an hour's work, and it definitely would require some deep
consideration and discussion, but if we always prefer to err on the safe
side - and remember, we would not need any effort to be on the safe side
- I think we could make it work reasonably well.
> I think we should ensure that we can provide users with reliable means
> to validate data via things like filter and database escaping functions
I agree.
> and let people decide what and where things should be used. Doing
> automated validation is prone to error and we are almost guaranteed to
> never get it right.
No, I do not propose to do automatic validation. I only present a choice
of how we would ask the user to decide whether the data is validated
enough to be untainted - we can make it as explicit as requiring a
specific filter function call, or as implicit as treating many filtering
functions such as htmlspecialchars as cleaners. I tend to be somewhere in
between, but that's exactly the place where more feedback would be very
useful.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
> Ilia Alshanetsky:
>>> - Each ZVAL is marked tainted or not tainted (i.e. we don't taint
>>> individual characters within substrings). Black and white is all.
>>> In some future, someone may want to explore the possibility of
>>> more than two shades. But not now.
>>
>> That means an additional element to a struct that has thousands of
>> instances in most scripts, this will be the first overhead caused by
>> the memory footprint increase.
>
> There is no need for that.
>
> zvalue_value value; /* value */
> zend_uint refcount;
> zend_uchar type; /* active type */
> ! zend_uchar is_ref;
> };
>
>
> --- 289,296 ----
> zvalue_value value; /* value */
> zend_uint refcount;
> zend_uchar type; /* active type */
> ! zend_uchar is_ref:7;
> ! zend_uchar taint_flag:1;
> };
>
> Perhaps a working implementation will be convincing. I offer to
> do the work, other people lose nothing except the possibility that
> they will be proven right.
By all means :-)
I suppose by making taint_flag 1 bit, you are assuming taint is
purely black and white and that all untaint functions will secure
data against all forms of usage.
Ilia Alshanetsky
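The memory-footprint question raised above can be checked with a small sketch. The types below are simplified stand-ins, not the real Zend headers, and the helper names are hypothetical. Bit-fields on unsigned char are a widely supported compiler extension rather than strict ISO C, and struct sizes depend on the compiler and ABI, so treat the result as something to measure on your own platform, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for zvalue_value; not the real engine type. */
typedef union {
    long lval;
    double dval;
    void *ptr;
} sketch_value;

/* Layout before the change: is_ref occupies a whole byte. */
struct zval_before {
    sketch_value value;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref;
};

/* Layout after the change: the same byte is split into 7 + 1 bits. */
struct zval_after {
    sketch_value value;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref : 7;
    unsigned char taint_flag : 1;
};

/* Extra bytes per value that the taint bit costs on this platform. */
static size_t taint_bit_overhead(void)
{
    return sizeof(struct zval_after) - sizeof(struct zval_before);
}

/* Sets the taint bit without disturbing is_ref. */
static void mark_tainted(struct zval_after *zv)
{
    assert(zv != NULL);
    zv->taint_flag = 1;
}
```

With GCC or Clang on common x86 targets the overhead comes out as zero bytes, because the flag reuses a bit of the byte that is_ref already occupies.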
>> Ok this is better, but it will break every single application out
>> there. I for one think that this is unacceptable.
>
> Well, initially - yes. With some tweaking like using automatic
> filters taint-safeguarding should not be too hard though (this
> would be one of the milestones - if we can't make it easy, the
> whole idea is doomed). And yes, it definitely makes it unusable as
> default production mode - but that's not the intended mode - not at
> least until people would make writing taint-safe code the routine :)
All it means is extra work for developers with little or no tangible
benefits. I also wonder how taint will work with the standard remove/
add slashes wrapper most large apps implement nowadays that
effectively modifies every input variable going into the application.
>> usually are synonymous with insecure. The biggest issue with
>> safe_mode (and I suspect tainting as well) is the faulty
>> assumption that if you enable it, you get instant security and you
>> can quit being paranoid. Which leads to false sense of security
>> and eventually getting hacked.
>
> I think the issue with safe mode is the "mark unsafe" approach -
> which, when applied to very rich and diverse environment such as
> PHP is bound to fail to provide comprehensive security, thus
> leading to the problems you describe.
The job of a language is to provide tools, not arbitrary crippling
limitation under the guise of security improvement.
>> So you reverse the problem, you'd still need to examine every
>> function and determine which of its parameters can be tainted and
>> which cannot.
>
> Not necessarily. If I have some obscure libWhateverFoo extension, I
> can just say "OK, I don't know what this does, but please filter/
> untaint data before feeding them to it" and for that I don't even
> need to touch the extension itself. This would not be 100% since
> the user may just not know that passing string "pleasekillme" to
> this extension would blow up his data center, but at least we would
> make user consider the fact that he passes external data to the
> extension and make decision. :)
So that means I now need to make a pointless call even in instances where
untainted data would be perfectly safe, nice.
> Also, if you talk about standard functions like strlen, demanding
> to untaint all data before calling strlen would be rather harsh, so
> yes, we would have to go over many functions. It is definitely not
> a half-an-hour work, and it definitely would require some deep
> consideration and discussion, but if we always prefer to err on the
> safe side - and remember, we would not need any effort to be on the
> safe side - I think we could make it work reasonably well.
safe_mode sounded like a really reasonable idea too, I would've hoped
some lessons from past mistakes could be learned.
>> and let people decide what and where things should be used. Doing
>> automated validation is prone to error and we are almost
>> guaranteed to never get it right.
>
> No, I do not propose to do automatic validation. I only present a
> choice when we would ask the user to decide if the data is
> validated enough to be untainted - we can do it as explicit as
> requiring specific filter function call, or as implicit as making
> many filtering functions such as htmlspecialchars function as
> cleaners. I tend to be somewhere in between, but that's exactly the
> place where more feedback would be very useful.
If choice is equivalent to forcing the user to untaint all data,
then I suppose you're right.
Yes.
I think that a taint check would be a great help with php.
I have seen many php scripts from many people, I am always shocked at the
way in which values from forms are frequently trusted without checks.
Php is easy to write, that is good. Unfortunately this also means that
bad/simple/careless programmers can use php ... these are the ones who
cause many of the php script errors that cause problems.
--
Alain Williams
Parliament Hill Computers Ltd.
Linux Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256 http://www.phcomp.co.uk/
#include <std_disclaimer.h>
> Ilia Alshanetsky:
>>
>> On 15-Dec-06, at 4:16 PM, Stanislav Malyshev wrote:
>>
>>>> Sounds awfully like yet another safe_mode, something that
>>>> proclaims security, yet being unable to provide it.
>>>
>>> Repeating my comments on that, I think that it can be done not like
>>> safe_mode, if we take different approach. Namely, not "mark unsafe,
>>> accept otherwise" but "mark safe, deny otherwise".
>>
>> Ok this is better, but it will break every single application out
>> there. I for one think that this is unacceptable.
>
> Remember, taint checks are turned off by default. Nothing
> breaks.
In theory. You need to consider that many ISPs and users will
interpret taint mode == secure and enable it, causing much grief to
distributable application writers who need to accommodate every
environment.
Ilia Alshanetsky
If the default fail-close security policy is that no function receives
tainted input unless explicitly stated otherwise, then one bit
suffices. That is actually the easiest part.
We also need a default policy for function outputs. Some functions
read external data; that data needs to be escaped before it can be used
in echo/print/etc. So the fail-close security policy would be that all
result values are tainted unless explicitly stated otherwise.
There's more work than this, but it gives the general idea.
Wietse
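A minimal sketch of the two fail-close defaults described above, written in C as an engine-side table. The struct, the table and the function names are hypothetical and do not come from any existing patch; the two whitelist entries are only illustrations, since which functions deserve an entry is exactly what this thread is debating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-function policy. Zero means the restrictive choice in both fields. */
struct func_policy {
    const char *name;
    int accepts_tainted;  /* 1 = may receive tainted arguments */
    int returns_clean;    /* 1 = result is marked untainted */
};

/* Hypothetical whitelist: only functions someone has reviewed appear here. */
static const struct func_policy reviewed[] = {
    { "strlen",           1, 1 },
    { "htmlspecialchars", 1, 1 },
};

/* Looks up a function; names not in the table get the fail-close defaults. */
static struct func_policy policy_for(const char *name)
{
    struct func_policy deny = { name, 0, 0 };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(reviewed) / sizeof(reviewed[0]); i++) {
        if (strcmp(reviewed[i].name, name) == 0) {
            return reviewed[i];
        }
    }
    return deny;
}

/* Returns 1 if the call may proceed, 0 if an alert should be raised. */
static int call_allowed(const char *name, int arg_tainted)
{
    return !arg_tainted || policy_for(name).accepts_tainted;
}

/* Returns the taint bit to put on the function's result. */
static int result_tainted(const char *name)
{
    return !policy_for(name).returns_clean;
}
```

An unreviewed extension function such as the made-up "libwhateverfoo_run" is never touched: it simply is not in the table, so it rejects tainted arguments and its result comes back tainted.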
> Php is easy to write, that is good. Unfortunately this also means that
> bad/simple/careless programmers can use php ... these are the ones who
> cause many of the php script errors that cause problems.
That's exactly it, PHP is popular because it is easy to use. You take
that away and quite frankly there is very little reason left to
use PHP over C#, Python, etc...
Ilia Alshanetsky
Security is a benefit. Of course, the developers that are sure they write
secure code anyway need not be bothered by tainting and can leave it off
forever.
> The job of a language is to provide tools, not arbitrary crippling
> limitation under the guise of security improvement.
I agree. Tainting is one of such tools, aimed at improving security.
> safe_mode sounded like a really reasonable idea too, I would've hoped
> some lessons from past mistakes could be learned.
I do not see what exactly you propose to learn from safe mode mistakes -
that we should never try to improve PHP security by providing language
level tools? I do not see how this can be derived from whatever was
wrong with safe mode. It may be that tainting would not catch on, but I
do not think safe mode problems should prevent us from even trying.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
> Ilia Alshanetsky:
>> On 15-Dec-06, at 5:19 PM, Wietse Venema wrote:
>>> Ilia Alshanetsky:
>>>> That means an additional element to a struct that has thousands of
>>>> instances in most scripts, this will be the first overhead
>>>> caused by
>>>> the memory footprint increase.
>>>
>>> There is no need for that.
>>>
>>> < zend_uchar is_ref;
>>> ---
>>>> zend_uchar is_ref:7;
>>>> zend_uchar taint_flag:1;
>>>
>>> Perhaps a working implementation will be convincing. I offer to
>>> do the work, other people lose nothing except the possibility that
>>> they will be proven right.
>>
>> By all means :-)
>>
>> I suppose by making taint_flag 1 bit, you are assuming taint is
>> purely black and white and that all untaint functions will secure
>> data against all forms of usage.
>
> If the default fail-close security policy is no function receives
> tainted input unless explicitly stated otherwise, then one bit
> suffices. That is actually the easiest part.
And here is your first exploit, let's say we say
mysql_real_escape_string() takes tainted data and makes it untainted,
what happens when this "safe" data is passed to exec(). You are going
to need to deal with different levels of taint-untainted and 1 bit is
not going to give you that flexibility. You are going to need an int/
long, maybe even a long long.
Ilia Alshanetsky
>> All it means is extra work for developers with little or no
>> tangible benefits. I also wonder how taint will work with the
>> standard remove/add
>
> Security is benefit. Of course, the developers that are sure they
> write secure code anyway need not be bothered by tainting and can
> leave it off forever.
So you claim that without taint mode it is not possible to write safe
PHP code?
>
>> The job of a language is to provide tools, not arbitrary crippling
>> limitation under the guise of security improvement.
>
> I agree. Tainting is one of such tools, aimed at improving security.
Tainting is false security: it makes you feel secure when you
aren't. First, it's off in production, and that's where all the hacks
appear; second, it will have holes due to unforeseen function usage,
dynamic variables, false untainting, etc.
>> safe_mode sounded like a really reasonable idea too, I would've
>> hoped some lessons from past mistakes could be learned.
>
> I do not see what exactly you propose to learn from safe mode
> mistakes - that we should never try to improve PHP security by
> providing language level tools? I do not see how this can be
> derived from whatever was wrong with safe mode. It may be that the
> tainting would not catch on, but I do not think safe mode problems
> should prevent us from even trying.
Good luck, I suppose on a base level it is entertaining seeing
someone bang their head against the wall time and time again.
I'd say you have really weird code if you do mysql_real_escape_string()
in order to pass the data to exec() ;)
> need to deal with different levels of taint-untainted and 1 bit is not
> going to give you that flexibility. You are going to need an int/long,
> maybe even a long long.
What would be stored in this long long?
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
>> That's exactly it, PHP is popular because it is easy to use. You
>> take that away and quite frankly there is very little reason
>> left to use PHP over C#, Python, etc...
>
> Well, this is old :) Discussing every feature there are people that
> say "until PHP does foo-feature, it can not be considered serious
> language" and "if PHP does foo-feature, it will become Java".
> Usually both are wrong ;) Tainting would not make PHP into C# or
> Java, and I think we would take maximum effort to do it in organic
> way, and that's exactly one of the points that we need to discuss.
> If you feel it can not be done in principle - you are welcome to
> argue why.
I think I've identified a number of reasons why it would not work. If
you know how to solve them, that's great; by all means proceed.
Ilia Alshanetsky
>> And here is your first exploit, let's say we say
>> mysql_real_escape_string() takes tainted data and makes it
>> untainted, what happens when this "safe" data is passed to exec().
>> You are going to
>
> I'd say you have really weird code if you do
> mysql_real_escape_string() in order to pass the data to exec() ;)
I'd say you have pretty weird code if you do include $_POST['VAR'];
and yet people do exactly that.
>> need to deal with different levels of taint-untainted and 1 bit is
>> not going to give you that flexibility. You are going to need an
>> int/long, maybe even a long long.
>
> What would be stored in this long long?
Bitmask identifying different taint modes.
Education is nice, but providing tools that can help you in developing is
even nicer. As far as tainting levels are concerned, I don't think we'd have
to go that far. There's a limit on how much you can help a person not to
shoot themselves in their own foot. If they want to explicitly untaint
dangerous data, or validate data for XSS and use it in exec() that's already
the point where the user needs to know what he's doing. We can just provide
the tools which help him and guide him at what data he needs to cleanse. He
can screw up in the cleansing process. Hopefully solutions like ext/filter
will provide the right APIs though to make this very straightforward for the
average developer.
I must say that I am also a bit worried about having another safe_mode saga.
However, if we take an approach here where everything is tainted unless
proven innocent it might succeed much better than safe_mode where we had to
label the "insecure" places.
I think this all boils down to expectations. If people's expectations are
that passing taint mode means the app is secure, then that's
obviously a problem. If people's expectations are that this is a tool which
helps them better secure their app then that might be successful.
I've been on the fence here from the beginning. I am pretty sure this would
be a very useful feature but I'm somewhat paranoid after the whole safe_mode
saga. Stas pointing out the difference between safe_mode which is "trust
everyone unless explicitly pointed out" vs. a possible approach here of
"trust no one unless explicitly pointed out" might make the difference
though.
Andi
> -----Original Message-----
> From: Ilia Alshanetsky [mailto:ili...@gmail.com] On Behalf Of
> Ilia Alshanetsky
> Sent: Friday, December 15, 2006 3:08 PM
> To: Alain Williams
> Cc: PHP internals
> Subject: Re: [PHP-DEV] Run-time taint support proposal
>
>
> On 15-Dec-06, at 6:02 PM, Alain Williams wrote:
>
> > Php is easy to write, that is good. Unfortunately this also
> means that
> > bad/simple/careless programmers can use php ... these are
> the ones who
> > cause many of the php script errors that cause problems.
>
> That's exactly it, PHP is popular because it is easy to use.
> You take that away and quite frankly there is very little
> reason left to use PHP over C#, Python, etc...
>
> -----Original Message-----
> From: Ilia Alshanetsky [mailto:ili...@gmail.com] On Behalf Of
> Ilia Alshanetsky
> Sent: Friday, December 15, 2006 3:12 PM
> To: PHP internals
> Subject: Re: [PHP-DEV] Run-time taint support proposal
>
>
> And here is your first exploit, let's say we say
> mysql_real_escape_string() takes tainted data and makes it
> untainted, what happens when this "safe" data is passed to
> exec(). You are going to need to deal with different levels
> of taint-untainted and 1 bit is not going to give you that
> flexibility. You are going to need an int/ long, maybe even a
> long long.
>
Actually, I said exactly the opposite - if you write secure code, you do
not need it. If you are concerned about your code potentially being
buggy and do not want to rely only on your own smarts to avoid it - you
need security tools. Tainting is one of such tools.
> Tainting is a false security; it makes you feel secure when you aren't.
Well, everything is false security then, because I know of no remotely
accessible system that didn't have one way or another to circumvent the
access control. Programs have bugs, passwords can be stolen or guessed,
etc. So I would propose to move away from generic statements to
something more concrete.
> First, it's off in production and that's where all the hacks appear; it
> will have holes due to unforeseen function usage, dynamic variables,
> false untainting etc...
You are saying tainting is no silver bullet? I couldn't agree more. But
then again, nothing is :)
> Good luck, I suppose on a base level it is entertaining seeing someone
> bang their head against the wall time and time again.
You could enjoy the entertainment or you could bring some tools and help
bring the wall down :) Whatever your heart desires.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
> Well you wouldn't have to use tainted mode of course. This can/
> should be
> turned off by default.
safe_mode is/was off by default too, and yet a good chunk of hosts
enabled it under the assumption it would make their setups secure.
> I think having such a mode would be of great help to many users
> though. It
> would allow them to find any quirks in their data/input filtering
> and help
> them focus on where they should do a better job.
I think people generally follow the path of least resistance and for
compat purposes I suspect most applications will simply create a quick
wrapper to **untaint** all input data so they can use it within their
application without worrying about tainting getting in the way.
> Of course no one such
> solution will solve application-level security issues but it could
> end up
> being a very useful tool which helps people address the majority of
> such
> problems.
Tainting may help if you are using straightforward data, but
what happens when you start appending or passing data through
string modification operations? I can almost guarantee that there will be
a series of operations or function calls that, if executed in a
certain order, will mask tainted input, making it appear safe. I'd
gladly provide you with examples, but there is no sample code. If I
have time I'll take a peek at Perl and Ruby, which implement tainting,
and see if a quick taint bypass can be devised.
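For the appending case raised above, a minimal sketch of the conventional propagation rule: the result of a string operation is tainted if any operand is. This is an assumption about how a PHP implementation might behave, not something the proof of concept is known to do; the tstring type and the ts_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string value carrying a single taint bit, as in the proposal. */
struct tstring {
    char *data;
    unsigned char tainted;
};

/* Copies a C string into a new value; data is NULL if allocation fails. */
static struct tstring ts_make(const char *s, unsigned char tainted)
{
    struct tstring out = { NULL, tainted };
    size_t n;

    assert(s != NULL);
    n = strlen(s) + 1;
    out.data = malloc(n);
    if (out.data != NULL) {
        memcpy(out.data, s, n);
    }
    return out;
}

/* Concatenation: the result is tainted if either operand is tainted. */
static struct tstring ts_concat(const struct tstring *a, const struct tstring *b)
{
    struct tstring out = { NULL, 0 };
    size_t la, lb;

    assert(a != NULL && b != NULL && a->data != NULL && b->data != NULL);
    la = strlen(a->data);
    lb = strlen(b->data);
    out.tainted = (unsigned char)(a->tainted | b->tainted);
    out.data = malloc(la + lb + 1);
    if (out.data != NULL) {
        memcpy(out.data, a->data, la);
        memcpy(out.data + la, b->data, lb + 1); /* includes the terminator */
    }
    return out;
}

/* Releases the buffer owned by a value. */
static void ts_free(struct tstring *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
}
```

The rule only holds if every string-producing operation in the engine applies it; a single operation that builds its result without copying the bit is the kind of bypass the paragraph above worries about.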
> I've been on the fence here from the beginning. I am pretty sure
> this would
> be a very useful feature but I'm somewhat paranoid after the whole
> safe_mode
> saga. Stas pointing out the difference between safe_mode which is
> "trust
> everyone unless explicitly pointed out" vs. a possible approach
> here of
> "trust no one unless explicitly pointed out" might make the difference
> though.
All it means is that this breaks every application, and the security
benefits are somewhat ambiguous, but in all fairness the full
consequences are hard to predict without sample code.
Ilia Alshanetsky
> I think people generally follow the path of least resistance and for
> compat purposes I suspect most applications will simply create a quick
> wrapper to **untaint** all input data so they can use it within their
> application without worrying about tainting getting in the way.
More fool them.
> Tainting may have helped if you are using straight forward data, but
> what happens when you start doing appending or passing data through
> string modification operations. I almost guarantee that there will be
> a series of operations or function calls that if executed in a
> certain order will mask tainted input making it appear safe. I'd
> gladly provide you with examples, but there is no sample code, if I
> have time I'll take a peek at Perl and Ruby that implement tainting
> and see if a quick taint bypass can be devised.
OK: so there may be a few cases where it won't work, that does not mean
that there won't be great advantages for the majority of situations.
> All it means is that this breaks every application, and the security
> benefits are somewhat ambiguous, but in all fairness the full
> consequences are hard to predict without sample code.
It is OFF by default.
RegisterGlobals was initially ON by default, since losing it broke
a lot of code. PHP survived that.
--
Alain Williams
--
>> So you claim that without taint mode it is not possible to write
>> safe PHP code?
>
> Actually, I said exactly the opposite - if you write secure code,
> you do not need it. If you are concerned about your code
> potentially being buggy and do not want to rely only on your own
> smarts to avoid it - you need security tools. Tainting is one of
> such tools.
>
>> Tainting is a false security it makes you feel secure, when you
>> aren't.
>
> Well, everything is false security then, because I know of no
> remotely accessible system that didn't have one or other way to
> circumvent the access control. Programs have bugs, passwords can be
> stolen or guessed, etc. So I would propose to move away from
> generic statements to something more concrete.
Not quite. Suppose you have a function that is designed
specifically to take unsafe data and make it safe for a particular
use. For example, htmlentities(value, ENT_QUOTES) will make input data
safe to print on screen without concerns about XSS. At this
limited and very simple level it is easy to provide the user with
convenient and simple means of auditing the data and making it safe.
Tainting as such is not a tool, because it does not secure data; it
just imposes arbitrary limits on what's safe or not, which are only
sometimes right, because you cannot predict every (even every common)
code path. If we take tools like Coverity and the like, there is a reason
why the majority of the things they find are false positives. The problem
with so many false positives is that they reduce the value of the
tool and make users ignore it or work to simply bypass it.
Consider E_NOTICE: it is a superb tool for finding things like
undeclared variables (which often cause all manner of exploits), and
yet most developers keep it off because it gets in the way, even though
it has 0 false positives. However I suppose it is simpler and
ultimately harmless to do $value = (int)$_POST['value']; without
checking if $_POST['value'] exists via isset.
Another interesting example I want to bring to your attention is a
seemingly innocuous function strlen(), which you've mentioned before.
Suppose strlen() is said to allow tainted input, since after all
what's the harm. One simple exploit leading to information disclosure
is to pass it an array() causing the function to generate an error
exposing the script's path.
>> Good luck, I suppose on a base level it is entertaining seeing
>> someone bang their head against the wall time and time again.
>
> You could enjoy the entertainment or you could bring some tools and
> help bring the wall down :) Whatever your heart desires.
Standing next to a wall and trying to knock down its foundations is a
rather dangerous endeavor :)
Ilia Alshanetsky
I think this would make a great addition. Not just for educational
purposes, but also to help experienced developers avoid missing holes.
Ilia wrote:
> safe_mode is/was off by default too, and yet a good chunk of hosts
> enabled it under the assumption it would make their setups secure.
What about making it something that has to be enabled explicitly
during runtime? People are used to doing session_start(); at the
top of every page, so for those who want it, doing
set_taint_detection(TRUE); or something equivalent wouldn't be that
much of a problem.
Even if there were an ini option for it, I believe this would make a
bit of a different impression. First of all, this would just add
security for the actual PHP developers. It won't really have an impact
on server security in a well-configured environment. Second, a lot of
applications would break, and that wouldn't be very popular among the
users. Third, I think documentation means a lot here. If we make
sure that it's not painted as something it's not, one can surely
avoid such an issue.
> I think people generally follow the path of least resistance and for
> compat purposes I suspect most applications will simply create a quick
> wrapper to **untaint** all input data so they can use it within their
> application without worrying about tainting getting in the way.
But if it's not enabled by default it wouldn't make any sense to
enable it just to by-pass it.
> All it means is that this breaks every application, and the security
> benefits are somewhat ambiguous, but in all fairness the full
> consequences are hard to predict without sample code.
Still, bc isn't an issue if it isn't enabled by default.
>> I'd say you have pretty weird code if you do include $_POST
>> ['VAR']; and yet people do exactly that.
>
> And if we had tainting, people would know it's bad, and would know
> why. :)
>
>> Bitmask identifying different taint modes.
>
> Can you elaborate which modes do you propose?
Well, for one you would need to identify different escape methods for
different data uses, so let's make a quick (and incomplete) list:
1) Command execution validation
2) Safe for command execution parameter validation
3) Output to screen validation
4) Database validation (you'd need one bit for every database, since
special chars in one db do not equate to another)
Here is another problem: because some DBs like PostgreSQL require
different treatment of binary and text data, you are going to have an
interesting problem, since escaping binary data will corrupt it.
5) Safe for HTTP headers validation
6) Safe to pass to include/eval/etc...
Ilia Alshanetsky
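The bitmask idea listed above can be sketched as follows, assuming one bit per context in which a value is still unsafe. The context names, the taint_mask type and both functions are hypothetical; the list above also asks for one bit per database, which this sketch collapses into a single SQL bit for brevity.

```c
#include <assert.h>

/* One bit per context in which the value is still considered unsafe. */
enum taint_ctx {
    TAINT_SHELL   = 1u << 0,  /* command execution */
    TAINT_HTML    = 1u << 1,  /* output to screen */
    TAINT_SQL     = 1u << 2,  /* database queries */
    TAINT_HEADER  = 1u << 3,  /* HTTP headers */
    TAINT_INCLUDE = 1u << 4,  /* include/eval */
    TAINT_ALL     = (1u << 5) - 1
};

typedef unsigned int taint_mask;

/* A cleaner clears only the contexts it actually escapes for. */
static taint_mask after_cleaner(taint_mask before, taint_mask cleans)
{
    return before & ~cleans;
}

/* A sink accepts a value only if the sink's own context bit is clear. */
static int sink_allows(taint_mask value, taint_mask sink_ctx)
{
    assert(sink_ctx != 0);
    return (value & sink_ctx) == 0;
}
```

Under this model, input that went through an SQL escaper (clearing only TAINT_SQL) is accepted by a database sink but still rejected by a shell sink, which is the mysql_real_escape_string()-to-exec() case discussed earlier in the thread.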
> OK: so there may be a few cases where it won't work, that does not
> mean
> that there won't be great advantages for the majority of situations.
I'd wager there would be more than a few instances, but without code
it's kinda hard to demonstrate.
>> All it means is that this breaks every application, and the security
>> benefits are somewhat ambiguous, but in all fairness the full
>> consequences are hard to predict without sample code.
>
> It is OFF by default.
> RegisterGlobals was initially ON by default, since losing it broke
> a lot of code. PHP survived that.
Because an even more convenient replacement in the form of
superglobals was introduced.
You mean when running with display_errors = on? Ouch.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
I have a couple of things to say, the first being to reiterate others'
point of view that the tainting feature would be a tool to help
determine locations of suspect code. Obviously it won't make your code
any safer, but if you enable it and react to the feedback from
warnings/errors then you WILL have more secure code. False positives
suck, and for such cases I would suggest allowing a developer to easily
untaint data that they feel/know in their (preferably) expert opinion
is clean. For instance:
(untaint)$someVar;
I think it should be a language level construct for optimal speed.
Additionally I think there should be a (taint)$someVar construct also so
that APIs can specifically taint return values that they know should be
tainted but that might have been untainted during whatever processing
they performed.
> Consider E_NOTICE, it is a superb tool for finding out things like
> undeclared variables (which often cause all manner of exploits), and
> yet most developers keep it off because it gets in the way, even though
> it has 0 false positives. However I suppose it is simpler and
> ultimately harmless to do $value = (int)$_POST['value']; without
> checking if $_POST['value'] exists via isset.
I think the tainting system would be a complementary system to E_NOTICE
and E_WARNING. Both of these are useless if you don't pay attention to
the messages generated. The same would be true of the tainting system.
You don't get anything for free, you need to actively handle such
issues. This brings me to another issue...
Since tainting is about helping the developer to improve the security of
their code I think that it should be configurable for PHP_INI_ALL. This
would send a clear message to hosting companies that this does not make
their PHP installations more secure.
Additionally I think that there should be three possible settings for
such a tainting mode.
tainting = on
tainting = off
tainting = promiscuous
The first two are obvious. The last would essentially work the same as
tainting being enabled with the exception that scripts will work as
though it was turned off but warnings/errors will show up in the error log.
This will allow hosting companies to enable it by default without
adversely affecting scripts.
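The three settings above can be sketched as a small decision function. The enum names and on_tainted_sink() are hypothetical, and the sketch assumes that "on" stops the offending call; the proposal itself only speaks of raising an alert, so the exact severity is an open choice.

```c
#include <assert.h>

/* The three proposed settings. */
enum taint_mode { MODE_OFF, MODE_ON, MODE_PROMISCUOUS };

/* What the engine does when a value reaches a sensitive function. */
enum taint_action { ACTION_ALLOW, ACTION_LOG_AND_ALLOW, ACTION_BLOCK };

/* Decides the action for a sink call given the mode and the taint bit. */
static enum taint_action on_tainted_sink(enum taint_mode mode, int value_tainted)
{
    assert(mode == MODE_OFF || mode == MODE_ON || mode == MODE_PROMISCUOUS);
    if (mode == MODE_OFF || !value_tainted) {
        return ACTION_ALLOW;
    }
    /* Promiscuous mode reports to the error log but lets the script run. */
    return mode == MODE_PROMISCUOUS ? ACTION_LOG_AND_ALLOW : ACTION_BLOCK;
}
```

Keeping the decision in one function means a compile-time switch could replace it with a constant ACTION_ALLOW, which is one way to get the zero-overhead production build asked about in this message.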
Lastly, is this something that can be turned off at the compile level so
that the checks for tainting being enabled are completely skipped?
Sometimes you just want to eke out the extra speed in a production
environment.
+1 for tainting (from null karma source :)
Cheers,
Rob.
--
.------------------------------------------------------------.
| InterJinn Application Framework - http://www.interjinn.com |
:------------------------------------------------------------:
| An application and templating framework for PHP. Boasting |
| a powerful, scalable system for accessing system services |
| such as forms, properties, sessions, and caches. InterJinn |
| also provides an extremely flexible architecture for |
| creating re-usable components quickly and easily. |
`------------------------------------------------------------'
You mean "most of the servers that allow strangers to read their
phpinfo()"? I'm not surprised. You think if they expose their phpinfo
you can make it worse by seeing script path in error message?
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
You need a malicious code writer to have an exploit. As far as I
know, PHP is not a platform for securely executing hostile code.
> You are going
> to need to deal with different levels of taint-untainted and 1 bit is
> not going to give you that flexibility. You are going to need an int/
> long, maybe even a long long.
Sandboxing malicious code requires a lot more than taint levels.
I'll be happy to provide that, but it's outside of the contribution
that I'm trying to make for 2007. Right now I am merely targeting
the non-malicious programmers.
Wietse
First, a nitpick:
pg_fetch_row() for a long time gave a false positive (imho) about
seeking past the end of the result set.
To this day I type @pg_fetch_row() as a matter of course, even though
I think this maybe got fixed... :-)
REAL CONTENT:
I think that "taint" might be useful to some developers.
Perhaps it would be best to review the proposed changes for
performance effects, and see how much difference it really makes to
add a bit-flag to every zval, and what other effects taint has with it
turned OFF.
The penalties for turning it ON in performance are a non-issue, I
should think.
If Wietse has a working prototype patch to do it, shouldn't we (an
editorial we, there) at least give it a test spin?
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
Jordan
I think the presumption is that most developers smart enough to be
using mysql_real_escape_string() are PROBABLY smart enough to not just
blindly pass its output to exec().
Or, more likely, that most developers smart enough to use
mysql_real_escape_string() will code more like:
$input = $_REQUEST['input']; //tainted
$input_sql = mysql_real_escape_string($input); //untainted
$input_exec = escapeshellarg($input); //untainted
exec("/whatever $input"); //E_ERROR
Ooops, I made a typo!
So "taint" will only catch, say, 99% of the naive scripters who don't
filter data, and an occasional typo for the experts.
Nobody is claiming it will catch this one:
$input = $_REQUEST['input']; //tainted
$input_sql = mysql_real_escape_string($input); //untainted
$input_exec = escapeshellarg($input); //untainted
exec("/whatever $input_sql"); //false negative, oh well
Still sounds like it's worth considering, if the performance and
maintainability penalties are not too high.
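To make the two cases above concrete, here is a rough userland sketch of a
one-bit taint flag. It is only a model: the Str class, sql_escape(),
shell_escape() and checked_exec() are names invented for this illustration,
addslashes() merely stands in for a real database escaping call, and nothing
is actually executed. The proposal itself would keep the bit inside the
engine rather than in a wrapper class.

```php
<?php
// Hypothetical userland model of a one-bit taint flag (not the engine patch).
final class Str
{
    public $text;
    public $tainted;

    public function __construct($text, $tainted)
    {
        $this->text    = $text;
        $this->tainted = $tainted;
    }
}

// Any escaping function clears the single bit, whatever context it targets.
function sql_escape(Str $s)
{
    // addslashes() is only a stand-in for a real database escaping call.
    return new Str(addslashes($s->text), false);
}

function shell_escape(Str $s)
{
    return new Str(escapeshellarg($s->text), false);
}

// Stand-in for exec(): refuses tainted data, otherwise returns the command
// line it would have run (nothing is executed in this sketch).
function checked_exec($prefix, Str $arg)
{
    if ($arg->tainted) {
        throw new RuntimeException('tainted data passed to exec');
    }
    return $prefix . ' ' . $arg->text;
}

$input      = new Str("x'; rm -rf /tmp/demo", true); // as if from $_REQUEST
$input_sql  = sql_escape($input);
$input_exec = shell_escape($input);

// The typo: raw $input instead of $input_exec is caught.
$caught = false;
try {
    checked_exec('/whatever', $input);
} catch (RuntimeException $e) {
    $caught = true;
}

// The mix-up: SQL-escaped data reaches the shell sink unnoticed.
$missed = checked_exec('/whatever', $input_sql);
```

With a single bit, the second call goes through because the flag cannot
record which context the data was escaped for.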
--
Would that require some sort of bogus string non-manipulation, or do
you foresee a "untaint" function which does nothing but mark it
untainted, or are you thinking a new operator?...
I think we've run out of ASCII symbols for operators, actually, so
don't pick that one. :-)
Well, duh, we labeled it "safe mode"
It didn't make their server any slower.
And it's "safe"
So it must be good, right?
:-) :-) :-)
Perhaps the biggest "win" for "taint" not being another "safe mode" is
that it's correctly labeled in the first place, unlike "safe mode"
:-)
>> I think having such a mode would be of great help to many users
>> though. It
>> would allow them to find any quirks in their data/input filtering
>> and help
>> them focus on where they should do a better job.
>
> I think people generally follow the path of least resistance and for
> compat purposes I suspect most application will simply create a quick
> wrapper to **untaint** all input data so they can use it within their
> application without worrying about tainting getting in the way.
Sure, some will.
Just as some folks are now using extract on $_REQUEST to work around
register_globals being off.
>> Of course no one such
>> solution will solve application-level security issues but it could
>> end up
>> being a very useful tool which helps people address the majority of
>> such
>> problems.
>
> Tainting may have helped if you are using straightforward data, but
> what happens when you start doing appending or passing data through
> string modification operations. I almost guarantee that there will be
> a series of operations or function calls that if executed in a
> certain order will mask tainted input making it appear safe. I'd
> gladly provide you with examples, but there is no sample code, if I
> have time I'll take a peek at Perl and Ruby that implement tainting
> and see if a quick taint bypass can be devised.
I'll save you the trouble.
A simple preg that does NOTHING at all will mark the data as
"untainted" in Perl.
"taint" is not a magic bullet.
It's just a big yellow triangle "WARNING" sign for a new developer,
and a handy tool for the experienced to run through one of many safety
checks in their development/review process.
>> I've been on the fence here from the beginning. I am pretty sure
>> this would
>> be a very useful feature but I'm somewhat paranoid after the whole
>> safe_mode
>> saga. Stas pointing out the difference between safe_mode which is
>> "trust
>> everyone unless explicitly pointed out" vs. a possible approach
>> here of
>> "trust no one unless explicitly pointed out" might make the
>> difference
>> though.
>
> All it means is that this breaks every application and the security
> benefits are somewhat ambiguous, but in all fairness the full
> consequences are hard to predict without sample code.
It breaks nothing unless somebody somewhere makes the decision that
turning "taint" on is worth the support calls/questions they will get
from developers who have no [bleep]ing clue what they are doing.
I guess a taint mode would give users the impression that the data is
safe while it isn't safe.
As an example: $data = mysql_real_escape_string($_GET['data']);
query(with $data); echo "Saved $data"; For the first part the data is
safe and the escape removes the taint flag. With taint mode enabled,
a user who is annoyed by the warnings is happy to see no warning in the
second case, thinking "hey, there's no warning - I don't need to care
anymore" - but there is still a security issue.
With systems of much higher complexity, people might tend to just
check whether they get taint warnings or not and assume all other data
is safe. Knowing the PHP users I think it's simpler to teach them "all data
is unsafe" instead of teaching "well, the taint mode gives you an idea
which data is unsafe but still: All data is unsafe".
I think a taint mode leads to the same kind of problems which led to
the safe_mode removal for PHP 6.
johannes
> > And here is your first exploit, let's say we say
> > mysql_real_escape_string() takes tainted data and makes it
> > untainted, what happens when this "safe" data is passed to
> > exec(). You are going to need to deal with different levels
> > of taint-untainted and 1 bit is not going to give you that
> > flexibility. You are going to need an int/ long, maybe even a
> > long long.
> >
> > Ilia Alshanetsky
Presumably the filtering functions "untaint" the data.
The whole point of "taint" is to catch data that is getting passed
*RAW* to places raw data shouldn't get passed.
--
That accommodation consists of:
Filter your input with some kind of reasonable filtering function.
Is that really too much to ask?...
If something as simple as mysql_real_escape_string(), a typecast, or a
preg_replace() marks it untainted, it seems to me like you'd have to
write some REALLY BAD CODE and distribute it to have a problem.
Maybe we could use (tainted) and (untainted) typecasts as suggested
elsewhere in this thread.
For my proof-of-concept tests I implemented taint(), untaint() and
is_tainted() extensions so that I could quickly set up a testbed.
But if there is a better user interface then I have no problem.
It's too early to say exactly when I will be able to share a first
set of diffs with the community, but I'm sure it will be taken care
of.
Wietse
>
>> Something that most servers do (almost 80% by recent stats).
>> http://www.nexen.net/images/stories/phpinfos/display_errors.png
>
> You mean "most of the servers that allow strangers to read their
> phpinfo()"? I'm not surprised. You think if they expose their
> phpinfo you can make it worse by seeing script path in error message?
It is not just the phpinfo() servers, it is very much a common case I
assure you.
Ilia Alshanetsky
A correctly implemented filtering library would untaint the data, of
course. One of the TODOs might be providing an API that makes it easier
to write such a library.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
Well, people leaving such things on their servers should deal with that
first, then get to talk about real security :) No solution can help a
person who deliberately configures his server wide open. We are talking
about people who _try_ to do it securely, and we may help them. For those
who don't even try, well...
> -----Original Message-----
> From: Wietse Venema [mailto:wie...@porcupine.org]
> Sent: Friday, December 15, 2006 6:02 PM
> To: PHP internals
> Subject: Re: [PHP-DEV] Run-time taint support proposal
>
> Richard Lynch:
> > On Fri, December 15, 2006 4:31 pm, Wietse Venema wrote:
> > > Even if some taint check is too restrictive at some point in time,
> > > the programmer can always overcome it with an explicit action.
> >
> > Would that require some sort of bogus string
> non-manipulation, or do
> > you foresee a "untaint" function which does nothing but mark it
> > untainted, or are you thinking a new operator?...
> >
> > I think we've run out of ASCII symbols for operators, actually, so
> > don't pick that one. :-)
>
> Maybe we could use (tainted) and (untainted) typecasts as
> suggested elsewhere in this thread.
>
> For my proof-of-concept tests I implemented taint(), untaint() and
> is_tainted() extensions so that I could quickly set up a testbed.
> But if there is a better user interface then I have no problem.
>
> It's too early to say exactly when I will be able to share a
> first set of diffs with the community, but I'm sure it will
> be taken care of.
>
> Wietse
>
In that case do we really need something clogging up the code base?
Improving the performance of tools like PHPEclipse would seem to me to
be a better use of resources than adding the same sort of checks into
the runtime engine?
--
Lester Caine - G8HFL
-----------------------------
L.S.Caine Electronic Services - http://home.lsces.co.uk
Model Engineers Digital Workshop -
http://home.lsces.co.uk/ModelEngineersDigitalWorkshop/
Treasurer - Firebird Foundation Inc. - http://www.firebirdsql.org/index.php
I don't quite understand the relevance of PHPEclipse to the issue.
And I'm not sure how you judge "clogging up" PHP without seeing a patch
especially as I'm not sure how much PHP internals hacking you've done.
Andi
> -----Original Message-----
> From: Lester Caine [mailto:les...@lsces.co.uk]
> Sent: Friday, December 15, 2006 11:31 PM
> To: PHP internals
> Subject: Re: [PHP-DEV] Run-time taint support proposal
>
> I don't quite understand the relevance of PHPEclipse to the issue.
> And I'm not sure how you judge "clogging up" PHP without seeing a patch
> especially as I'm not sure how much PHP internals hacking you've done.
I'm sure many people have their own preferred tools for creating files -
all I was trying to say was that - is taint support actually needed at
run time? Something that improves the visibility of mistakes while
editing files seems to be more worthwhile - something that can also be
switched to report compliance with other things like 'strict' before you
actually RUN a script?
Building everything directly into the runtime just seems overkill and
while I would be the first person NOT to want to have to compile before
running, some tools that assist in the development process could be
useful. Eclipse provides a tidy framework to develop in and is free so
seems the logical starting point - especially since Wietse is from IBM.
Firstly, run-time taint could be used or appropriately extended to be
used to provide such information in your bloatware environment of choice.
Secondly, anything that could within reason have the same qualities as
run-time taint would in fact have to be run-time taint by any other
name. What do you intend to do, write an AI to analyse all the code
flawlessly without running it and cover all logical paths? Nobody cares
about taint in a hello world without any conditionals or flow control or
complexity at all. Taint is useful for large amounts of code and that
would be very hard to cover with static analysis.
Taint could be very useful as an informative tool for development and
review. It doesn't have to be implemented to always run or even be
compiled into your production installs. Please do not limit it with your
terror of including useful features.
> In theory, you need to consider that many ISPs and users will interpret
> taint mode == secure and enable it causing much grief to distributable
> application writers who need to accommodate every environment.
actually i dont think this is a valid argument. people will quickly
figure out what is going on from the error messages. and isp's will be
quick to learn this lesson. as a result it will more likely be disabled
everywhere. but its an important tool for all the people that manage
their own servers. as a result it will probably not educate the masses
(well they can use it on their dev boxes), but more provide yet another
tool for those of us who are working on larger projects.
regards,
Lukas
To me some of Ilia's arguments do not make sense. Ext/filter has the
same danger of creating a false sense of security. The arguments that
did make sense to me are about the issue of (un)tainting being directly
tied to the context in which the variable is being used.
This is a problem that many escaping frameworks face. The classic is
defining a javascript variable versus html inside a template. Johannes
mentioned the database versus logging etc.
The only "solution" I was able to think about quickly was to define a
best practice so to teach people to do the untainting as close to the
variable use as possible. Actually it might even be wise to not even
store the untainted version of the variable at all.
$query = "select * FROM foo where bar ='"
       . mysqli_real_escape_string($link, $lala) . "'"; // $link: an open mysqli connection
echo magic_escaper($lala);
However I guess a lot of people love an ultra layered approach, which
does tend to result in black box code, where nobody knows what is
actually being escaped or not. So I guess this development style is
likely dangerous anyways.
Ummm, something like taint checking would be available regardless of
your favourite code editor (mine is joe). I don't see what PHPEclipse or
any specific editor has to do with the evolution of PHP.
Cheers,
Rob.
--
> To me some of Ilia's arguments do not make sense. Ext/filter has the
> same danger of creating a false sense of security. The arguments that
> did make sense to me are about the issue of (un)tainting being directly
> tied to the context in which the variable is being used.
>
> This is a problem that many escaping frameworks face. The classic is
> defining a javascript variable versus html inside a template. Johannes
> mentioned the database versus logging etc.
>
> The only "solution" I was able to think about quickly was to define a
> best practice so to teach people to do the untainting as close to the
> variable use as possible. Actually it might even be wise to not even
> store the untainted version of the variable at all.
>
> $query = "select * FROM foo where bar ='"
>        . mysqli_real_escape_string($link, $lala) . "'"; // $link: an open mysqli connection
>
> echo magic_escaper($lala);
>
> However I guess a lot of people love an ultra layered approach, which
> does tend to result in black box code, where nobody knows what is
> actually being escaped or not. So I guess this development style is
> likely dangerous anyways.
Of course many of us never go near the raw database calls anyway, since
we are using frameworks that carry out a lot of the security checks at a
generic level - so I see little point in adding more checks at a level that
major projects do not use anyway?
--
Lester Caine - G8HFL
--
Database calls are only one aspect. Many of us use input in many other
ways also. Also, using a framework just hides from many what may have
been missed by the developer(s) of the framework. Also, regardless of
whether YOU go near the raw database, someone did for you, and they
could benefit from the taint feature during the implementation of their
generification of what you use.
Cheers,
Rob.
--
Beginning of this year I was actually making tests with something like
that but I used
zend_uchar is_ref:1;
zend_uchar flags:7;
to be able to support multiple taint types (HTML and DB were the main
targets). And my tainting was the other way around: Everything not
marked with a flag is unsafe.
The scope of my approach was smaller than what you are proposing: The
runtime engine only does very little with the flags, the actual work is
done by user land code. So there is very little overhead.
My goal was to enhance our HTML generation toolkit to be able to trap
missing htmlspecialchars() calls.
Let me outline what I did:
- Variables assigned from constants are marked safe
- All other variables inherit the flags of the right hand side
- String concatenation does an AND of all flags for the result
- Added var_setflag($var, $flag) and var_getflag($var, $flag) functions
My HTML-Generator then used
    ...
    if (!var_getflag($result, 1))
        error_log(...);
    var_setflag($result, 1);
    return $result;
}
This could certainly be enhanced (I tried various function names and
semantics but wasn't 100% happy with any of them), e.g. by making some
of the string functions aware of flags. And it leaves most of the work
to the toolkit developer. It doesn't try to provide any safety per se,
only a mechanism to write safe(r) toolkits.
Unfortunately I didn't get around to fully implement it (yet) but this
approach looked promising while being simple.
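For readers who want to play with the idea without patching the engine, the
flag rules listed above can be roughly imitated in userland. This is only a
sketch under assumptions: Flagged, lit(), concat(), html_escape() and
emit_html() are names invented here, and the bit assignments are arbitrary.
The var_setflag()/var_getflag() functions in the message were engine-level
and are not part of stock PHP.

```php
<?php
// Userland stand-in for per-value flag bits; all names here are hypothetical.
const FLAG_HTML = 1;   // bit meaning "safe for HTML output"
const FLAG_DB   = 2;   // bit meaning "safe for SQL"
const FLAG_ALL  = 127; // seven flag bits, matching the flags:7 field

final class Flagged
{
    public $text;
    public $flags;

    public function __construct($text, $flags = 0)
    {
        $this->text  = $text;
        $this->flags = $flags; // default 0: nothing is known to be safe
    }
}

// Values built from constants are marked safe for every context.
function lit($text)
{
    return new Flagged($text, FLAG_ALL);
}

// Concatenation ANDs the flags: the result is only as safe as its least
// safe part.
function concat(Flagged ...$parts)
{
    $text  = '';
    $flags = FLAG_ALL;
    foreach ($parts as $p) {
        $text  .= $p->text;
        $flags &= $p->flags;
    }
    return new Flagged($text, $flags);
}

// Escaping for HTML sets the HTML bit and leaves the other bits alone.
function html_escape(Flagged $v)
{
    return new Flagged(htmlspecialchars($v->text, ENT_QUOTES), $v->flags | FLAG_HTML);
}

// Toolkit-side check: record a warning for every fragment that reaches
// output without the HTML flag.
function emit_html(Flagged $v, array &$warnings)
{
    if (!($v->flags & FLAG_HTML)) {
        $warnings[] = 'missing htmlspecialchars(): ' . $v->text;
    }
    return $v->text;
}

$warnings = [];
$name     = new Flagged('<b>Bob</b>'); // e.g. request data, no flags set
$bad      = concat(lit('<p>Hello '), $name, lit('</p>'));
$good     = concat(lit('<p>Hello '), html_escape($name), lit('</p>'));

emit_html($bad, $warnings);  // adds one warning
emit_html($good, $warnings); // adds none
```

Because unflagged data is the default, anything the toolkit forgets to mark
shows up as a warning rather than slipping through.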
- Chris
>> It is not just the phpinfo() servers, it is very much a common
>> case I assure you.
>
> Well, people leaving such things in their servers should deal with
> it first, then get to talk about real security :)
You seem to be ignoring the argument and clinging to a false
assumption that only people with open phpinfo()s have display_errors
enabled. I guarantee you that is not the case for the most part.
> No solution can help a person who deliberately configures his
> server wide open.
Accidentally leaving phpinfo(), is wide open? I suppose if I were to
demonstrate a vulnerability on zend.com it would imply Zend does not
care about security?
> We are talking about people that _try_ to do it secure and we may
> help them.
You're not helping them, just making assumptions about how their code
should work and making them adhere to them.
Ilia Alshanetsky
Well, there's little we can do in that part except for educating users
and changing defaults. The problem is not unique to PHP of course - I
have seen JSP and ASP error messages with paths etc. on the most
sensitive sites so many times. But that's an entirely unrelated problem.
>> No solution can help a person who deliberately configures his server
>> wide open.
>
> Accidentally leaving phpinfo(), is wide open? I suppose if I were to
If you consider exposing script file name a problem, on that scale
having phpinfo() available to google is wide open indeed.
> demonstrate a vulnerability on zend.com it would imply Zend does not
> care about security?
If you know of a vulnerability on zend.com, please write to
webm...@zend.com; that would be the only responsible course of action.
However, I do not see how having vulnerabilities implies not caring
about security.
> You're not helping them, just making assumptions about how their code
> should work and making them adhere to them.
Yes, and this is helping. Every language does that. Saying "you can't
make it work 100% exactly as I wanted without any effort, so the entire
thing isn't even worth discussing" is a road to nowhere. There are a lot
of places where it would be helpful, and a lot of places where it won't -
and that's ok.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
> If you know of vulnerability on zend.com, please write to
> webm...@zend.com, that would be only responsible course of
> action. However, I do not see how having vulnerabilities imply not
> caring for security.
That's my point (and for the record, previous exploits in the Zend site
were reported several times): just because a mistake was made does not
mean you don't care about security. The same logic must apply to
phpinfo(): someone created it for debugging and forgot to remove it, and
a search engine stumbled across it. It happens.
>> You're not helping them, just making assumptions about how their
>> code should work and making them adhere to them.
>
> Yes, and this is helping. Every language does that. Saying "you
> can't make 100% work exactly as I wanted without any effort, so
> entire thing isn't even worth discussing" is a road nowhere.
> There's a lot of places it would be helpful, and there's a lot of
> places it won't - and that's ok.
I am saying that you should not try to outsmart the developer because
you assume you know best.
Ilia Alshanetsky
OK, I was overreaching. But the main point stays - problems of configuration
are rarely solvable by automatic means, rather by education and choosing
better defaults. If you run a site in debug configuration, there's little
we can do - debug configuration is _supposed_ to reveal information.
However, there's a bunch we could do about errors of omission - e.g.
people just not doing stuff which they should do because they forgot or
didn't check their code thoroughly enough.
That is, if we had a switch that says "production mode" which could
filter out all info that could be potentially dangerous, etc. - it would
help such phpinfo() people, if we solve the chicken-and-egg problem of
getting them to actually turn the switch on :) BTW, that may be an idea
to think about too :)
> I am saying that you should not try to outsmart the developer because
> you assume you know best.
Well, if by "you" you don't mean me personally (I certainly don't claim
that) but the collective judgment of the PHP group, then to some measure
that is the way: we try to guess what developers need and steer the
language accordingly. There's no way of not doing it - you always make
choices to do or not to do a certain feature and how to do it.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
--
Because some of us don't use the bloated frameworks, often because
those who develop the bloated frameworks didn't do filtering properly,
perhaps because they didn't have a taint mode to notify them that they
were writing sub-standard code.
:-) :-) :-)
--
The annoying thing is that PHP seems to be becoming the bloatware. PHP4,
PHP5 incompatible versions, PHP6. Perhaps it would be nice to have a
PHPLite that we can work with and add just the bits we need rather than
having to manage updates which in the main add nothing to the
functionality that we are actually using? Having to keep testing and
changing stable frameworks because they are no longer PC is becoming a
full time operation and distracting from improving the operation of
actual code. I've not fully tested 5.2 yet because of lack of time -
taint may tell me where things NOW need to be changed but it's yet
another "You *WILL* do it this way" :(
--
Lester Caine - G8HFL
--
./configure --disable-all
// Tom
If I only had to support my own servers .....
The problem is ISPs keep uploading the latest official releases and
then we have to fix the faults fast :(
PHP is a *SERVICE* that other people use and that service keeps getting
broken - saying "Build your own" has no relevance what so ever :(
Heck this is why PHP4 will never die - and I never used that.
--
Lester Caine - G8HFL
--
See other post :(
Major versions though do not traditionally have a mandate of backward
compatibility. This is why PHP4 still receives bug/security fixes. PHP5
is a leap forward, and while I'll admit that I'm not keen on some of
its OOP adoptions I would never expect it to work 100% with PHP4. At
some point for advancement, one needs to discard compatibility so the
appropriate steps forward can be made... that is the purpose of a major
version change.
Cheers,
Rob.
--
Ignoring the fact that this is somewhat off-topic, why would ISPs use the
Lite version as opposed to the "bloated" version? Their users want
features, functions, they want PHP - why settle for the lesser version?
If you don't want taint support, because you feel it's bloat, do
--without-taint or disable it run-time (?)
Personally I'd love taint support, it'd make me feel ten times safer when
I code - knowing I didn't output tainted data, that I might output
wrongfully untainted data, well that's my problem.
// Tom
Ilia,
Why are we outsmarting developers? Security bugs are out there, in
fact in web apps they're pretty much a plague (regardless of the
language). Does it mean that some developers aren't smart and many
are not properly informed? Absolutely YES - that's the world we live
in... Given that, and the likelihood it'd only get worse (more and
more people are programming the web with less and less training) -
whatever we can provide in the direction of creating better apps can help.
My 2c on this piece is that tainting can be a nice helper tool to
reduce the likelihood of security problems in your code. Nothing
more and nothing less.
I too fear the possibility of tainting becoming the new
safe_mode. "Outsource your security to PHP, it'll take care of
it". But I think there's a way of both designing and pitching
tainting so that we avoid this false perception. If we pitch
tainting as a development-time only tool that points out a certain
class of security mistakes, and is by no means an invisible magnetic
shield that actually protects you from them - then I think it can be
quite useful.
As such, I would consider:
- Saying tainting should not be enabled in production (avoid the
false sense of security people might have if they turn on tainting in
production).
- Not necessarily the fastest possible implementation, since it'd be
used for development purposes only.
- Consider making this a compile time option with significant
overhead and a big DO NOT ENABLE IN PRODUCTION, so that people have
an even clearer idea they shouldn't rely on it to find their bugs,
and that in fact it's just a helper tool, not unlike a strong IDE.
We could possibly even come up with a new name other than tainting so
that there is no prior perception as to what this feature is
supposed or not supposed to do.
Zeev
> As such, I would consider:
> - Saying tainting should not be enabled in production (avoid the false
> sense of security people might have if they turn on tainting in
> production).
> - Not necessarily the fastest possible implementation, since it'd be
> used for development purposes only.
> - Consider making this a compile time option with significant overhead
> and a big DO NOT ENABLE IN PRODUCTION, so that people have an even
> clearer idea they shouldn't rely on it to find their bugs, and that in
> fact it's just a helper tool, not unlike a strong IDE.
>
> We could possibly even come up with a new name other than tainting so
> that there is not prior perception as to what this feature is supposed
> or not supposed to do.
Now that puts my own concern into the right light!
ISPs should never be running it?
--
Lester Caine - G8HFL
--
My concern with taint is that ISPs WILL switch it on in a mistaken
belief that it will help security. It's not simply a matter of *I* can
build it with or without these things. People are using MY stuff with
other ISPs and if it will not work *I* am the one who gets hassled to
fix it - and I've had enough of that already with PHP5 updates!
There have been suggestions about extra configuration .ini's and the
like, but personally I see this as an area where the TOOLS we are
developing with need the improved checking. Keeping them in line with
all the extras being bolted into PHP5 is bad enough. Can't we nail down
PHP5 and look at this topic as part of the PHP6 jump? Alternatively,
could taint be a module that has to be installed separately from a
standard PHP5 update?
Ummm, wouldn't it be nice to have the option without taking a great big
artificial overhead penalty for having it enabled? I mean, I for one,
and definitely you for two, cannot possibly expect to catch every single
logic path in an application, and as we've already determined a simple
generic untaint is going to be pointless. It would be best to untaint as
close to the usage of the data as possible. Thus I think E_TAINT or
something being generated in a production environment is perfectly
acceptable in the same sense that E_NOTICE is.
>
> We could possibly even come up with a new name other than tainting so
> that there is not prior perception as to what this feature is
> supposed or not supposed to do.
blighting :)
<?php
exec( $_GET['foo'] );
?>
>>> A blight is upon you in /path/to/source/foo.php on line 1
Cheers,
Rob.
--
> - Consider making this a compile time option with significant
> overhead and a big DO NOT ENABLE IN PRODUCTION, so that people have
> an even clearer idea they shouldn't rely on it to find their bugs,
> and that in fact it's just a helper tool, not unlike a strong IDE.
Could it possibly be wrapped up in the --enable-debug option?
> We could possibly even come up with a new name other than tainting so
> that there is not prior perception as to what this feature is
> supposed or not supposed to do.
Fainting?
K
--
"Democracy is two wolves and a lamb voting on what to have for lunch.
Liberty is a well-armed lamb contesting the vote."
We're not debugging the PHP internals are we?
Cheers,
Rob.
--
Following up on an earlier suggestion in this thread, I could see
at least three modes of operation:
1) Disabled. The default setting.
2) Audit mode. Report perceived problems to logfile. This can be
used by developers to catch bugs, and by deployers for quality
assessment (but developers please don't start screaming yet).
3) Enforcement mode. Don't allow execution past a perceived problem.
It won't come as a surprise that I will try very hard to reduce
time and space overhead, so that taint checks can be used in
production environments (both modes 2 and 3 above).
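The three modes above can be sketched in userland PHP. This is only an
illustration under stated assumptions: the constants TAINT_DISABLED,
TAINT_AUDIT and TAINT_ENFORCE, the function taint_check(), and the explicit
$isTainted argument are all hypothetical. The proposal would track taint
inside the engine, not in script code.

```php
<?php
// Hypothetical mode constants; the names are assumptions for illustration.
const TAINT_DISABLED = 0;
const TAINT_AUDIT    = 1;
const TAINT_ENFORCE  = 2;

/**
 * Decide what happens when a value reaches a sensitive operation.
 * Returns true if execution may continue. In audit mode the problem is
 * appended to $log; in enforcement mode an exception stops execution.
 */
function taint_check(int $mode, bool $isTainted, string $where, array &$log): bool
{
    if ($mode === TAINT_DISABLED || !$isTainted) {
        return true;
    }
    $message = "tainted data used in $where";
    if ($mode === TAINT_AUDIT) {
        $log[] = $message;   // stand-in for writing to a logfile
        return true;
    }
    throw new RuntimeException($message);
}

$log = [];
taint_check(TAINT_AUDIT, true, 'system()', $log);   // recorded, execution continues
```

In this sketch the only difference between modes 2 and 3 is whether the
script keeps running after the report.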
As for positioning the feature, I don't think the problem is with
the name "taint" itself. It was introduced under this name with
Perl3 in 1989(*), and later under the same name in Ruby and other
programming languages. I am not aware that shortcomings in taint
support have ever been implicated as a cause for security
vulnerabilities (but I'm always willing to be corrected). That's
17 years of past experience speaking.
I do agree that it is important to build support by explaining what
taint support does. The primary purpose of the tool is to help
catch a common class of programming error: unchecked inputs in
shell/sql commands or other sensitive operations. With a pessimistic
taint propagation approach, there will unavoidably be false positives;
those false positives will have to be reduced to a practical level.
Some expressed concern that the tool would empower me and others
to force a particular programming idiom onto developers (you shall
code the way I want to). This is unlikely with the simple
black-and-white taint support as proposed initially, where untaint
is a broad operation that marks data as good for multiple contexts.
I think it's more likely to be a problem with narrow untaint
operators: untaint for HTML can be done only with a limited set of
untaint operators, and untaint for SQL can be done only with another
limited set of untaint operators, and so on. This would raise many
false alarms in existing code that is not vulnerable. I'm concerned
that this would not win over the hearts and minds of many.
As long as we don't overreach (try to stop every problem) and
oversell (promise it will stop every problem) then we should be
fine, if 17 years of past experience can be applied to PHP.
Wietse
(*) http://mirrors.develooper.com/perl/really-ancient-perls/oldperl/dist/dex/perl/3.000-3.044/kit3.000/
> As for positioning the feature, I don't think the problem is with
> the name "taint" itself. It was introduced under this name with
> Perl3 in 1989(*), and later under the same name in Ruby and other
> programming languages. I am not aware that shortcomings in taint
> support have ever been implicated as a cause for security
> vulnerabilities (but I'm always willing to be corrected). That's
> 17 years of past experience speaking.
Google search seems to be of a differing opinion, but then it has
only been around for 10 years ;-)
> I do agree that it is important to build support by explaining what
> taint support does. The primary purpose of the tool is to help
> catch a common class of programming error: unchecked inputs in
> shell/sql commands or other sensitive operations. With a pessimistic
> taint propagation approach, there will unavoidably be false positives;
> those false positives will have to be reduced to a practical level.
Bottom line is that it does not. There are plenty of Perl applications
supposedly safe from XSS due to tainting that in reality are
trivially exploitable via XSS, because the validation regex that
does the un-tainting of data is sub-par. Your interpretation of how
the tool is positioned seems to be out of touch with reality. I can
guarantee you that people will assume that code that works with
tainting is safe, which could not be further from the truth.
> Some expressed concern that the tool would empower me and others
> to force a particular programming idiom onto developers (you shall
> code the way I want to). This is unlikely with the simple
> black-and-white taint support as proposed initially, where untaint
> is a broad operation that marks data as good for multiple contexts.
Wrong again. Different contexts have different validation criteria;
unless you take that into account, tainting in PHP won't work. What's
safe to print on screen may not be safe to execute or pass to the
database, etc.
> As long as we don't overreach (try to stop every problem) and
> oversell (promise it will stop every problem) then we should be
> fine, if 17 years of past experience can be applied to PHP.
If you base everything on experience, there is no need to use PHP,
period. Stick to predictable C, Fortran, etc.
Just because a person is a great train engineer does not make him a
great car mechanic.
Ilia Alshanetsky
> ...
> Bottom line is that it does not. There are plenty of Perl applications
> supposedly safe from XSS due to tainting that in reality are
> trivially exploitable via XSS, because the validation regex that
> does the un-tainting of data is sub-par. Your interpretation of how
> the tool is positioned seems to be out of touch with reality. I can
> guarantee you that people will assume that code that works with
> tainting is safe, which could not be further from the truth.
I am sorry, but it is you, Ilia, who is out of touch with reality.
You seem to have taken a dislike to Wietse's excellent suggestion and
have fought it with a barrage of half-baked objections.
It is quite true that a taint flag cannot *guarantee* to make a PHP script
completely safe. Using a regex to untaint a value will not guarantee that
you end up with a perfectly safe value -- partly because it depends on what
you want to do with it.
The point is that most PHP programmers are not completely stupid, although I
agree that many could be more experienced. But they can all read the following
health warning:
Untainting is only as good as the check that is used.
Let us be done with this discussion and agree (as the Perl & Ruby people have)
that it is best to have a useful tool even if we can't make it 100% perfect.
> >As long as we don't overreach (try to stop every problem) and
> >oversell (promise it will stop every problem) then we should be
> >fine, if 17 years of past experience can be applied to PHP.
>
> If you base everything on experience there is no need to use PHP
> period. Stick to predictable C, Fortran, etc...
> Just because a person is a great train engineer does not make him a
> great car mechanic.
Your reply is completely out of tune with Wietse's comment.
--
Alain Williams
Parliament Hill Computers Ltd.
Linux Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256 http://www.phcomp.co.uk/
#include <std_disclaimer.h>
Regex is the approach used by Perl to un-taint data, which is why I
chose to mention it. The problem I am trying to show you, and that you
seem to be steadfastly ignoring, is that PHP variables are often used
in different contexts within the scope of the same script. There are
numerous applications where data would be escaped for MySQL output
and then the same data printed to screen. Given that the MySQL escaping
function would un-taint the data, no taint errors will be raised when
the same data is printed to screen. This means that while your code may
be safe against SQL injection (not really though, due to charset tricks
with MySQL), it will definitely not be safe against XSS. It is my
opinion that a false sense of security is far worse than knowing
your code may potentially have security holes.
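A minimal sketch of that scenario, under loud assumptions: the Tainted
class and its single boolean flag are hypothetical stand-ins for
engine-level taint tracking, and addslashes() stands in for a MySQL
escaping function (the real one needs a database connection).

```php
<?php
// Hypothetical single-flag taint wrapper; not an existing PHP API.
final class Tainted
{
    /** @var string */
    public $value;
    /** @var bool */
    public $tainted;

    public function __construct(string $value, bool $tainted = true)
    {
        $this->value = $value;
        $this->tainted = $tainted;
    }
}

// Stand-in for a MySQL escaping function: escapes quotes, clears the one flag.
function escape_for_sql(Tainted $t): Tainted
{
    return new Tainted(addslashes($t->value), false);
}

// Sensitive sink: refuses only data whose single flag is still set.
function print_to_screen(Tainted $t): string
{
    if ($t->tainted) {
        throw new RuntimeException('tainted data passed to output');
    }
    return $t->value;
}

$input   = new Tainted('<script>alert(1)</script>');
$escaped = escape_for_sql($input);

// No taint error is raised, yet the markup passes through unchanged,
// because SQL escaping does nothing about '<' and '>'.
$html = print_to_screen($escaped);

// What the HTML context actually needed:
$safe = htmlspecialchars($escaped->value, ENT_QUOTES, 'UTF-8');
```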
> The point is that most PHP programmers are not completely stupid,
> although I agree that many could be more experienced. But they can
> all read the following health warning:
>
> Untainting is only as good as the check that is used.
>
> Let us be done with this discussion and agree (as the Perl & Ruby
> people have)
> that it is best to have a useful tool even if we can't make it 100%
> perfect.
So you propose to give a partially working tool that promises data
security and then expect people not to rely on it 100% because it is
easy to
Ilia Alshanetsky
PHP-DEV already has a black/white taint mode it seems; Ilia won't see
the value of things unless they are entirely untainted and perfect in
all respects.
:) :) :)
In the same way that a fire extinguisher sitting on the wall provides a
degree of security even though it can't be used to extinguish a
full-blown fire.
Cheers,
Rob.
--
On 12/19/06, Wietse Venema <wie...@porcupine.org> wrote:
> Zeev Suraski:
> Following up on an earlier suggestion in this thread, I could see
> at least three modes of operation:
>
> 1) Disabled. The default setting.
>
> 2) Audit mode. Report perceived problems to logfile. This can be
> used by developers to catch bugs, and by deployers for quality
> assessment (but developers please don't start screaming yet).
>
> 3) Enforcement mode. Don't allow execution past a perceived problem.
I do not think a taint mode is a good thing; however, rejecting this
need would be a mistake. But there is a huge difference between a
taint mode for developers or the audit team and something that
_will_ be enabled at many ISPs: an enforcement mode. I'm "strongly"
opposed to adding this mode.
I fought for years against safe_mode; I'm not going to start again with
a taint mode. One can say that an enforced taint mode will be better
than safe_mode, but he is lying to himself. It will be a horrible
moving target. ISPs will activate it while keeping all other sources of
trouble, leaving the mess to the developers.
That's what happened with safe_mode mixed with all possible crap, and
that will happen with mode #3 as well, without solving anything
from a security point of view (users will have more logs to read, at
least ;-).
As a short answer, I completely agree with Zeev. Many users ask for
and/or need a taint mode (or whatever its name is). I do not think it is
a good thing, but PHP should have it, only for development/audit purposes
and disabled by default.
(That said, I would be much happier without a taint mode.)
I still wonder which miracles you can achieve to provide such a mode
without shooting yourself in the foot :-)
--Pierre
Can happen in Perl/... as well.
> numerous applications where data would be escaped for MySQL output
> and then the same data printed to screen. Given that the MySQL escaping
> function would un-taint the data, no taint errors will be raised when
> the same data is printed to screen. This means that while your code may
True. That is why I tell the people I teach PHP to do any escaping/...
at the very last moment; then you know *how* to escape it.
But very often I see people who do no data validation at all, or very little.
They don't even check that a numeric field consists entirely of digits.
We can help them be aware of this sort of thing; this raises their general
awareness that fields may contain ''nasties''.
> safe against SQL injection (not really though, due to charset tricks
> with MySQL), it will definitely not be safe against XSS. It is my
> opinion that a false sense of security is far worse than knowing
> your code may potentially have security holes.
I wear a seat belt when I drive my car because it will help me in many small
accidents. I do not leave it off because it will be useless if a large
truck decides to run me down.
> So you propose to give a partially working tool that promises data
> security and then expect people not to rely on it 100% because it is
> easy to
I propose to give a partially working tool that helps in the majority
of cases. I am aware that it will not be a panacea but that it is preferable
to nothing.
--
Alain Williams
--
In a previous post, Pierre, I suggested that, to prevent ISPs from thinking
this does anything for them, it should be configurable via PHP_INI_ALL.
So even if an ISP enabled mode 3, a developer could change it themselves
right in their source code. Since taint checking is necessarily a run-time
problem, there's no reason not to allow disabling it via ini_set(). This
would ensure that developers would never be put up against the wall of
frustration so often encountered with safe mode restrictions.
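For reference, this is how a PHP_INI_ALL setting is overridden from script
code today. The sketch uses display_errors, an existing setting that can be
changed anywhere; a taint setting would be hypothetical, and its name is
deliberately not guessed here.

```php
<?php
// display_errors is PHP_INI_ALL, so a script may change it at run time.
// ini_set() returns the old value, or false on failure.
$previous = ini_set('display_errors', '0');

if ($previous === false) {
    // ini_set() fails when the setting is unknown or not changeable from a script.
    echo "could not change setting\n";
}

// The new value applies for the rest of this request.
$current = ini_get('display_errors');
```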
Cheers,
Rob.
--
On 12/19/06, Robert Cummings <rob...@interjinn.com> wrote:
> On Tue, 2006-12-19 at 17:35 +0100, Pierre wrote:
> > Hello,
> >
> > On 12/19/06, Wietse Venema <wie...@porcupine.org> wrote:
> > > Zeev Suraski:
> >
> > > Following up on an earlier suggestion in this thread, I could see
> > > at least three modes of operation:
> > >
> > > 1) Disabled. The default setting.
> > >
> > > 2) Audit mode. Report perceived problems to logfile. This can be
> > > used by developers to catch bugs, and by deployers for quality
> > > assessment (but developers please don't start screaming yet).
> > >
> > > 3) Enforcement mode. Don't allow execution past a perceived problem.
> >
> > I do not think a taint mode is a good thing; however, rejecting this
> > need would be a mistake. But there is a huge difference between a
> > taint mode for developers or the audit team and something that
> > _will_ be enabled at many ISPs: an enforcement mode. I'm "strongly"
> > opposed to adding this mode.
>
> In a previous post, Pierre, I suggested that, to prevent ISPs from thinking
> this does anything for them, it should be configurable via PHP_INI_ALL.
> So even if an ISP enabled mode 3, a developer could change it themselves
> right in their source code. Since taint checking is necessarily a run-time
> problem, there's no reason not to allow disabling it via ini_set(). This
> would ensure that developers would never be put up against the wall of
> frustration so often encountered with safe mode restrictions.
"disable_functions ini_set, error_reporting" and you are out. And yes,
I saw that... The only way to prevent a mode 3 is not to implement
it.
> On Tue, Dec 19, 2006 at 11:18:02AM -0500, Ilia Alshanetsky wrote:
>>
>> On 19-Dec-06, at 11:06 AM, Alain Williams wrote:
>>> It is quite true that a taint flag cannot *guarantee* to make a PHP
>>> script
>>> completely safe. Using a regex to untaint a value will not
>>> guarantee that
>>> you end up with a perfectly safe value -- partly because it depends
>>> on what
>>> you want to do with it.
>>
>> Regex is the approach used by Perl to un-taint data, which is why I
>> chose to mention it. The problem I am trying to show you, and that you
>> seem to be steadfastly ignoring, is that PHP variables are often used
>> in different contexts within the scope of the same script. There are
>
> Can happen in Perl/... as well.
I know that, which means the taint mode in Perl is a woefully
inadequate solution that offers very little security. So, why try to
reproduce the bad idea in PHP as well? Isn't the idea to borrow the
best features and learn from other people's mistakes?
>> safe against SQL injection (not really though, due to charset tricks
>> with MySQL), it will definitely not be safe against XSS. It is my
>> opinion that a false sense of security is far worse than knowing
>> your code may potentially have security holes.
>
> I wear a seat belt when I drive my car because it will help me in
> many small
> accidents. I do not leave it off because it will be useless if a large
> truck decides to run me down.
To use your car analogy and safe_mode history, most users will start
driving like maniacs, violating every traffic law thinking that the
seat belt makes them invincible.
>> So you propose to give a partially working tool that promises data
>> security and then expect people not to rely on it 100% because it is
>> easy to
>
> I propose to give a partially working tool that helps in the majority
> of cases. I am aware that it will not be a panacea but that it is
> preferable
> to nothing.
Ilia Alshanetsky
> I propose to give a partially working tool that helps in the majority
> of cases. I am aware that it will not be a panacea but that it is preferable
> to nothing.
A non-context-aware taint will fail in the majority of use cases.
regards,
Lukas
> Wrong again. Different contexts have different validation criteria;
> unless you take that into account, tainting in PHP won't work. What's
> safe to print on screen may not be safe to execute or pass to the
> database, etc.
Ilia is right here; this is the key concern with this proposal, for me at
least. Every PHP app beyond hello world will likely work in at least two
different contexts; as such, I think a black-and-white approach is not a
useful intermediate step for this.
>> As long as we don't overreach (try to stop every problem) and
>> oversell (promise it will stop every problem) then we should be
>> fine, if 17 years of past experience can be applied to PHP.
>
> If you base everything on experience there is no need to use PHP period.
> Stick to predictable C, Fortran, etc...
> Just because a person is a great train engineer does not make him a
> great car mechanic.
I am not following you here, Ilia. Your comparison does not make sense, to
me at least. The goal must be to create a tool that makes it easier
to write more secure code. No more, no less. It is something you will
run in development in order to pick up a subset of security issues. It
will of course fail if there are security measures that turn out to be
insufficient. So no, it will not magically make your security filtering
regexp more secure, but it will catch the cases where you missed
security checking entirely.
Again, handling different contexts seems critical to me. If we have
that, it will also help in finding the trickier issue where a variable
has been filtered/sanitized, but for the wrong context. However, if we
do have context-sensitive taint, it seems like it will increase the
development/maintenance scope even more. And it will also have a bigger
performance overhead.
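One way to picture context-sensitive taint is to track, per value, the set
of contexts it has been cleaned for instead of a single flag. The sketch
below is purely illustrative: the ContextTaint class, the context names
'html' and 'shell', and the helper functions are assumptions, not part of
any proposal in this thread.

```php
<?php
// Hypothetical value wrapper that remembers which contexts it is clean for.
final class ContextTaint
{
    /** @var string */
    public $value;
    /** @var string[] contexts this value has been cleaned for */
    public $cleanFor;

    public function __construct(string $value, array $cleanFor = [])
    {
        $this->value = $value;
        $this->cleanFor = $cleanFor;
    }
}

function clean_for_html(ContextTaint $t): ContextTaint
{
    return new ContextTaint(htmlspecialchars($t->value, ENT_QUOTES, 'UTF-8'), ['html']);
}

function clean_for_shell(ContextTaint $t): ContextTaint
{
    return new ContextTaint(escapeshellarg($t->value), ['shell']);
}

// A sink names its context and rejects values not cleaned for it.
function require_clean(ContextTaint $t, string $context): string
{
    if (!in_array($context, $t->cleanFor, true)) {
        throw new RuntimeException("value not cleaned for $context");
    }
    return $t->value;
}

$raw  = new ContextTaint('a < b; rm -rf /');
$html = clean_for_html($raw);

echo require_clean($html, 'html'), "\n";   // accepted
// require_clean($html, 'shell') would throw: cleaned, but for the wrong context
```

Each value now carries a set rather than one bit, which hints at the extra
maintenance and run-time cost mentioned above.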
As such, I am beginning to realize that, at least from my current
understanding, the Ruby taint model is simply insufficient. While it has
different taint levels, they are not concerned with the context, but
only with the scope of the limits applied. I do not know how things are
in Perl's taint model. Does anyone have context-sensitive tainting
implemented yet?
You are thinking about it from the wrong end.
The point of taint checking is to remind the programmer to check that all
input fields pass suitable checks for whatever the field is supposed to be;
thus: age is numeric, sex is 'm' or 'f', price matches \d+\.\d{2} etc.
Many fields will have more complicated validation than that.
These fields can then be operated on directly, eg:
$_GET['age'] > 21
will not fail because the age is 'slkfjslfkj'.
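The checks listed above might look like this in PHP. A minimal sketch: the
function names are made up for illustration, and real fields usually need
stricter rules (ranges, lengths).

```php
<?php
// Hypothetical validators for the three example fields.
function valid_age(string $s): bool
{
    return ctype_digit($s);              // digits only, non-empty
}

function valid_sex(string $s): bool
{
    return in_array($s, ['m', 'f'], true);
}

function valid_price(string $s): bool
{
    // Anchored form of \d+\.\d{2}, so the whole string must match.
    return preg_match('/\A\d+\.\d{2}\z/', $s) === 1;
}

$age = $_GET['age'] ?? '';
if (is_string($age) && valid_age($age)) {
    $isAdult = (int) $age > 21;          // safe to compare numerically now
}
```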
Some fields may then need to be 'sent' somewhere, eg inserted into a database
or output to the next web page. In that case passing the field through
mysql_escape_string() or htmlspecialchars() may be needed.
Tainting is designed to remind the programmer that input validation has not
been done. There was a suggestion of checking that it was suitable to be 'sent'
to the appropriate 'output stream', but that was rejected as too complicated.
--
Alain Williams
--
> Bottom line is that it does not. There are plenty of Perl applications
> supposedly safe from XSS due to tainting that in reality are
> trivially exploitable via XSS, because the validation regex that
> does the un-tainting of data is sub-par.
If you incorrectly untaint data, how is that worse than not knowing
that there was a tainted data path in your code in the first place?
The perfect is the enemy of the good. I think we can all agree that
tainting can never be perfect. The question is: is it better than what
we have now?
Relying on developers to correctly sanitize data every time and every
place is certainly not perfect. Even the best and smartest developers
make mistakes sometimes. Isn't this one of the problems with safe
mode? I think the purpose of a taint mode should be to help find
existing security problems, not to try to prevent security violations
as with safe mode. Giving developers a dynamic security auditing tool
is not a bad thing.
I think the real problem with a taint mode is the issue of false
positives. It could end up being more annoying than it is valuable.
But then, until it's implemented, who can know?
I do think that once taint mode is enabled, there should be no way to
turn it off or get around it, except to untaint the data. It should be
fatal. No E_TAINT. No half measures. A program should either pass
the "taint audit" or not pass, but it shouldn't be able to opt out of
the audit with an ini_set, an .htaccess file, or an error handler.
Tainting should not be a mode you run in, it should be an (imperfect)
audit that you pass or fail.
A staged, conservative rollout as strictly an auditing feature might
not even include an untaint or an is_tainted function. If you want to
prevent ISPs from putting this in production, that would be one way to
do it. Without untaint or is_tainted, tainting becomes strictly a
debugging tool. If it turned out to be a bad idea, it would also be easily
revocable. (Maybe is_tainted and untaint start out in PECL and the zval
flag, function flags and fatal error only go into the core with an
--enable configuration option?)
I think that a tainting audit mode could be a useful tool, provided
that the number of false positives turns out to be reasonable. Marking
functions as sensitive or such is certainly useful. I cringe at the
idea of another global mode to deal with, another ini setting or
another error_reporting level.
Regardless of whether this is included in PHP 6, I hope someone
implements it. I want to run my programs with tainting and see what
happens.
Best Regards,
Jeff
It does not need to be perfect to be useful. But without it being
context-aware it will simply fail in too many cases, imho, to be really
useful. However, it will add code to PHP that needs to be maintained and
documented, and it will potentially have a performance impact.
Some people are also concerned about a false sense of security. While I
think we are all aware that perfect security is not attainable in a
realistic IT environment, I at least partially agree with the sentiment:
a taint without context will find you some issues, but is likely to
overlook so many cases that the benefit becomes quite limited compared to
what people would expect.
regards,
Lukas
I do not think the purpose of tainting is or should be to make this kind
of decision. The purpose of tainting is to force you, as application
developer (as much as it is possible without actually looking over your
shoulder and grabbing your hands :), to make this decision and not just
forget about it. Many bugs are there not because somebody uses an input
sanitized for exec in a database query; people usually don't do that
(though I cannot say there's no app doing it, I can certainly say
it's not a routine way of action). However, people routinely forget to
take action on the input altogether, either because they delay it "for
later" and then forget or because they are unfamiliar with the necessity
of filtering and the potential danger of passing the data. While there's no
way to stop the user who purposefully wants to pass user-supplied data
to a dangerous function without filtering, there's certainly a way to warn
a user who does it by mistake. That's the target.
--
Stanislav Malyshev, Zend Products Engineer
st...@zend.com http://www.zend.com/
Very well put/explained.
--
Alain Williams
--