Using crc32's for cache file names

38 views
Skip to first unread message

Thue Janus Kristensen

unread,
Dec 26, 2009, 4:04:39 PM12/26/09
to smarty-d...@googlegroups.com
I note that Smarty 3 is using crc32 for creating template cache file names. For example in sysplugins/smarty_internal_resource_file.php :
 $_filepath = (string)abs(crc32($_template->getTemplateFilepath()));

Is there any check that a fetched cached file was compiled for the template we are trying to get the cache for? I did not see any. The following assumes there is no check.

That is an unfortunate choice for 2 reasons:

1) There could be an accidental collision, since the set of crc32s is obviously smaller than the set of path strings. This would obviously be bad, unexpected by the user, and perhaps hard to debug.
2) If an attacker could control some component of the path where a template would be written (somewhat reasonable in an automated system). Let us say that the attacker want to attack a template at /a/b/t.tpl. If the attacker could control one component of the path of templates he creates then he could choose a path /a/c which had the same crc32 as /a/b, and he could then create a template /a/c/t.tpl . Now, if /a/b/t.tpl was already cached then the attacker could now read that cache. If /a/b/t.tpl was not cached, then the attacker could cause the cache to be created with the content of his own /a/c/t.tpl , which would then be displayed/executed the next time /a/b/t.tpl was requested.

crc32 is not designed to make it hard to find hash collisions. We could use a cryptographic hash such as sha1, but this still runs the small unnecessary risk of collisions.

You could add some text to the top of the cached file which told the template path, but it would be be more elegant to just completely eliminate any risk of collisions.

A good solution would be to use

function path_to_string($path) {
    return str_replace(Array('/', '+'), Array('_', '-'), base64_encode($path));
}

everywhere where smarty 3 currently uses crc32.

See http://en.wikipedia.org/wiki/Base64 . I choose to replace / and +, to make the generated filenames compatible with all file systems, according to http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitations .

This could create longer file names. But again according to http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitations, all remotely modern filesystems support 255 byte filenames, so this should not be a problem in practice. The length of base64 is about 4/3 times the length of the input string. If it is a problem then it could be solved

On a related note:
On line 45 of smarty_internal_compile_include.php , you do a preg_replace on the compiled template. The compiled template can contain arbitrary strings, so this is very ugly! IMO it should be strictly impossibly to write any string in a template which can be interpreted by Smarty as smarty internal syntax. This is not least a good security practice, since that kind of mixing contexts is very hard to reason about when trying to understand code. This is related to the problem I have with the ?><?php substitution on the compiled template - I really believe arbitrary strings should work.

On a further related note:
On line 47 of the same file, if the expected header of the compiled file is not found then you choose to just use the compiled file. In that case the only correct thing to do is to recompile!

Regards, Thue

Monte Ohrt

unread,
Dec 26, 2009, 6:45:23 PM12/26/09
to smarty-d...@googlegroups.com
Thue,

The choice of CRC32 was for a few reasons:

1) available to all PHP versions (although not such an issue any more
since Smarty 3 is now PHP5+)
2) Performance. These hashes happen on every execution, so performance
is very important. CRC32 is 10x faster than MD5(). I think Uwe is going
to run a benchmark for base64_encode, but again I think CRC32 will win
out in speed.

As for collision, although it is possible, it is very unlikely. In 8
years of Smarty 2 we have never encountered a collision issue with
anyone using Smarty. So, CRC32 has been a fast and valid choice for
filepath creation.

3) shortness, keeping filenames readable/manageable. base64_encode could
get very long.

I'm open to other solutions that avoid collisions, are as fast or
faster, and keep filenames manageable. So far CRC32 has been the choice.

Monte

> --
>
> You received this message because you are subscribed to the Google
> Groups "Smarty Developers" group.
> To post to this group, send email to smarty-d...@googlegroups.com.
> To unsubscribe from this group, send email to
> smarty-develop...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/smarty-developers?hl=en.

Thue Janus Kristensen

unread,
Dec 26, 2009, 7:40:26 PM12/26/09
to smarty-d...@googlegroups.com
The best solution is to save the cached files in a directory structure mirroring the template files themselves. IMO.

That would also be user friendly, as it would be more obvious where to find the cached file, given the location of the original template.

Regards, Thue

uwe.tews

unread,
Dec 26, 2009, 7:57:02 PM12/26/09
to Smarty Developers
That works for subfolders of template_dir. But template_dir itself can
be an array folders so we still need unique id's.

> > Seehttp://en.wikipedia.org/wiki/Base64. I choose to replace / and +, to


> > make the generated filenames compatible with all file systems, according to

> >http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitat....


>
> > This could create longer file names. But again according to

> >http://en.wikipedia.org/wiki/Filename#Comparison_of_file_name_limitat...,

Thue Janus Kristensen

unread,
Dec 27, 2009, 2:20:19 PM12/27/09
to smarty-d...@googlegroups.com
Hmm. Then the only solution I can see is to save the filepath to the first line of the compiled template / cached output, and then check that it matches the expected filename when loading it.

Regards, Thue

Monte Ohrt

unread,
Dec 27, 2009, 2:45:54 PM12/27/09
to smarty-d...@googlegroups.com
.. a runtime overhead we can't justify.

Thue Janus Kristensen wrote:
> Hmm. Then the only solution I can see is to save the filepath to the
> first line of the compiled template / cached output, and then check
> that it matches the expected filename when loading it.
>
> Regards, Thue
>
> On Sun, Dec 27, 2009 at 1:57 AM, uwe.tews <uwe....@googlemail.com
> <mailto:uwe....@googlemail.com>> wrote:
>
> That works for subfolders of template_dir. But template_dir itself can
> be an array folders so we still need unique id's.
>
> On Dec 27, 1:40 am, Thue Janus Kristensen <thu...@gmail.com

> <mailto:thu...@gmail.com>> wrote:
> > The best solution is to save the cached files in a directory
> structure
> > mirroring the template files themselves. IMO.
> >
> > That would also be user friendly, as it would be more obvious
> where to find
> > the cached file, given the location of the original template.
> >
> > Regards, Thue
> >
> > On Sat, Dec 26, 2009 at 10:04 PM, Thue Janus Kristensen

> <thu...@gmail.com <mailto:thu...@gmail.com>>wrote:

> <http://en.wikipedia.org/wiki/Base64>. I choose to replace / and +, to

> <mailto:smarty-d...@googlegroups.com>.


> To unsubscribe from this group, send email to
> smarty-develop...@googlegroups.com

> <mailto:smarty-developers%2Bunsu...@googlegroups.com>.

Thue Janus Kristensen

unread,
Dec 27, 2009, 2:48:58 PM12/27/09
to smarty-d...@googlegroups.com
Well, you need to do something :). The current lack of collision detection is obviously unacceptable.

And is it so unjustifiable? it takes almost no work to fetch out the first line of an already loaded file and compare it to an expected value. Or perhaps make the first line be a<?php assert that $template_path ="...."?>, that should be very fast too. I just don't see the big performance problem.

Regards, Thue

Monte Ohrt

unread,
Dec 27, 2009, 6:17:49 PM12/27/09
to smarty-d...@googlegroups.com
I would like to see a working exploit before we cry wolf. Template
paths are not created from external sources, so exploiting a duplicate
CRC32 would be extremely difficult, and even if it could, what purpose
would it gain outside of faulting an existing template? You can't inject
your own source just by duplicating a file path.

At first I was thinking this meant loading/reading a file as well as the
file timestamp check. If this is merely an assertion test added to the
top, I think it would be an acceptable integrity test.

Thue Janus Kristensen wrote:
> Well, you need to do something :). The current lack of collision
> detection is obviously unacceptable.
>
> And is it so unjustifiable? it takes almost no work to fetch out the
> first line of an already loaded file and compare it to an expected
> value. Or perhaps make the first line be a<?php assert that
> $template_path ="...."?>, that should be very fast too. I just don't
> see the big performance problem.
>
> Regards, Thue
>
> On Sun, Dec 27, 2009 at 8:45 PM, Monte Ohrt <mo...@ohrt.com
> <mailto:mo...@ohrt.com>> wrote:
>
> .. a runtime overhead we can't justify.
>
> Thue Janus Kristensen wrote:
> > Hmm. Then the only solution I can see is to save the filepath to the
> > first line of the compiled template / cached output, and then check
> > that it matches the expected filename when loading it.
> >
> > Regards, Thue
> >
> > On Sun, Dec 27, 2009 at 1:57 AM, uwe.tews
> <uwe....@googlemail.com <mailto:uwe....@googlemail.com>

> > <mailto:uwe....@googlemail.com


> <mailto:uwe....@googlemail.com>>> wrote:
> >
> > That works for subfolders of template_dir. But template_dir
> itself can
> > be an array folders so we still need unique id's.
> >
> > On Dec 27, 1:40 am, Thue Janus Kristensen <thu...@gmail.com
> <mailto:thu...@gmail.com>

> > <mailto:thu...@gmail.com <mailto:thu...@gmail.com>>> wrote:
> > > The best solution is to save the cached files in a directory
> > structure
> > > mirroring the template files themselves. IMO.
> > >
> > > That would also be user friendly, as it would be more obvious
> > where to find
> > > the cached file, given the location of the original template.
> > >
> > > Regards, Thue
> > >
> > > On Sat, Dec 26, 2009 at 10:04 PM, Thue Janus Kristensen
> > <thu...@gmail.com <mailto:thu...@gmail.com>

> <mailto:thu...@gmail.com <mailto:thu...@gmail.com>>>wrote:

> > <mailto:smarty-d...@googlegroups.com


> <mailto:smarty-d...@googlegroups.com>>.
> > To unsubscribe from this group, send email to
> > smarty-develop...@googlegroups.com
> <mailto:smarty-developers%2Bunsu...@googlegroups.com>

> > <mailto:smarty-developers%2Bunsu...@googlegroups.com
> <mailto:smarty-developers%252Buns...@googlegroups.com>>.

Monte Ohrt

unread,
Dec 27, 2009, 6:20:52 PM12/27/09
to smarty-d...@googlegroups.com
Thue Janus Kristensen wrote:
> Well, you need to do something :). The current lack of collision
> detection is obviously unacceptable.
>
> And is it so unjustifiable? it takes almost no work to fetch out the
> first line of an already loaded file and compare it to an expected
> value. Or perhaps make the first line be a<?php assert that
> $template_path ="...."?>, that should be very fast too. I just don't
> see the big performance problem.

Also to note in this academic case, if CRC32 was duplicated, the
assertion test may not exist in the exploited file, so an external
assertion test may be required.

>
> Regards, Thue
>
> On Sun, Dec 27, 2009 at 8:45 PM, Monte Ohrt <mo...@ohrt.com
> <mailto:mo...@ohrt.com>> wrote:
>
> .. a runtime overhead we can't justify.
>
> Thue Janus Kristensen wrote:
> > Hmm. Then the only solution I can see is to save the filepath to the
> > first line of the compiled template / cached output, and then check
> > that it matches the expected filename when loading it.
> >
> > Regards, Thue
> >
> > On Sun, Dec 27, 2009 at 1:57 AM, uwe.tews
> <uwe....@googlemail.com <mailto:uwe....@googlemail.com>

> > <mailto:uwe....@googlemail.com


> <mailto:uwe....@googlemail.com>>> wrote:
> >
> > That works for subfolders of template_dir. But template_dir
> itself can
> > be an array folders so we still need unique id's.
> >
> > On Dec 27, 1:40 am, Thue Janus Kristensen <thu...@gmail.com
> <mailto:thu...@gmail.com>

> > <mailto:thu...@gmail.com <mailto:thu...@gmail.com>>> wrote:
> > > The best solution is to save the cached files in a directory
> > structure
> > > mirroring the template files themselves. IMO.
> > >
> > > That would also be user friendly, as it would be more obvious
> > where to find
> > > the cached file, given the location of the original template.
> > >
> > > Regards, Thue
> > >
> > > On Sat, Dec 26, 2009 at 10:04 PM, Thue Janus Kristensen
> > <thu...@gmail.com <mailto:thu...@gmail.com>

> <mailto:thu...@gmail.com <mailto:thu...@gmail.com>>>wrote:

> > <mailto:smarty-d...@googlegroups.com


> <mailto:smarty-d...@googlegroups.com>>.
> > To unsubscribe from this group, send email to
> > smarty-develop...@googlegroups.com
> <mailto:smarty-developers%2Bunsu...@googlegroups.com>

> > <mailto:smarty-developers%2Bunsu...@googlegroups.com
> <mailto:smarty-developers%252Buns...@googlegroups.com>>.

Thue Janus Kristensen

unread,
Dec 29, 2009, 12:49:19 PM12/29/09
to smarty-d...@googlegroups.com
I tested it in Smarty 3. For example using the crc32 collision

/home/monte/templates/a_subdir/t.tpl
/home/tjk/templates/zaaaaarfh0/t.tpl

I was able to replace a page by using another template with the same filename (t.tpl here), and a path with the same crc32 sum, by planting my own crc32-constructed file in the cache.

If for example a university allow their students to have a webpage on their server in their ~/templates directory, with smarty security turned on make it supposedly secure, using such a crc32 collision would allow one to get your own page displayed instead of another students. Or if there were access restrictions on who could view the pages, such a collission would allow the attacker to view the targeted page.

Regards, Thue

Monte Ohrt

unread,
Dec 29, 2009, 1:17:28 PM12/29/09
to smarty-d...@googlegroups.com
This is assuming you have filesystem access to create these paths, so
you can already view whatever files you want anyways :)

I see the academic case you present, and I am open to alternates to
crc32 if we can keep generated files manageable and not take a
performance hit.

> > > > > 2) If an attacker could control some co mponent of

> <mailto:smarty-developers%2Bunsu...@googlegroups.com
> <mailto:smarty-developers%252Buns...@googlegroups.com>
> > <mailto:smarty-developers%252Buns...@googlegroups.com
> <mailto:smarty-developers%25252Bun...@googlegroups.com>>>.

Thue Janus Kristensen

unread,
Dec 29, 2009, 1:25:25 PM12/29/09
to smarty-d...@googlegroups.com
On Tue, Dec 29, 2009 at 7:17 PM, Monte Ohrt <mo...@ohrt.com> wrote:
This is assuming you have filesystem access to create these paths, so
you can already view whatever files you want anyways :)

Not true. In Unix, you often don't have read access to other people's home directories, for example.

But you can still potentially guess the path of the template you want to replace. For example if there is a set location of a person's entry page.

I see the academic case you present, and I am open to alternates to
crc32 if we can keep generated files manageable and not take a
performance hit.

crc32 is fine, since collisions are so rare. We just need to detect and handle the collisions.

Regards, Thue

John Campbell

unread,
Dec 29, 2009, 2:11:13 PM12/29/09
to smarty-d...@googlegroups.com
On Sat, Dec 26, 2009 at 6:45 PM, Monte Ohrt <mo...@ohrt.com> wrote:
> 2) Performance. These hashes happen on every execution, so performance
> is very important. CRC32 is 10x faster than MD5(). I think Uwe is going
> to run a benchmark for base64_encode, but again I think CRC32 will win
> out in speed.

My tests shows using the built-in md5 will be faster than the crc32
call as implemented.

CODE:
$start = microtime(1);
$iterations = 100000;
while($iterations--)
$hash = 'this/is/typicalstring.tpl';
printf("nothing time: %.3f \n",microtime(1)-$start);

$start = microtime(1);
$iterations = 100000;
while($iterations--)
$hash = crc32('this/is/typicalstring.tpl');
printf("crc32 time: %.3f \n",microtime(1)-$start);

$start = microtime(1);
$iterations = 100000;
while($iterations--)
$hash = (string)abs(crc32('this/is/typicalstring.tpl'));
printf("crc32 as implemented time: %.3f \n",microtime(1)-$start);

$start = microtime(1);
$iterations = 100000;
while($iterations--)
$hash = md5('this/is/typicalstring.tpl');
printf("md5 time: %.3f \n",microtime(1)-$start);

$start = microtime(1);
$iterations = 100000;
while($iterations--)
$hash = sha1('this/is/typicalstring.tpl');
printf("sha1 time: %.3f \n",microtime(1)-$start);

RESULT:
nothing time: 0.009
crc32 time: 0.033
crc32 as implemented time: 0.072
md5 time: 0.058
sha1 time: 0.068

Monte Ohrt

unread,
Dec 29, 2009, 2:22:34 PM12/29/09
to smarty-d...@googlegroups.com
Good observation. It seems that the (string) cast does slow things up
quite a bit. This may not be necessary, as it is not used everywhere in
the code that crc32() is called. I'll take a closer look.

Thue Janus Kristensen

unread,
Dec 29, 2009, 2:24:17 PM12/29/09
to smarty-d...@googlegroups.com
Perhaps worth noting is that the longest time, 0.068 seconds for 100000 calls to sha1, corresponds to 6.8 × 10E-7s for one call to sha1.

So assuming one sha1 use per page served, after one million pages served you will have used a total of less than 1 second of server time on calculating sha1 hashes.

So what I conclude from this is the hash calculation that part of the code is not performance critical, so we might as well use sha1.

(Real run time might be slightly longer due to cache misses, but still)

Regards, Thue

Monte Ohrt

unread,
Dec 29, 2009, 2:41:26 PM12/29/09
to smarty-d...@googlegroups.com
If sha1 avoids the extra cross-check for collisions at compile time, I'd
vote to go with it as this will be faster overall. md5() could also work
and is slightly quicker than sha1(), but not as secure from collisions
as md5 is a shorter bit length.

> <mailto:smarty-d...@googlegroups.com>.


> To unsubscribe from this group, send email to
> smarty-develop...@googlegroups.com

> <mailto:smarty-developers%2Bunsu...@googlegroups.com>.

Thue Janus Kristensen

unread,
Dec 29, 2009, 2:54:41 PM12/29/09
to smarty-d...@googlegroups.com
Personally I would tend to go with both sha1 and the run-time check, just to be ridiculously secure :).

But using sha1 only and dropping the run-time check is also a reasonable choice.

Regards, Thue

On Tue, Dec 29, 2009 at 8:41 PM, Monte Ohrt <mo...@ohrt.com> wrote:
If sha1 avoids the extra cross-check for collisions at compile time, I'd
vote to go with it as this will be faster overall. md5() could also work
and is slightly quicker than sha1(), but not as secure from collisions
as md5 is a shorter bit length.

Thue Janus Kristensen wrote:
> Perhaps worth noting is that the longest time, 0.068 seconds for
> 100000 calls to sha1, corresponds to 6.8 в 10E-7s for one call to sha1.

uwe.tews

unread,
Dec 29, 2009, 3:14:18 PM12/29/09
to Smarty Developers
The SVN is updated with sha1() now.

On Dec 29, 8:54 pm, Thue Janus Kristensen <thu...@gmail.com> wrote:
> Personally I would tend to go with both sha1 and the run-time check, just to
> be ridiculously secure :).
>
> But using sha1 only and dropping the run-time check is also a reasonable
> choice.
>
> Regards, Thue
>
> On Tue, Dec 29, 2009 at 8:41 PM, Monte Ohrt <mo...@ohrt.com> wrote:
> > If sha1 avoids the extra cross-check for collisions at compile time, I'd
> > vote to go with it as this will be faster overall. md5() could also work
> > and is slightly quicker than sha1(), but not as secure from collisions
> > as md5 is a shorter bit length.
>
> > Thue Janus Kristensen wrote:
> > > Perhaps worth noting is that the longest time, 0.068 seconds for
> > > 100000 calls to sha1, corresponds to 6.8 в 10E-7s for one call to sha1.
>
> > > So assuming one sha1 use per page served, after one million pages
> > > served you will have used a total of less than 1 second of server time
> > > on calculating sha1 hashes.
>
> > > So what I conclude from this is the hash calculation that part of the
> > > code is not performance critical, so we might as well use sha1.
>
> > > (Real run time might be slightly longer due to cache misses, but still)
>
> > > Regards, Thue
>

> > >     smarty-develop...@googlegroups.com<smarty-developers%2Bunsu...@googlegroups.com>
> > >     <mailto:smarty-developers%2Bunsu...@googlegroups.com<smarty-developers%252Buns...@googlegroups.com>


> > >.
> > >     For more options, visit this group at
> > >    http://groups.google.com/group/smarty-developers?hl=en.
>
> > > --
>
> > > You received this message because you are subscribed to the Google
> > > Groups "Smarty Developers" group.
> > > To post to this group, send email to smarty-d...@googlegroups.com.
> > > To unsubscribe from this group, send email to

> > > smarty-develop...@googlegroups.com<smarty-developers%2Bunsu...@googlegroups.com>


> > .
> > > For more options, visit this group at
> > >http://groups.google.com/group/smarty-developers?hl=en.
>
> > --
>
> > You received this message because you are subscribed to the Google Groups
> > "Smarty Developers" group.
> > To post to this group, send email to smarty-d...@googlegroups.com.
> > To unsubscribe from this group, send email to

> > smarty-develop...@googlegroups.com<smarty-developers%2Bunsu...@googlegroups.com>

John Campbell

unread,
Dec 29, 2009, 10:45:19 PM12/29/09
to smarty-d...@googlegroups.com
2009/12/29 Thue Janus Kristensen <thu...@gmail.com>:

> Personally I would tend to go with both sha1 and the run-time check, just to
> be ridiculously secure :).

The odds of a random collision are something like 1 in 10^30. Which
is about the same odds as all the oxygen molecules in your room
randomly migrating to one side of the room and you suffocating to
death. It theoretically could happen, but I wouldn't worry about it
:)

-John Campbell

Thue Janus Kristensen

unread,
Dec 30, 2009, 7:27:44 AM12/30/09
to smarty-d...@googlegroups.com
Now I think you are exaggerating a bit. There is 1.65*10^25 molecules in a cubit meter of air. Of these, 1/5 is oxygen, or about 3.3*10^24 oxygen molecules. In my room there is about 50 cubic meters, so about 60*3.3*10^24=2*10^26 oxygen molecules. For each of them to be on the same side by chance is 1/2^(2*10^26) or very roughly one in 10^(10^25).

So the probability of all the oxygen molecules being on the same side of the room is closer to 1 in 10^(10^25)=10^10000000000000000000000000 than 1 in 10^30. So your estimate is a bit off :).

But the reason for my paranoia is that the 1 in 10^30 claim for sha1 is simply not true. Nobody knows what the probability of finding a security hole in sha1 is; the security of it is "nobody has found a hole yet, but we can't prove there isn't one". Perhaps they will make a quantum computer tomorrow which can break it in seconds. Hence my preference for actually checking for collisions, when you in this case don't actually have to depend on the security of sha1.

Regards, Thue

Reply all
Reply to author
Forward
0 new messages