On Sun, Nov 13, 2011 at 1:48 PM, Pol <p...@everpix.net> wrote:
> Hi,
>
> Since switching to 2.7 runtime, logging in to http://www.everpix.com
> went from about a second to anywhere from 15s to 60s. I tracked it
> down to this single password checking line:
>
> from bcrypt import bcrypt
> bcrypt.hashpw(password, self.password_hash) == self.password_hash
What value are you using for "threadsafe" in your app.yaml?
How large is self.password_hash?
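For context, the work factor is embedded in the stored hash itself, and
hashpw() re-runs bcrypt at that cost when verifying, so a high
log_rounds value makes every login CPU-expensive by design. A minimal
py-bcrypt sketch (values illustrative):

    import bcrypt  # py-bcrypt

    # gensalt's log_rounds sets the cost; each +1 roughly doubles the CPU work.
    stored = bcrypt.hashpw("secret", bcrypt.gensalt(log_rounds=12))
    print stored[:7]  # "$2a$12$" -- the embedded cost factor

    # Verifying re-derives the hash at the cost stored in the prefix, so
    # checking a password is exactly as expensive as hashing it was.
    ok = bcrypt.hashpw("secret", stored) == stored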
Cheers,
Brian
> This comes from "a native Python implementation of the py-bcrypt
> package from http://www.mindrot.org/projects/py-bcrypt/" grabbed from
> here: https://github.com/erlichmen/py-bcrypt.
>
> So what's happening here and how can we fix this?
>
> Thanks,
>
> - Pol
>
Or, you could just do something weird to the password BEFORE you SHA it. Like
interleaving the user name into the password: Salt1 + UpSaEsRsNwAoMrEd + Salt2
(that's "username" woven into "password"). Or pick two hashes and concatenate
them: SHA(pass) + MD5(pass).
Don't want to store all that string length? Take the odd characters from
SHA(pass+salt) + the even characters from MD5(pass+salt); see the sketch below.
The uniqueness of the method is more important than the method itself.
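For what it's worth, a minimal sketch of that odd/even idea with the
standard hashlib module (illustrative only, not an endorsement of the
scheme):

    import hashlib

    def weird_hash(password, salt):
        # Hex digests of two different hash functions over the salted password.
        sha = hashlib.sha1(password + salt).hexdigest()  # 40 hex chars
        md5 = hashlib.md5(password + salt).hexdigest()   # 32 hex chars
        # Odd-position characters from SHA-1 plus even-position characters
        # from MD5, as described above; 36 chars total, shorter than either.
        return sha[1::2] + md5[0::2]

    print weird_hash("password", "salt1")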
Thanks for getting back to me.
On Sun, Nov 13, 2011 at 4:50 PM, Pol <p...@everpix.net> wrote:
> Hi Brian,
>
> threadsafe is "true"
There is a known issue where concurrent requests (enabled with
threadsafe) can be much slower than non-concurrent requests,
especially if the request is CPU-bound.
You might want to set threadsafe to "false" and see if that fixes the problem.
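For reference, that's the top-level flag in app.yaml; a minimal sketch
(the application name and handler script are placeholders):

    application: yourapp
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: false

    handlers:
    - url: /.*
      script: main.app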
Cheers,
Brian
Thanks for trying it again with threadsafe disabled and reporting back!
On Mon, Nov 14, 2011 at 4:32 AM, Pol <p...@everpix.net> wrote:
> Hi Brian,
>
> It helps: it's now at ~5s, but that's still at least 2x slower than
> on the 2.5 runtime.
>
> So on December 1st, the 50% discount for frontend instances is gone. The
> idea is to compensate by switching to Python 2.7 with multithreading,
> but it looks like at this point it's a lose-lose situation: it runs
> more requests at the same time, but they take longer. We're already
> mid-November; do you guys think you'll have all of this working
> perfectly within two weeks?
No, the issues with concurrent requests won't be fixed by the end of November.
But note that concurrent requests will *not* improve the utilization
of CPU-bound requests. Running multiple threads on the same CPU just
proportionally slows each thread down.
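A minimal standard-library sketch of that effect (CPython's GIL
serializes pure-Python CPU work, so two threads on one core take about
as long as running the jobs back to back; timings are illustrative):

    import threading, time

    def burn(n=5000000):
        # Pure-Python CPU work; holds the GIL essentially the whole time.
        x = 0
        for i in xrange(n):
            x += i

    start = time.time()
    burn(); burn()
    serial = time.time() - start

    start = time.time()
    threads = [threading.Thread(target=burn) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.time() - start

    # Expect the threaded run to take about as long as (or longer than)
    # the serial one: both threads share one interpreter lock.
    print "serial: %.2fs  threaded: %.2fs" % (serial, threaded)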
> The 1.6 SDK, which actually allows running 2.7
> locally, was just released, so we're only starting to test now. Seems
> to me deferring the payment change one more month, to January 1st,
> 2012, wouldn't hurt :)
You can send your billing-related suggestions to App Engine's
Engineering Director, Peter Magnusson (p...@google.com).
Thanks!
Cheers,
Brian
On Tue, Nov 15, 2011 at 3:19 AM, Pol <p...@everpix.net> wrote:
> Hi Brian,
>
>> > So on December 1st, the 50% discount for frontend instances is gone. The
>> > idea is to compensate by switching to Python 2.7 with multithreading,
>> > but it looks like at this point it's a lose-lose situation: it runs
>> > more requests at the same time, but they take longer. We're already
>> > mid-November; do you guys think you'll have all of this working
>> > perfectly within two weeks?
>>
>> No, the issues with concurrent requests won't be fixed by the end of November.
>>
>> But note that concurrent requests will *not* improve the utilization
>> of CPU-bound requests. Running multiple threads on the same CPU just
>> proportionally slows each thread down.
>
> That doesn't make sense: apps do a mix of CPU stuff and RPC stuff (and
> possibly URL requests). What's the point of concurrent requests if it
> slows down the CPU stuff while letting you parallelize your RPC calls?
This pattern (a mix of CPU use and RPC calls) will benefit from
concurrent requests. I was writing about what I understood to be your
login example.
Presumably it does a single datastore read to access user information
(taking 40ms or so) and then spends one second doing cryptography.
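Rough arithmetic under that assumption: ~40ms of I/O plus ~1000ms of
CPU means the request is about 96% CPU-bound. Two such requests sharing
one core can only overlap the 40ms I/O slices, so each takes roughly
two seconds instead of one; concurrency buys almost nothing here.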
> The end result will be the same number of instances, as requests end up
> taking longer. Isn't the scheduler supposed to watch all this and
> ensure the CPU on each physical machine is not saturated?
>
> Only apps that do long-poll URL requests and barely use the CPU would
> benefit from concurrent requests then.
>
> We were told: don't worry so much about hours-based pricing, just wait
> for the 2.7 runtime, it'll have concurrent requests, it'll compensate
> things. Clearly that doesn't work as promised if just turning
> threadsafe ON makes a 2-second request turn into a 30-60 second
> one: the scheduler is not doing the right thing.
Yes, these large latency increases are a bug:
http://code.google.com/p/googleappengine/issues/detail?id=6323
> It seems what you need is per-WSGIApplication-instance control of
> the concurrency setting instead of a global one, so you can turn it on
> only where it makes sense.
>
> Finally, no matter what, concurrent or not, there's still a problem, as
> the 2.7 runtime appears slower than 2.5 in this simple empirical test. I'm
> starting to suspect you are using the 2.7 transition as an opportunity
> to run more virtual instances per physical machine.
That's not the case. The Python 2.7 runtime is slower than the Python
2.5 runtime in some cases and faster in others. We aren't publicizing
the reasons why at this point.
Cheers,
Brian
> - Pierre
-----Original Message-----
From: google-a...@googlegroups.com
[mailto:google-a...@googlegroups.com] On Behalf Of Pol
Sent: Sunday, November 13, 2011 9:32 AM
To: Google App Engine
Subject: [google-appengine] Re: Help resolve massive performance regression
in 2.7 vs 2.5 runtime
If I know your salt, I can “de-hash” bcrypts faster than I can any of the “weird” combinations, because there are libraries for doing exactly that on ATI cards.
If you do something weird, a script kiddie can’t just pull code off the web and attack it.
You want to see who can offline-crack a set of 1M users? Your bcrypt list vs. my “weird” one: you don’t even have to give me the salt, and I’ll have 10k of those cracked in the first 72 hours. 10-to-1 odds you won’t get through mine without my source code in my lifetime.
-Brandon Wirtz
PS
I don’t usually do the “trust me, I’m far more evil” thing, but the FBI, Homeland Security, and the CIA have been to my doorstep for things I have defeated, documented, or built to keep from being defeated. The first time, I was in 3rd grade.
Nick,
I agree that my threat model assumes they didn’t get my source code. That “somebody else’s problem” approach works under the assumption that people are going to get my data, not my source code: I don’t ever write my own DB server code, so I am stuck using someone else’s, which means the vulnerability I am most likely to face is that somebody else’s screw-up will be where my problem lies.
Granted, this is a better strategy if you are running compiled code. Since my code lives on the Google server, I’m at the mercy of Google’s security, whereas if I were running compiled code it would be less likely someone would get the code.
I would say that a unique salt per user is a good thing. The most common way to attack a large password database is to look at the most common entries and compare them against the most common passwords from other sources. If you know the 15 most-used passwords and the 15 most often occurring database entries, you are a long way toward knowing what those 15 values are and calculating the salt. You aren’t crunching millions of combinations, you are crunching thousands; and once you have the salt, you take your already-deciphered list of the most common passwords, calculate the top 5k using bcrypt, and you now have about 50% of the data in fewer than 10k operations.
Compare that with my scenario. You have the data. You don’t have the source code. The user ID or other “spoiler” is in every salt, so a recurring hash doesn’t correspond to a duplicate password, and now the computation is nearly impossible even if you have the source code, because you have to calculate every value for every user anyway.
Would Bcrypt(Pass+UserID+Salt) be the best? Yes. But MD5(Pass+UserID+Salt) is still going to be orders of magnitude more difficult than Bcrypt(Pass+Salt), because I can’t use knowledge of frequency tables to predict likely outcomes or detect duplicate passwords.
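For the record, a minimal py-bcrypt sketch of that preferred variant
(gensalt already generates a unique random salt per call; mixing in the
user ID is the “spoiler” described above):

    import bcrypt  # py-bcrypt

    def hash_password(user_id, password):
        # gensalt() produces a fresh random salt; prefixing the user ID
        # ties the hash to the account, so a recurring hash no longer
        # reveals a duplicate password.
        return bcrypt.hashpw(user_id + password, bcrypt.gensalt(log_rounds=12))

    def check_password(user_id, password, stored_hash):
        # hashpw re-uses the salt and cost factor embedded in stored_hash.
        return bcrypt.hashpw(user_id + password, stored_hash) == stored_hash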
-Brandon
>> But note that concurrent requests will *not* improve the utilization
>> of CPU-bound requests. Running multiple threads on the same CPU just
>> proportionally slows each thread down.
> That doesn't make sense: apps do a mix of CPU stuff and RPC stuff (and
> possibly URL requests). What's the point of concurrent requests if it
> slows down the CPU stuff while letting you parallelize your RPC calls?
> The end result will be the same number of instances, as requests end up
> taking longer. Isn't the scheduler supposed to watch all this and
> ensure the CPU on each physical machine is not saturated?
> Only apps that do long-poll URL requests and barely use the CPU would
> benefit from concurrent requests then.
This is likely related to this bug:
http://code.google.com/p/googleappengine/issues/detail?id=6323
Though I'm a bit surprised that it is impacting you this much since
you don't seem to be using much CPU.
Cheers,
Brian