"profile could not be opened" plan

Evan Martin

Aug 19, 2010, 6:29:11 PM

to Markus Gutschke, David Pizzuto, chromium-dev

On Thu, Aug 19, 2010 at 11:40 AM, Markus Gutschke <mar...@chromium.org> wrote:
> Given the number of reports that we have seen from Googlers, this actually
> appears to be a relatively common problem for users who have their profiles
> on NFS. It looks very much like a buggy NFS implementation, but that doesn't
> do us any good. We still need to figure out a way to deal with it.

From this thread, it seems like turning off locking completely won't
work, as we rely on it.

Some more ideas.

1) Sledgehammer: refuse to run on NFS. Aside from this bug, network
file systems hurt performance in all sorts of unexpected ways.
(See e.g. "https connections are very slow to establish (5-30
seconds)" http://code.google.com/p/chromium/issues/detail?id=48585 )
This would force some small fraction of our users to manually modify
their profiles to be on local disk, which is pretty brutal.
On the other hand, the sort of user who has an NFS homedir is more
likely the sort of user who can understand how to do this.

Probably a non-starter but it is tempting. I hate that for some of
our users we just have slow+unreliable performance without them having
any way of knowing the reason (or that it is in fact a problem).
Here's a real unprompted quote from a coworker on this subject: "I
just moved Chrome's profile directory and oh my God is it zippy. I
never knew..."

2) Attempt to clean up locks when we detect we're in a hosed state,
either automatically at startup or by amending our --diagnostics mode
to do the above unlocking manually (and have our "your profile is
corrupt" dialog recommend running in --diagnostics mode). This would
also allow us to recommend moving the profile (and NSS database, for
that matter) to local disk at that point.

2a) Using the existing locking mechanism: if you're on NFS,
recursively copy your profile to a temporary location; delete your
profile; move the copy back to the original location. If you're on
another file system, we've got something else going wrong.

2b) Use an alternative locking mechanism that allows us to clean up
the locks. sqlite supports creating a dotfile on the side whenever it
needs to access a file. It says this makes performance much worse
(repeatedly creating and deleting a file every time you access the
database, and you can't have any simultaneous readers), but perhaps
that is acceptable.

3) Implement some complex locking structure of our own on the side
like what Scott suggested. Realistically I'm not going to do this: I
estimate that only around 0.01% of our users are affected by the above
problems, I have other things to worry about, and we need less code,
not more.
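
For reference, option 2b can be tried out from Python's sqlite3 module. This is only a sketch, not Chromium code: the helper names are made up, it assumes a Unix build of SQLite (where the "unix-dotfile" VFS is built in), and the lock path `<database>.lock` is an SQLite implementation detail rather than a documented interface.

```python
import os
import sqlite3
import urllib.parse


def open_with_dotfile_locking(db_path):
    """Open db_path through SQLite's "unix-dotfile" VFS (hypothetical helper)."""
    uri = "file:" + urllib.parse.quote(os.path.abspath(db_path)) + "?vfs=unix-dotfile"
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    return sqlite3.connect(uri, uri=True, isolation_level=None)


def clear_stale_dotfile_lock(db_path):
    """Remove a leftover lock entry; return True if one was removed.

    Assumption: the lock lives at db_path + ".lock" and is a directory in
    current SQLite versions or a plain file in older ones. Only call this
    when no other process can be using the database.
    """
    lock_path = db_path + ".lock"
    try:
        os.rmdir(lock_path)
    except FileNotFoundError:
        return False
    except NotADirectoryError:
        os.unlink(lock_path)
    return True
```

The point of the exercise is the second function: unlike a lock held by an NFS server, a dotfile lock is something we can see and delete ourselves.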

My current plan is:
- Along the lines of (1), add an UMA stat related to network file
systems, just to evaluate how much of a problem this really is for
users.
- Implement 2a under the diagnostics mode.
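
A rough sketch of the 2a steps follows. It is in Python purely for illustration (the real thing would be C++ inside the diagnostics mode), `rewrite_profile` is a made-up name, and staging the copy next to the profile is an assumption; the plan above only says "a temporary location".

```python
import shutil
import tempfile
from pathlib import Path


def rewrite_profile(profile_dir):
    """Copy the profile aside, delete the original, move the copy back."""
    profile = Path(profile_dir)
    # Stage the copy in a sibling directory so the final move is a rename
    # on the same file system (an assumption of this sketch).
    staging = Path(tempfile.mkdtemp(prefix=profile.name + ".copy.",
                                    dir=profile.parent))
    copy = staging / profile.name
    try:
        shutil.copytree(profile, copy, symlinks=True)
    except Exception:
        # Copy failed: leave the original profile untouched.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    shutil.rmtree(profile)   # delete the original files
    copy.rename(profile)     # move the copy back to the original location
    staging.rmdir()
    return profile
```

The browser must not be running while this happens. If the process dies between the delete and the rename, the only copy of the profile is the one left in the staging directory, so a real implementation would want to tell the user where that is.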

Markus Gutschke

Aug 19, 2010, 6:43:32 PM

to Elliot Glaysher, ev...@chromium.org, David Pizzuto, chromium-dev

Evan, if you are going to write UMA code, can you also make the code create a file in /var/tmp and see if it survives for more than 30 days? Traditionally, /var/tmp was a safe place to create long-lived local files, and we might offer to move a user's profile to /var/tmp if they don't have any other non-NFS location.

But I am a little worried that distributions might decide to delete files in /var/tmp, if they haven't been touched in a while. Or even worse, on each reboot. This should only happen in /tmp. But you never know what crazy ideas the distributions come up with next.
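
The probe could be as simple as the sketch below. The marker file name and the `check_marker` helper are invented for illustration. The creation time is stored inside the file instead of relying on its mtime, on the assumption that something else might touch the file.

```python
import time
from pathlib import Path

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def check_marker(directory, now=None, min_age=THIRTY_DAYS):
    """Return "created", "too-young" or "survived" for a marker in directory."""
    now = time.time() if now is None else now
    marker = Path(directory) / ".chromium-tmp-survival-probe"  # hypothetical name
    try:
        created = float(marker.read_text())
    except (FileNotFoundError, ValueError):
        # Missing or unreadable marker: (re)create it and record the time.
        marker.write_text(repr(now))
        return "created"
    return "survived" if now - created >= min_age else "too-young"
```

A "created" result on anything but the first run means the file was cleaned up. To tell those two cases apart, the caller would have to remember somewhere outside /var/tmp (the profile's preferences, say) that a marker had already been written.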

Markus

On Thu, Aug 19, 2010 at 15:38, Elliot Glaysher <e...@google.com> wrote:

On Thu, Aug 19, 2010 at 3:29 PM, Evan Martin <ev...@chromium.org> wrote:
> 1) Sledgehammer: refuse to run on NFS. Aside from this bug, network
> file systems hurt performance in all sorts of unexpected ways.
> (See e.g. "https connections are very slow to establish (5-30
> seconds)" http://code.google.com/p/chromium/issues/detail?id=48585 )
> This would force some small fraction of our users to manually modify
> their profiles to be on local disk, which is pretty brutal.
> On the other hand, the sort of user who has an NFS homedir is more
> likely the sort of user who can understand how to do this.

Could we have an infobar yell at the user on every startup? "Hello,
you are using NFS and NFS is broken. Move your profile off, or your
profile will be corrupted periodically?"

-- Elliot

Lei Zhang

Aug 19, 2010, 6:45:07 PM

to ev...@chromium.org, Markus Gutschke, David Pizzuto, chromium-dev

If you're going to do (1), then you may want to check out the Picasa
for Linux wrapper script. There, we avoided NFS for performance
reasons.
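
As an illustration of what such a check involves on Linux (this is not taken from the Picasa script), one approach is to look the path up in /proc/mounts. The function names and the list of file system types below are assumptions made for this sketch, and the list is not exhaustive.

```python
import os
import posixpath

NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs"}  # illustrative, not exhaustive


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts writes spaces in mount points as the escape \040.
        mount_point = fields[1].replace("\\040", " ")
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fields[2]
    return best_type


def is_on_network_fs(path, mounts_file="/proc/mounts"):
    """True if path (after resolving symlinks) lives on a network file system."""
    with open(mounts_file) as f:
        return fs_type_for(os.path.realpath(path), f.read()) in NETWORK_FS_TYPES
```

The same lookup would also serve for the UMA stat mentioned in the plan, since it only needs to report the file system type of the profile directory.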

Mohamed Mansour

Aug 19, 2010, 7:51:46 PM

to the...@chromium.org, ev...@chromium.org, Markus Gutschke, David Pizzuto, chromium-dev

Is this related?

I can reproduce "the profile cannot be opened" if I install the Desktop app from Microsoft:

http://technet.microsoft.com/en-us/sysinternals/cc817881.aspx

Once installed, I always get the dialog when I run Chrome, and when uninstalled (or closed), I no longer get that.

If it is unrelated, I apologize in advance :)

-
Mohamed Mansour
m...@chromium.org

John Tamplin

Aug 19, 2010, 8:11:07 PM

to ev...@chromium.org, Markus Gutschke, David Pizzuto, chromium-dev

On Thu, Aug 19, 2010 at 6:29 PM, Evan Martin <ev...@chromium.org> wrote:

> 1) Sledgehammer: refuse to run on NFS. Aside from this bug, network
> file systems hurt performance in all sorts of unexpected ways.
> (See e.g. "https connections are very slow to establish (5-30
> seconds)" http://code.google.com/p/chromium/issues/detail?id=48585 )
> This would force some small fraction of our users to manually modify
> their profiles to be on local disk, which is pretty brutal.
> On the other hand, the sort of user who has an NFS homedir is more
> likely the sort of user who can understand how to do this.
>
> Probably a non-starter but it is tempting. I hate that for some of
> our users we just have slow+unreliable performance without them having
> any way of knowing the reason (or that it is in fact a problem).
> Here's a real unprompted quote from a coworker on this subject: "I
> just moved Chrome's profile directory and oh my God is it zippy. I
> never knew..."

A lot of enterprises are going to have their home directories on NFS. The fact that Firefox works properly with NFS-mounted homedirs and Chrome doesn't isn't going to win converts to Chrome.

Regarding the underlying problem, *every single time* my desktop at home (which mounts my home directory from my server via NFS) has been shut down uncleanly (usually extended power outage, once a video driver bug), my Chrome profile is totally corrupted and I have to delete it and start fresh (not shared with any other Chrome instance, no leftover Chrome processes running, etc). I have taken to making known-good copies of it to avoid losing everything. I think in the 5 years of using Firefox, I have had a profile corrupted in that way exactly once.

If Oracle supports running an RDBMS over NFS on a NetApp, surely you can store profile information.

--
John A. Tamplin
Software Engineer (GWT), Google

Evan Martin

Aug 19, 2010, 8:32:42 PM

to John Tamplin, Markus Gutschke, David Pizzuto, chromium-dev

On Thu, Aug 19, 2010 at 5:11 PM, John Tamplin <j...@google.com> wrote:
> A lot of enterprises are going to have their home directories on NFS. The
> fact that Firefox works properly with NFS-mounted homedirs and Chrome
> doesn't isn't going to win converts to Chrome.

As an aside, I don't think winning converts from Firefox is the goal.
But certainly we should emulate the good things Firefox does.

> Regarding the underlying problem, *every single time* my desktop at home
> (which mounts my home directory from my server via NFS) has been shut down
> uncleanly (usually extended power outage, once a video driver bug), my
> Chrome profile is totally corrupted and I have to delete it and start fresh
> (not shared with any other Chrome instance, no leftover Chrome processes
> running, etc). I have taken to making known-good copies of it to avoid
> losing everything. I think in the 5 years of using Firefox, I have had a
> profile corrupted in that way exactly once.
> If Oracle supports running an RDBMS over NFS on a NetApp, surely you can
> store profile information.

You would think so! The frustrating thing about this is that sqlite
is actually very carefully constructed to be resilient against these
kinds of failures, so it is unlikely that it is actually corrupted.
So could you elaborate on what you mean by "totally corrupted"?

The next time this happens, try this recipe:
  sudo apt-get install sqlite3
  cd ~/.config/google-chrome/Default  # or wherever it lives
  # Find every SQLite file in the profile and run SQLite's own
  # consistency check on each one.
  find . -type f -print0 | xargs -0 file | grep -i ': *sqlite' | cut -d: -f1 |
  while IFS= read -r f; do
    printf 'Checking %s... ' "$f"
    sqlite3 "$f" "pragma integrity_check"
  done

That might point at the culprit.

The problem this thread is describing seems to be just a bug in an NFS
implementation. At least on the machine I tested at Google, the
profile file was reported as locked, and no amount of fiddling would
fix it (including rebooting the client machine) -- it seemed like the
NFS server was hanging onto a lock, and as far as I understand the
POSIX locking API we can't tell it to let go. I suggested upthread
(proposal 2b) one way to work around this, but it seems kind of
pathetic to me.
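
To see whether a file is in this state, a probe along the following lines should do. It is a sketch in Python, with `is_locked` as a made-up name, and it assumes the lock in question is an fcntl-style POSIX byte-range lock, which is what SQLite's default Unix locking uses.

```python
import errno
import fcntl
import os


def is_locked(path):
    """Return True if another process holds a conflicting POSIX lock on path."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt at an exclusive lock over the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        # Closing any descriptor for a file drops all of this process's
        # POSIX locks on it, so do not run this probe inside a process
        # that holds its own locks on the same file.
        os.close(fd)
```

All this can do is report the lock. The API has no call for releasing a lock held on behalf of some other owner, which is the limitation described above.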

Hopefully you're encountering some other sort of corruption, because
this type is nearly out of our hands. (I would love to be corrected if
I'm wrong.)
