NFS drive uids/gids completely broken - for a little while

bri...@aracnet.com

May 27, 2015, 11:30:05 AM
Hi,

This is a weird one.

Tried to use ssh and saw a "bad permissions" error on my .ssh/config file.

I do ls -l and i see uids/gids of 2^32-1 or a similar very large integer.

WTF ?!

So i go back to the server to make sure the ownership hasn't been borked some way and everything is fine.

I go back to my account on the client, do 'ls -l' again, and everything is as it should be.

WTF ?!


Brian

p.s.

1 my NFS drives are mounted at boot. i've noticed that i get errors about something not being ready, or auto mounting being a problem, but up until now i haven't seen any real problems.

2 i can't look at those boot messages i see in 1. why is it that a permanent method, installed by DEFAULT has never been implemented to look at boot messages ?


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: https://lists.debian.org/20150527082...@cedar.deldotd.com

Dan Ritter

May 27, 2015, 12:10:04 PM
On Wed, May 27, 2015 at 08:25:16AM -0700, bri...@aracnet.com wrote:


>
> 2 i can't look at those boot messages i see in 1. why is it that a permanent method, installed by DEFAULT has never been implemented to look at boot messages ?
>

Have you looked at:

/var/log/dmesg
/var/log/kern.log
and
/var/log/syslog

?

They should all be enlightening. Sometime after that,
/var/log/auth.log and /var/log/daemon.log should be useful as
well.
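
For anyone following along, here is a minimal sketch of how those log files could be searched for NFS-related lines. The file paths are the ones listed above; the regular expression and the function names are assumptions chosen for illustration, not anything taken from the thread.

```python
import re
from typing import Iterable, Iterator

# Assumed pattern: lines that mention NFS, the id mapper, or an rpc.* daemon.
NFS_PATTERN = re.compile(r"nfs|idmap|rpc\.", re.IGNORECASE)

# Log files named in the reply above.
LOG_PATHS = ("/var/log/dmesg", "/var/log/kern.log", "/var/log/syslog")


def nfs_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that look NFS-related, without the trailing newline."""
    for line in lines:
        if NFS_PATTERN.search(line):
            yield line.rstrip("\n")


def scan(paths=LOG_PATHS) -> None:
    """Print matching lines from each readable log file; skip the rest."""
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                for line in nfs_lines(handle):
                    print(f"{path}: {line}")
        except OSError as exc:
            # Missing file or insufficient permissions (logs are often root-only).
            print(f"{path}: skipped ({exc})")
```

Calling `scan()` as root (or as a member of the group that may read the logs) prints the matches; which files exist depends on the installed syslog setup.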

As to your NFS problem, I suspect you are not mapping userids
across machines properly. What are your mount options?

-dsr-



Bob Proulx

May 27, 2015, 3:40:04 PM
bri...@aracnet.com wrote:
> This is a weird one.

It is a little weird that the failure was only a transient glitch.

> Tried to use ssh and saw a "bad permissions" error on my .ssh/config file.
>
> I do ls -l and i see uids/gids of 2^32-1 or a similar very large integer.
>
> WTF ?!

Are you using --manage-gids?

root@fs:~# grep manage-gids /etc/default/nfs-kernel-server
RPCMOUNTDOPTS=--manage-gids

$ man rpc.mountd

-g or --manage-gids
Accept requests from the kernel to map user id numbers into
lists of group id numbers for use in access control. An NFS
request will normally (except when using Kerberos or other
cryptographic authentication) contains a user-id and a list of
group-ids. Due to a limitation in the NFS protocol, at most 16
groups ids can be listed. If you use the -g flag, then the list
of group ids received from the client will be replaced by a list
of group ids determined by an appropriate lookup on the
server. Note that the 'primary' group id is not affected so a
newgroup command on the client will still be effective. This
function requires a Linux Kernel with version at least 2.6.21.

That is normal for an NIS/yp environment. But it means that uid
lookups are done over the network. A transient network failure would
return -1 error codes for all of the numbers, which makes user ids
appear to be -1.
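
The arithmetic behind "a very large integer" can be checked directly. This is a small sketch, assuming only that uids and gids are displayed as unsigned 32-bit numbers (as they are on Linux); the helper names are made up for the example.

```python
import struct


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (how a uid is displayed)."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_unsigned32(-1))        # 4294967295, which is 2**32 - 1
print(as_unsigned32(-2))        # 4294967294
print(as_signed32(4294967294))  # -2
```

So an owner shown as 2^32-1 corresponds to -1, and one shown as 4294967294 corresponds to -2, when read as signed values.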

> So i go back to the server to make sure the ownership hasn't been
> borked some way and everything is fine.
>
> I go back to my account on the client, do 'ls -l' again, and
> everything is as it should be.
>
> WTF ?!

I am thinking it was a transient network failure coupled with the
above --manage-gids setting.

> 1 my NFS drives are mounted at boot. i've noticed that i get errors
> about something not being ready, or auto mounting being a problem,
> but up until now i haven't seen any real problems.
>
> 2 i can't look at those boot messages i see in 1. why is it that a
> permanent method, installed by DEFAULT has never been implemented to
> look at boot messages ?

I don't know either. I always install bootlogd since it was split
into a separate package.

Bob

deloptes

May 27, 2015, 5:40:05 PM
bri...@aracnet.com wrote:

> Hi,
>
> This is a weird one.
>
> Tried to use ssh and saw a "bad permissions" error on my .ssh/config file.
>
> I do ls -l and i see uids/gids of 2^32-1 or a similar very large integer.
>
> WTF ?!
>
> So i go back to the server to make sure the ownership hasn't been borked
> some way and everything is fine.
>
> I go back to my account on the client, do 'ls -l' again, and everything is
> as it should be.
>
> WTF ?!
>
>
> Brian
>
> p.s.
>
> 1 my NFS drives are mounted at boot. i've noticed that i get errors about
> something not being ready, or auto mounting being a problem, but up until
> now i haven't seen any real problems.
>
> 2 i can't look at those boot messages i see in 1. why is it that a
> permanent method, installed by DEFAULT has never been implemented to look
> at boot messages ?

I had the same issue after the jessie and kernel upgrade. It turned out
to be a version mismatch. I think by default it would try to mount NFSv4,
but I assume you were using v3 as I did.
I solved it by enforcing use of v3 for now and plan to use v4 later.

The issue is well known: the old version uses 16-bit and the newer one
32-bit for u/gid.

It all means you'll have to take care of the versions and update the
initramfs and/or config to boot/mount properly.

regards



bri...@aracnet.com

May 28, 2015, 1:30:04 AM
On Wed, 27 May 2015 12:03:51 -0400
Dan Ritter <d...@randomstring.org> wrote:

> On Wed, May 27, 2015 at 08:25:16AM -0700, bri...@aracnet.com wrote:
>
>
> >
> > 2 i can't look at those boot messages i see in 1. why is it that a permanent method, installed by DEFAULT has never been implemented to look at boot messages ?
> >
>
> Have you looked at:
>
> /var/log/dmesg
> /var/log/kern.log
> and
> /var/log/syslog
>
> ?
>
> They should all be enlightening. Sometime after that,
> /var/log/auth.log and /var/log/daemon.log should be useful as
> well.
>
> As to your NFS problem, I suspect you are not mapping userids
> across machines properly. What are your mount options?
>

please keep in mind that this problem fixed itself. so 99% of the time it's coming up just fine. and when it did go bonkers, it fixed itself without me doing any kind of remounting or otherwise, which is highly strange.

mount options in fstab:

nfs4 auto,rw,hard,intr

export options:


/home/user ipaddr/24(rw,root_squash,insecure,anonuid=#,anongid=#,async,no_subtree_check)

i have the anonuid and anongid set to the correct userid and gid for the user's directory.


Brian



bri...@aracnet.com

May 28, 2015, 1:30:04 AM
aha. sounds like my problem. interesting that it's enabled by default.

i'm assuming that for my rinky-dink set-up with 5 users i don't need it ?



Brian



Bob Proulx

May 28, 2015, 3:50:04 PM
bri...@aracnet.com wrote:
> aha. sounds like my problem. interesting that it's enabled by default.
> i'm assuming that for my rinky-dink set-up with 5 users i don't need it ?

The number of users is not the determining factor. It is the number
of groups for any particular user. Without --manage-gids, the
underlying structure has an array size limit of only 16 numbers, which
limits the number of groups possible per user to 16 or fewer. So likely
you don't need it. However, that is the normal tested path these days,
so I tend not to mess with it.
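
To see whether a given account comes anywhere near that limit, something like the following sketch could be run on the client. `pwd.getpwnam` and `os.getgrouplist` are standard-library calls on Unix; the helper names are hypothetical, and counting only supplementary groups (excluding the primary group) is an assumption about how the 16 slots are used.

```python
import os
import pwd

# Limit quoted in the rpc.mountd man page excerpt earlier in the thread.
MAX_GROUPS_IN_REQUEST = 16


def supplementary_groups(username: str) -> list:
    """Return the user's group ids, excluding the primary group (assumption)."""
    entry = pwd.getpwnam(username)
    all_gids = os.getgrouplist(username, entry.pw_gid)
    return [gid for gid in all_gids if gid != entry.pw_gid]


def exceeds_group_limit(gids, limit: int = MAX_GROUPS_IN_REQUEST) -> bool:
    """True if the number of distinct group ids is larger than the limit."""
    return len(set(gids)) > limit
```

For example, `exceeds_group_limit(supplementary_groups("brian"))` (with a user name that exists locally) reports whether that account would be affected without --manage-gids.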

I would tend to be more concerned that something glitchy is happening
on your physical network connections that you saw something one moment
and then it went away on another moment. This may be an indicator of
something else happening. It is actually easier when things fail hard
because then you can get to root cause. Hard to do that when the
problem goes away.

Bob

bri...@aracnet.com

May 28, 2015, 11:40:04 PM
On Thu, 28 May 2015 13:40:21 -0600
Bob Proulx <b...@proulx.com> wrote:

>
> I would tend to be more concerned that something glitchy is happening
> on your physical network connections that you saw something one moment
> and then it went away on another moment. This may be an indicator of
> something else happening. It is actually easier when things fail hard
> because then you can get to root cause. Hard to do that when the
> problem goes away.
>

that's the only time i've seen it happen.

at least i know to check for it now.

so i tend to do ls -l as soon as i get on.

so far, so good.

Brian



bri...@aracnet.com

May 30, 2015, 12:40:04 PM
On Thu, 28 May 2015 13:40:21 -0600
Bob Proulx <b...@proulx.com> wrote:


>
> I would tend to be more concerned that something glitchy is happening
> on your physical network connections that you saw something one moment
> and then it went away on another moment. This may be an indicator of
> something else happening. It is actually easier when things fail hard
> because then you can get to root cause. Hard to do that when the
> problem goes away.
>
> Bob

ok. caught it doing it again.

network connection is fine.

this is what dmesg tells me:


[ 42.426886] NFS: v4 server <name> does not accept raw uid/gids. Reenabling the idmapper.

What's _really_ weird is that my top-level user directory is fine, it's a sub-directory that's giving me uid/gid of 4294967294 4294967294

Brian



Mike Kupfer

May 30, 2015, 1:50:05 PM
<bri...@aracnet.com> wrote:

> [ 42.426886] NFS: v4 server <name> does not accept raw uid/gids. Reenabling the idmapper.

Here's some background information:

In NFSv3, uids and gids are represented as integers. In NFSv4, they are
strings. The original specification called for strings of the form
"id@domain", and typically a daemon on both the client and server would
map between that string and the integer IDs that are used natively on
that host.

But it proved to be hard to make this all work seamlessly out of the
box. The daemon wouldn't be able to do the mapping for whatever reason,
and people would frequently find their files owned by "nobody" (or some
moral equivalent).

So the most recent version of the spec (RFC 7530) allows for the use of
integers encoded as strings. For example, ID 12345 would be sent as the
string "12345". Recent versions of Linux support this, but older
versions do not. I don't remember when support was added.
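
The two owner formats described above can be illustrated with a toy mapper. This is only a sketch of the idea, not how the Linux idmapper is implemented: the function name, the `users` table, and the domain are invented for the example, and 65534 is assumed as the local "nobody" uid.

```python
from typing import Mapping

# Assumption: uid used when no mapping is possible ("nobody" on Debian).
NOBODY_UID = 65534


def map_owner(owner: str, local_domain: str, users: Mapping[str, int]) -> int:
    """Map an NFSv4 owner string to a local uid.

    Accepts either a stringified integer ("12345") or the "id@domain" form.
    Anything that cannot be mapped falls back to NOBODY_UID.
    """
    # Stringified integer: use the number as-is.
    if owner.isascii() and owner.isdigit():
        return int(owner)

    # "id@domain": only names in our own domain that we know locally map.
    name, separator, domain = owner.rpartition("@")
    if separator and domain == local_domain and name in users:
        return users[name]

    return NOBODY_UID


users = {"brian": 1000}  # hypothetical local account table
print(map_owner("12345", "example.org", users))              # 12345
print(map_owner("brian@example.org", users=users, local_domain="example.org"))  # 1000
print(map_owner("brian@elsewhere.org", "example.org", users))  # 65534
```

The last call shows the failure mode mentioned above: a name or domain that the mapper cannot resolve ends up owned by "nobody".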

So, based on that log message, I'm guessing that this is what's
happening: the client attempts to use stringified integers, rather than
"id@domain". It gets a response back from the server that indicates
that the server does not support stringified integers, so the client
falls back to the old "id@domain" syntax.

If my guess is correct, possible remedies are

- change the mount to NFSv3
- upgrade the server

I'd also see if there are updates available for the client. It really
ought to do a better job of recovering in this case. I don't see a way
to configure the client to always use id@domain, but maybe I'm missing
something.
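
Before and after changing the mount to NFSv3, it may help to confirm which version the client actually negotiated. The sketch below parses text in the format of /proc/mounts and reports the `vers=` option of each NFS entry; the function name and the sample line are hypothetical.

```python
def nfs_mounts(mounts_text: str) -> list:
    """Return (mount_point, fstype, version) for NFS entries.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only plain NFS client mounts; skips e.g. the server-side "nfsd" fs.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for option in fields[3].split(","):
            key, _, value = option.partition("=")
            if key in ("vers", "nfsvers"):
                version = value
        results.append((fields[1], fields[2], version))
    return results


# Hypothetical sample; on a real client, pass open("/proc/mounts").read().
sample = (
    "server:/home/user /home/user nfs4 rw,relatime,vers=4.0,hard 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
print(nfs_mounts(sample))  # [('/home/user', 'nfs4', '4.0')]
```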

> What's _really_ weird is that my top-level user directory is fine,
> it's a sub-directory that's giving me uid/gid of 4294967294
> 4294967294

This is probably due to attribute caching on the client. Did you look
at the top-level directory first, or after the one that shows
4294967294?

mike

