[Samba] Bulk smbcacls calls

Peter Flood

Nov 28, 2013, 10:10:01 AM
I want to get ACLs (output similar to that of smbcacls) for a *lot* of
files (potentially millions). I can only process about 10 files per
second when running the command (`smbcacls -U ...` via a Python
wrapper), so I'm looking for a faster way.

Does anyone know any libraries or other commands that could help me?

Failing that, I assume that much of the time taken is spent on
authenticating the user/pw for each request. Would it be possible to
write something that keeps the connection so that multiple requests can
be made without reauthenticating (I'm not familiar with how
LDAP/AD/Samba works)? I have looked at the source of smbcacls but
nothing jumped out at me.

Many thanks
--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/options/samba

David Disseldorp

Nov 28, 2013, 1:00:02 PM
Hi Peter,

On Thu, 28 Nov 2013 15:00:18 +0000
Peter Flood <in...@whywouldwe.com> wrote:

> Does anyone know any libraries or other commands that could help me?

The cifs.ko kernel client utilities package ships a getcifsacl binary
which may be worth a try.

> Failing that, I assume that much of the time taken is spent on
> authenticating the user/pw for each request. Would it be possible to
> write something that keeps the connection so that multiple requests can
> be made without reauthenticating (I'm not familiar with how
> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
> nothing jumped out at me.

Noel (cc'ed) just finished a bunch of changes adding inheritance
propagation to smbcacls:

http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=smbcalcs-inherit-v2

It doesn't do recursion on --get, but the new code could certainly be
leveraged to add this feature.

Cheers, David

Michael Brown

Nov 28, 2013, 2:40:02 PM
On 13-11-28 10:00 AM, Peter Flood wrote:
> I want to get ACLs (output similar to that of smbcacls) for a *lot* of
> files (potentially millions). I can only process about 10 files per
> second when running the command (`smbcacls -U ...` via a Python
> wrapper), I'm looking for a faster way.
It's kind of ugly, but a quick workaround may be doing the calls in
parallel using a worker queue in python:

http://docs.python.org/2/library/queue.html

You'd be able to have an arbitrary number of outstanding requests at the
same time. It might do the trick for you.
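
A minimal sketch of that worker-queue idea, assuming Python 3 (where the
Python 2 `Queue` module linked above is named `queue`). The share name,
credentials and worker count are placeholders, and `fetch_acl` and
`fetch_all` are illustrative names rather than anything shipped with
Samba; the only smbcacls option used is the `-U` already mentioned in
this thread.

```python
import queue
import subprocess
import threading


def fetch_acl(path, share="//server/share", user="username%password"):
    """Run smbcacls once for a single path and return its text output."""
    result = subprocess.run(
        ["smbcacls", "-U", user, share, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def fetch_all(paths, fetch=fetch_acl, workers=8):
    """Process paths with a pool of worker threads fed from one queue.

    Returns {path: output}, or {path: exception} for calls that failed.
    """
    tasks = queue.Queue()
    for path in paths:
        tasks.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        # Every task is queued before the threads start, so an empty
        # queue means there is nothing left to do.
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                outcome = fetch(path)
            except Exception as exc:  # keep going if one file fails
                outcome = exc
            with lock:
                results[path] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each call still pays the full connection and authentication cost; the
threads only overlap the waiting, so the gain depends on how many
simultaneous connections the server tolerates.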

M.

--
Michael Brown | `One of the main causes of the fall of
Systems Consultant | the Roman Empire was that, lacking zero,
Net Direct Inc. | they had no way to indicate successful
☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth

Andrew Bartlett

Nov 29, 2013, 2:20:02 AM
On Thu, 2013-11-28 at 15:00 +0000, Peter Flood wrote:
> I want to get ACLs (output similar to that of smbcacls) for a *lot* of
> files (potentially millions). I can only process about 10 files per
> second when running the command (`smbcacls -U ...` via a Python
> wrapper), I'm looking for a faster way.
>
> Does anyone know any libraries or other commands that could help me?
>
> Failing that, I assume that much of the time taken is spent on
> authenticating the user/pw for each request. Would it be possible to
> write something that keeps the connection so that multiple requests can
> be made without reauthenticating (I'm not familiar with how
> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
> nothing jumped out at me.

See the code in python/samba/netcmd/gpo.py that sets a remote ACL for
GPOs. This could be called from your own script, avoiding the
connection cost.

Otherwise, make sure you authenticate with Kerberos, as this will be
much faster, even if you connect per file with smbcacls.

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba

Peter Flood

Nov 29, 2013, 5:00:02 AM
Thanks Andrew. I hadn't noticed the Python code in the repo; I'm pretty
sure we'll be able to extract something that meets our needs.


On 29/11/2013 07:16, Andrew Bartlett wrote:
> On Thu, 2013-11-28 at 15:00 +0000, Peter Flood wrote:
>> I want to get ACLs (output similar to that of smbcacls) for a *lot* of
>> files (potentially millions). I can only process about 10 files per
>> second when running the command (`smbcacls -U ...` via a Python
>> wrapper), I'm looking for a faster way.
>>
>> Does anyone know any libraries or other commands that could help me?
>>
>> Failing that, I assume that much of the time taken is spent on
>> authenticating the user/pw for each request. Would it be possible to
>> write something that keeps the connection so that multiple requests can
>> be made without reauthenticating (I'm not familiar with how
>> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
>> nothing jumped out at me.
> See the code in python/samba/netcmd/gpo.py that sets a remote ACL for
> GPOs. This could be called from your own script, avoiding the
> connection cost.
>
> Otherwise, make sure you authenticate with kerberos, as this will be
> much faster, even if you connect per file with smbcacls.
>
> Andrew Bartlett
>

--

Noel Power

Nov 29, 2013, 5:30:01 AM
Hi,
On 28/11/13 17:56, David Disseldorp wrote:
> Hi Peter,
>
> On Thu, 28 Nov 2013 15:00:18 +0000
> Peter Flood <in...@whywouldwe.com> wrote:
[...]
>> Failing that, I assume that much of the time taken is spent on
>> authenticating the user/pw for each request. Would it be possible to
>> write something that keeps the connection so that multiple requests can
>> be made without reauthenticating (I'm not familiar with how
>> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
>> nothing jumped out at me.
> Noel (cc'ed) just finished a bunch of changes adding inheritance
> propagation to smbcacls:
>
> http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=smbcalcs-inherit-v2
>
> It doesn't do recursion on --get, but the new code could certainly be
> leveraged to add this feature.
Yes, unfortunately '--get' isn't currently supported in a recursive
fashion; it didn't quite fit with the inheritance propagation feature,
but... David is correct in that the new code could definitely be
leveraged to do that. However, I believe something like what you
require needs another new cmdline switch, e.g. something like
"-r|-recursive" [1]. Also, I wonder how the output of smbcacls should
look for such an operation, as the existing output doesn't actually
mention the file/dir that is being processed.

Noel

[1] in general I do believe a pure recursive option could be useful ( at
least for --get, --chown, --chgrp )

Noel Power

Dec 11, 2013, 4:50:02 AM
Hi,

On 29/11/13 10:05, Noel Power wrote:
> [...]
>>> Failing that, I assume that much of the time taken is spent on
>>> authenticating the user/pw for each request. Would it be possible to
>>> write something that keeps the connection so that multiple requests can
>>> be made without reauthenticating (I'm not familiar with how
>>> LDAP/AD/Samba works)? I have looked at the source of smbcacls but
>>> nothing jumped out at me.
>> Noel (cc'ed) just finished a bunch of changes adding inheritance
>> propagation to smbcacls:
>>
>> http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=smbcalcs-inherit-v2
>>
>> It doesn't do recursion on --get, but the new code could certainly be
>> leveraged to add this feature.
> Yes unfortunately support for '--get' in a recursive fashion isn't
> currently supported, it didn't quite fit with the inheritance
> propagation feature, but... David is correct in that the new code could
> definitely be leveraged to do that. However something like what you
> require I believe needs another new cmdline switch e.g. something like
> maybe "-r|-recursive" [1]. Also I wonder how the output of smbcacls
> should look for such an operation as the existing output doesn't
> actually mention the file/dir that is being processed.
>
> Noel
>
> [1] in general I do believe a pure recursive option could be useful ( at
> least for --get, --chown, --chgrp )
If you are up for building from source, you could try my repo
http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=recursive-smbcacls
where I've added a '-r' switch.

With that version built from source you can use '-r' with the new
'--propagate-inheritance' switch, e.g.
smbcacls -r --propagate-inheritance --add|--modify|--set|--delete

You can additionally use '-r' with the following operations: --get,
--chown & --chgrp

Note: you can't use '-r' with '--add|--modify|--set|--delete' alone; if
you want to use '-r' with (add/modify/delete...) you *must* additionally
specify '--propagate-inheritance'.

I did some very rough performance testing with a Linux host running
KVM with a Windows Server 2012 guest:

smbcacls -r --get -Uusername%password //guestIP/share /testdir > /dev/null

returns in < 2 minutes (testdir has 20,069 files in 2,842 directories).

regards,
Noel

Peter Flood

Dec 11, 2013, 5:50:01 AM
Hi Noel

That sounds like a great addition to smbcacls.

We played around with the source of smbcacls and found that a lot of
the time is spent converting the numeric user/group ids to their
human-readable equivalents. E.g. if 1 file has 3 users and 5 groups
there are 8 requests to resolve the numeric user/group ids (1 request
per conversion, if I recall correctly). So we realised that by
repeatedly calling smbcacls we were effectively looking up the same
groups multiple times; we needed to cache the lookup results.

Then we found that we could get the ACLs with numeric output (similar
to smbcacls output, but without the conversion to human-readable form)
with the latest version of pysmbc from
https://git.fedorahosted.org/cgit/pysmbc.git/ (the latest commits added
the functionality we wanted, which wasn't in the version from PyPI). To
get the human-readable user/group representation we use smbcacls, parse
the output and store it in a numeric -> human map, so we make at most 1
request per new user/group encountered (a bit hackish, but it works for
us). It would be good to be able to make the same lookup request that
smbcacls makes to resolve a user/group id in Python; it would be a
useful addition to pysmbc if the data is available from libsmbclient.
By doing it this way we've found that we can process 200-300 files per
second in our setup (approx 13,000 files, not sure how many
directories).
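
The numeric -> human map described above amounts to a small cache in
front of whatever performs the lookup. A minimal sketch, assuming
Python 3: `SidNameCache` and the `resolve` callable are hypothetical
names, and the table-backed resolver below merely stands in for the
real lookup (in the setup described above, parsing smbcacls output).

```python
class SidNameCache:
    """Remember SID -> name lookups so each SID is resolved only once."""

    def __init__(self, resolve):
        self._resolve = resolve  # callable: SID string -> display name
        self._names = {}
        self.lookups = 0         # how many times the resolver really ran

    def name_for(self, sid):
        if sid not in self._names:
            self.lookups += 1
            self._names[sid] = self._resolve(sid)
        return self._names[sid]


# Stand-in resolver backed by a fixed table of two well-known SIDs.
# A real one would issue a request to the server instead.
KNOWN = {
    "S-1-5-32-544": "BUILTIN\\Administrators",
    "S-1-5-32-545": "BUILTIN\\Users",
}

cache = SidNameCache(KNOWN.__getitem__)
sids_seen = ["S-1-5-32-544", "S-1-5-32-545", "S-1-5-32-544"]
names = [cache.name_for(sid) for sid in sids_seen]
print(names, cache.lookups)
```

The cache never expires entries, which suits a single scan; a
long-running process would need some way to refresh renamed accounts.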

We scan to get all the individual file objects into our database, then
make 1 request per file to get the ACLs. Using a recursive version of
smbcacls and matching files in the output back to those in our db would
be awkward in our situation, especially if files are added or removed in
the period between the scan and the recursive smbcacls call.

I welcome any comments regarding our approach.

I'll give your new version of smbc a go this afternoon if I get a chance.

Peter

Noel Power

Dec 11, 2013, 6:20:03 AM
Hi Peter
On 11/12/13 10:34, Peter Flood wrote:
> Hi Noel
>
> That sounds like a great addition to smbcacls.
>
> We played around with the source of smbcacls and found that a
> lot of the time is spent converting the numeric user/group ids to
> their human equivalents. eg if 1 file has 3 users and 5 groups there's
> 8 requests to resolve the numeric user/group ids (1 request per
> conversion if I recall correctly), so we realised by repeatedly
> calling smbcacls we were effectively looking up the same groups
> multiple times, we needed to cache the lookup results.
Very interesting. I haven't yet found the need to do any in-depth
performance analysis, but caching those values in the context of
recursive operations does indeed seem to be a good idea.
>
> Then we found that we could get the acls with numeric output, similar
> output to smbcacls but without the conversion to human form, with
> latest version of pysmbc from
> https://git.fedorahosted.org/cgit/pysmbc.git/ (the latest commits
> added the functionality we wanted which wasn't in the version from
> pypi). To get the human user/group representation we use smbcacls and
> parse the output and store in a numeric -> human map so we only make
> max 1 request per new user/group encountered (a bit hackish but it
> works for us). It would be good to be able to make the same lookup
> request that smbcacls makes to resolve a user/group id in python,
Yup, as mentioned above, I think smbcacls would benefit from caching
that info.
> it would be a useful addition to pysmbc if the data is available from
> libsmbclient.
libsmbclient is somewhat outside my experience so far (I am new to
Samba; smbcacls is the only thing I have looked at in any depth).
> By doing it this way we've found that we can process 200-300 files
> per second in our setup (approx 13,000 files, not sure how many
> directories).
so, if I run 'smbcacls --get -r --numeric' on the same test directory
(20,069 files in 2,842 directories) it finishes in ~30 seconds
>
> We scan to get all the individual file objects into our database then
> make 1 request per file to get the acls, using a recursive version of
> smbcacls and matching files in the output back to those in our db
> would be awkward in our situation, especially if files are added or
> removed in the period between the scan and recursive smbcacls call.
Not entirely sure what you mean about the awkwardness of "recursive
version of smbcacls and matching files in the output back to those in
our db". Surely a recursive version of smbcacls should mean you don't
need this 2-step process; you should just get the info you need.
Regarding files being added and removed, isn't that going to be a
problem (regarding stale data) no matter what approach you take
(unless you can somehow lock the directory being processed for the
duration of the operation(s))?
>
> I welcome any comments regarding our approach.
>
> I'll give your new version of smbc a go this afternoon if I get a chance.
please do!

thanks,

Peter Flood

Dec 11, 2013, 7:10:02 AM
Hi Noel

Is there any documentation regarding the protocols used by smbcacls to
get the raw acls and lookup the user/group ids? Eg how to make the raw
requests (I'm not great with C)?
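> so, if I run 'smbcacls --get -r --numeric' on the same test directory
> (20,069 files in 2,842 directories) it finishes in ~30 seconds
That's fast, I'd like to be able to do it at that speed.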
>> We scan to get all the individual file objects into our database then
>> make 1 request per file to get the acls, using a recursive version of
>> smbcacls and matching files in the output back to those in our db
>> would be awkward in our situation, especially if files are added or
>> removed in the period between the scan and recursive smbcacls call.
> not entirely sure what you mean about the awkwardness of "recursive
> version of smbcacls and matching files in the output back to those in
> our db", surely smbcacls ( a recursive version ) should mean you don't
> need to do this 2 step process, you should just get the info you need.
> Regarding files being added and removed, isn't that going to be a
> problem ( regarding stale data ) no matter what approach you take (
> unless you can somehow lock the directory being processed for the
> duration of the operation(s) )?
Yes, replacing our current scan with a recursive smbcacls call warrants
further investigation.
>> I welcome any comments regarding our approach.
>>
>> I'll give your new version of smbc a go this afternoon if I get a chance.
> please do!
>
> thanks,
>
> Noel

Thanks
Peter

Noel Power

Dec 11, 2013, 9:30:01 AM
Hi Peter
On 11/12/13 12:01, Peter Flood wrote:
> Hi Noel
>
> Is there any documentation regarding the protocols used by smbcacls to
> get the raw acls and lookup the user/group ids? Eg how to make the raw
> requests (I'm not great with C)?
I guess there is, but... as yet I haven't had the need to dig any deeper
than the smbcacls source code, sorry about that.
>
> On 11/12/2013 11:15, Noel Power wrote:
>> Hi Peter
>> On 11/12/13 10:34, Peter Flood wrote:
[...]
>>> By doing it this way we've found that we can process 200-300 files
>>> per second in our setup (approx 13,000 files, not sure how many
>>> directories).
>> so, if I run 'smbcacls --get -r --numeric' on the same test directory
>> (20,069 files in 2,842 directories) it finishes in ~30 seconds
> That's fast, I'd like to be able to do it at that speed.
I updated smbcacls to cache the SID translation; 'smbcacls -r --get'
now also finishes processing the test directory in 30 sec.

http://cgit.freedesktop.org/~noelp/noelp-samba/log/?h=recursive-smbcacls
contains that change now too
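
To put the figures quoted in this thread side by side, here is a small
back-of-the-envelope calculation. The one-million-file total is only an
illustration of "potentially millions"; the two-minute figure is an
upper bound, so the rate derived from it is a lower bound, and the
2,842 directories are left out of the count.

```python
def files_per_second(files, seconds):
    """Average throughput for a run that handled `files` in `seconds`."""
    return files / seconds


def hours_for(total_files, rate):
    """Hours needed to process `total_files` at `rate` files per second."""
    return total_files / rate / 3600


TEST_FILES = 20069  # files in the test directory mentioned above

rates = {
    "one smbcacls call per file": 10.0,
    "smbcacls -r --get, under 2 minutes": files_per_second(TEST_FILES, 120),
    "smbcacls -r --get, about 30 seconds": files_per_second(TEST_FILES, 30),
}

for label, rate in rates.items():
    hours = hours_for(1_000_000, rate)
    print(f"{label}: {rate:.0f} files/s, {hours:.1f} h per million files")
```

At 10 files per second a million files take a little over a day, while
either recursive figure brings that down to under two hours.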

regards,
Noel

Noel Power

Dec 11, 2013, 11:10:01 AM
Hi Peter,
On 11/12/13 15:38, Peter Flood wrote:
> Hi Noel
[...]
> We just tried this branch. In our setup we get ~250 files per second
> with smbcacls -r. A few weeks ago we got about 10 files per second by
> making repeated calls to smbcacls via subprocess in Python. In a test
> just now we got ~225 files per second with our scan then
> pysmbc/smbcacls approach (described earlier), however we also stat
> every file and write some data to db (maybe 3 bulk writes per 1,000
> files) so it's not a direct comparison.
>
> One interesting thing we did notice was that we got ~250 additional
> 'files' with the recursive smbcacls, I assume those are the
> directories, can you confirm that directories are also output with -r?
Yes, directories are also output.
> Is there a way to filter those out?
Not directly. Presumably you pipe the output of smbcacls to a file;
with a script you should be able to postprocess the complete output of
'smbcacls -r --get' and scrape/filter whatever information you need out
of that.
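
Such post-processing might start from a parser like the sketch below.
It assumes the usual single-file smbcacls layout (REVISION, OWNER and
GROUP lines followed by one ACL line per entry); how the recursive
branch separates one file's block from the next, or marks directories,
isn't stated in this thread, so only a single block is handled.
`parse_smbcacls` is an illustrative name and the sample text is made up.

```python
def parse_smbcacls(text):
    """Parse one smbcacls output block into a dict with an 'ACL' list."""
    entry = {"ACL": []}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and anything unrecognised
        key, value = line.split(":", 1)
        if key == "ACL":
            # The trustee comes first; the last colon starts the
            # type/flags/mask triple.
            trustee, ace = value.rsplit(":", 1)
            ace_type, flags, mask = ace.split("/", 2)
            entry["ACL"].append(
                {"trustee": trustee, "type": ace_type,
                 "flags": flags, "mask": mask}
            )
        else:
            entry[key] = value
    return entry


# Made-up sample in the assumed layout.
sample = """REVISION:1
OWNER:EXAMPLE\\alice
GROUP:EXAMPLE\\Domain Users
ACL:EXAMPLE\\alice:ALLOWED/0x0/FULL
ACL:EXAMPLE\\Domain Users:ALLOWED/0x0/READ
"""

parsed = parse_smbcacls(sample)
for ace in parsed["ACL"]:
    print(ace["trustee"], ace["type"], ace["mask"])
```

Splitting the ACL value from the right keeps trustees intact whether
they are printed as names or as numeric SIDs.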

thanks,

Jean Carlos Coelho

Dec 11, 2013, 1:10:02 PM
Hi,

Is there some way to block only new installs of any kind of software
through a PDC domain? My problem is that the "Domain Users" group works
fine, but installations of "DropBox" (e.g.) are still allowed to the
user at the desktop with the PDC configuration (only for the user
options). The CEO of the company has prohibited software installs by
employees. I know that using gpedit.msc to disable Windows Installer
works, but it also blocks the execution of netlogon.cmd at startup
(login). What's the best practice for this situation? :)

Thank You!