Issue with large directory content


Bernd Adamowicz

Sep 11, 2012, 10:16:22 AM
to puppet...@googlegroups.com
Hi all,

I have this directory configuration:


file { "${codebase_ng::repository_mount}/${sonatype_work_dir}":
  ensure  => directory,
  owner   => $nexus_user,
  group   => $nexus_group,
  mode    => '0755',
  recurse => false,
  backup  => false,
}

Today I added some 100 GB of artifacts to a subdirectory of
"${codebase_ng::repository_mount}/${sonatype_work_dir}". The result
is that Puppet now seems to run "forever". If I comment out this code,
Puppet finishes in 15 seconds. So I presume Puppet is doing some
recursive scanning of this directory. Could this be true? Is there a
known issue with directories that have large contents?

Thanks in advance!
Bernd

Bernd Adamowicz

Sep 12, 2012, 10:08:13 AM
to puppet...@googlegroups.com
No ideas at all?

> -----Original Message-----
> From: Bernd Adamowicz
> Sent: Tuesday, 11 September 2012 16:16
> To: puppet...@googlegroups.com
> Subject: Issue with large directory content

Christopher Wood

Sep 12, 2012, 10:12:10 AM
to puppet...@googlegroups.com
I don't have enough information to say. You might want to run the master and agent in debug mode to get more output, though.

puppet agent --debug --verbose --no-daemonize

Also, 100 GB? Any particular reason why you're not installing this using a content distribution system or a large number of RPMs?
> --
> You received this message because you are subscribed to the Google Groups "Puppet Users" group.
> To post to this group, send email to puppet...@googlegroups.com.
> To unsubscribe from this group, send email to puppet-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
>
>

Peter Brown

Sep 12, 2012, 7:24:03 PM
to puppet...@googlegroups.com
On 13 September 2012 00:12, Christopher Wood <christop...@pobox.com> wrote:
> I don't have enough information to say. You might want to run the master and agent in debug mode to get more output, though.
>
> puppet agent --debug --verbose --no-daemonize
>
> Also, 100 GB? Any particular reason why you're not installing this using a content distribution system or a large number of RPMs?

Recursing through a 100 GB directory will definitely slow down your Puppet run.
If the contents of the directory are reasonably static, an RPM would be
the best idea.
If they are not static, a git or svn repo would be a better idea.
If you are tricky, you can manage the git or svn checkouts with
Puppet as well.
I wrote a few tricky resources for this a while ago and they are
infinitely handy.

Bernd Adamowicz

Sep 13, 2012, 4:45:45 AM
to puppet...@googlegroups.com
Thanks for your answers so far.

But note that the huge artifacts are *not* managed by Puppet (see recurse => false). It is actually a Maven repository filled by Nexus. Only the top directory is managed by Puppet, to have it in place with the correct access rights. This worked well until I did the initial fill of the repository with the artifacts manually. That slowed down Puppet. It seems to me that Puppet is doing some recursive scanning, but that's just an assumption, based on Puppet running with almost 100% CPU load.

I turned on debug in Puppet but do not see anything, even after a few minutes. Presumably I would see something if I let Puppet run long enough. However, it's strange behaviour that I've never experienced before. And I think my configuration is OK.

Bernd

Bernd Adamowicz

Sep 13, 2012, 8:44:55 AM
to puppet...@googlegroups.com

This is still weird. I thought I would simply wait until Puppet finishes, but had to quit after an hour and a half. I also tried 'ensure => present' instead of 'ensure => directory', with no success. No log output at all. Still investigating. Any ideas are still highly appreciated!

Bernd

David Schmitt

Sep 14, 2012, 3:07:52 AM
to puppet...@googlegroups.com
Use strace to take a look at what's really happening. That should make
it much easier to pinpoint the culprit:


strace -e file -f puppet agent --test



Best Regards, David


Andrew Stangl

Sep 14, 2012, 3:42:22 AM
to puppet...@googlegroups.com
I had a similar issue some time back, and it was pointed out by the very helpful people in this group that puppet does an MD5 sum (or other hash check) on each file in the directory structure (probably something the strace mentioned would show you)

This post gives a suggested solution to managing a large directory structure: 

So, the best approach is to exec find(1) or something similar, with an unless to limit the run if it's not going to need to make any changes.
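
A minimal sketch of that exec-plus-find idea, assuming GNU find, a hypothetical path, and a hypothetical nexus user (none of these values come from the linked post). It only fixes ownership; find still walks the tree, but it never reads file contents:

```puppet
# Sketch only: the path and the user name below are placeholders.
$repo_dir = '/repository/sonatype-work'

exec { 'fix-repository-ownership':
  # Re-own only the entries whose owner is wrong.
  command  => "find ${repo_dir} ! -user nexus -exec chown nexus {} +",
  # Skip the exec when find prints no mismatching entry.
  # -quit (a GNU find extension) stops at the first mismatch.
  unless   => "test -z \"\$(find ${repo_dir} ! -user nexus -print -quit)\"",
  path     => ['/bin', '/usr/bin'],
  # The shell provider is needed for the $(...) command substitution.
  provider => shell,
}
```

Because the unless check exits 0 when nothing needs changing, a normal run reports no change for this resource.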

The only thing puzzling me is that you're specifying recurse => false, so I'd expect there may be something else going on.
Is the directory on the same file system as the OS, or is it a mounted (NFS?) volume?
Your code mentions a mount point (repository_mount), so this may be the source of your issue, not the recursion or file management.
If this is the case, you would possibly be better off managing the mount point with the mount resource type, in which case you
may be experiencing something similar to what we dealt with here: 
 
Hope this helps,
Andrew




Bernd Adamowicz

Sep 14, 2012, 7:14:28 AM
to puppet...@googlegroups.com
First, thanks for all the replies! And second, I found the solution. Let me explain:

I tried David's suggestion (strace) first, and it turned out that Puppet really is accessing and opening every single file below '/repository/sonatype-work/'. Some excerpts from the strace output:

32124 lstat("/repository/sonatype-work/nexus/indexer/apache-snapshots-ctx/segments.gen", {st_mode=S_IFREG|0644, st_size=20, ...}) = 0
32124 lstat("/repository/sonatype-work/nexus/indexer/apache-snapshots-ctx/_0_1.del", {st_mode=S_IFREG|0750, st_size=9, ...}) = 0
32124 open("/repository/sonatype-work/nexus/storage/ibiblio-maven2/org/seleniumhq/selenium/selenium-support/2.25.0/selenium-support-2.25.0.jar", O_RDONLY) = 4
32124 open("/repository/sonatype-work/nexus/storage/ibiblio-maven2/joda-time/joda-time-hibernate/1.2/joda-time-hibernate-1.2.pom", O_RDONLY) = 4

The other questions from all of you, and finally Den's question ("Are you trying to set any permissions inside that directory elsewhere in the manifest?"), made me rethink everything and pointed me to the right place. A few lines down from where I thought the error was, I had this:

file { [
  "${codebase_ng::repository_mount}/${sonatype_work_dir}/nexus",
  "${codebase_ng::repository_mount}/${sonatype_work_dir}/nexus/conf",
]:
  require => File["${codebase_ng::repository_mount}/${sonatype_work_dir}"],
  ensure  => directory,
  owner   => $nexus_user_id,
  group   => $nexus_group_id,
  mode    => '0750',
  source  => "puppet:///modules/codebase_ng/nexus/conf",
  recurse => true,
  purge   => false,
}

This file resource was only meant to put configuration files in place below '/repository/sonatype-work/nexus/conf'. But obviously the first entry in the file array, which resolves to '/repository/sonatype-work/nexus', was the trigger for Puppet to start recursively scanning everything.
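
One way to avoid the pitfall is to declare the two directories separately, so that only the conf directory carries source and recurse. This is a sketch built from the variable names used in this thread, not a tested manifest; the quoted mode strings are an assumption made for compatibility with newer Puppet versions:

```puppet
# Parent directory: managed on its own, with no source and no recursion,
# so Puppet never descends into the artifact storage below it.
file { "${codebase_ng::repository_mount}/${sonatype_work_dir}/nexus":
  ensure  => directory,
  owner   => $nexus_user_id,
  group   => $nexus_group_id,
  mode    => '0750',
  require => File["${codebase_ng::repository_mount}/${sonatype_work_dir}"],
}

# Configuration directory: only this subtree is recursed and synced
# from the module's files.
file { "${codebase_ng::repository_mount}/${sonatype_work_dir}/nexus/conf":
  ensure  => directory,
  owner   => $nexus_user_id,
  group   => $nexus_group_id,
  mode    => '0750',
  source  => 'puppet:///modules/codebase_ng/nexus/conf',
  recurse => true,
  purge   => false,
  require => File["${codebase_ng::repository_mount}/${sonatype_work_dir}/nexus"],
}
```

The explicit require on the parent is optional, since file resources autorequire a managed parent directory, but it keeps the ordering visible.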

So, the misconfiguration was on my side (nice pitfall), but I wonder if this is expected behaviour. Shouldn't recursion only be done on the last entry of the file array? I'm not sure. Maybe the PuppetLabs guys might think about it.

However, thanks again to all who helped and pointed me to the solution!

Bernd





jcbollinger

Sep 14, 2012, 9:09:40 AM
to puppet...@googlegroups.com


On Friday, September 14, 2012 6:14:37 AM UTC-5, badamowicz wrote:
> So, the misconfiguration was on my side (nice pitfall), but I wonder if this is expected behaviour. Shouldn't recursion only be done on the last entry of the file array?


No.  Why would you suppose that?  When you write a resource declaration of the form

someresource { [ 'title1', 'title2']:
  param1 => value1,
  param2 => value2
}

it is shorthand for separate resource declarations, one for each title given, all having the specified parameters:

someresource { 'title1':
  param1 => value1,
  param2 => value2
}

someresource { 'title2':
  param1 => value1,
  param2 => value2
}

The use of an array to specify multiple titles is perhaps a bit quirky in itself (though useful!), but I don't see why anyone would suppose that resources specified that way would be assigned different parameters from each other.

There is nothing specific to the File resource type here, but even if there were, why would you expect Puppet to suppose that you only wanted recursion on one of the specified resources?  I don't see it.
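
For titles that should share most parameters but not all, Puppet 4.0 and later (newer than the release discussed in this thread) accept a default body inside a resource expression. The following is a sketch under that assumption; the paths are taken from the strace output earlier in the thread and the variable names from the manifest quoted there:

```puppet
# Requires Puppet 4.0 or later: 'default' sets attributes shared by
# every body in this one resource expression.
file {
  default:
    ensure => directory,
    owner  => $nexus_user_id,
    group  => $nexus_group_id,
    mode   => '0750',
  ;
  '/repository/sonatype-work/nexus':
    # Shared defaults only; recursion stays off for the parent.
    recurse => false,
  ;
  '/repository/sonatype-work/nexus/conf':
    # Only this title gets a source and recursion.
    source  => 'puppet:///modules/codebase_ng/nexus/conf',
    recurse => true,
    purge   => false,
  ;
}
```

Each title still becomes its own resource, exactly as with the array form; the difference is that the per-title bodies can diverge.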


John



Bernd Adamowicz

Sep 17, 2012, 5:11:50 AM
to puppet...@googlegroups.com
Yes, you're right. My idea was to have the 'source=>' parameter only
applied to the last entry in the array. This was simply an error in
reasoning.

Thanks
Bernd