puppet-dashboard 2.0.0 (open source) and postgresql 8.4 tuning


Pete Hartman
Mar 17, 2014, 4:29:26 PM
to puppet...@googlegroups.com
I deployed the open source puppet-dashboard 2.0.0 this past weekend for our production environment.  I did a fair amount of testing in the lab to make sure I had the deployment down, and I deployed it as a Passenger service, knowing that we have a large environment and that WEBrick wasn't likely to cut it.  Overall it appears to be working and behaving reasonably--I get the summary run-status graph and the rest of the UI.  Load average on the box is high-ish but nothing unreasonable, and I certainly appear to have headroom in memory and CPU.

However, when I click the "export nodes as CSV" link, it runs forever (it hasn't stopped yet).

I looked into what the database was doing and it appears to be looping over some unknown number of report_ids, doing

    7172 | dashboard | SELECT COUNT(*) FROM "resource_statuses"  WHERE "resource_statuses"."report_id" = 39467 AND "resource_statuses"."failed" = 'f' AND ("resource_statuses"."id" IN ( | 00:00:15.575955
                     :           SELECT resource_statuses.id FROM resource_statuses
                     :             INNER JOIN resource_events ON resource_statuses.id = resource_events.resource_status_id
                     :             WHERE resource_events.status = 'noop'
                     :         )
                     : )



I ran the inner join by hand and it takes roughly 2 - 3 minutes each time.  The overall query appears to be running 8 minutes per report ID.
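
For anyone else looking at this, the next thing I plan to try is roughly the following sketch (the index name is made up, and it assumes the stock dashboard schema--I haven't verified yet that it helps):

    -- See what plan the expensive subquery gets today.
    EXPLAIN ANALYZE
    SELECT resource_statuses.id
      FROM resource_statuses
     INNER JOIN resource_events
             ON resource_statuses.id = resource_events.resource_status_id
     WHERE resource_events.status = 'noop';

    -- A partial index covering only the 'noop' events, so the subquery
    -- doesn't have to walk all of resource_events (8.4 supports these).
    CREATE INDEX CONCURRENTLY resource_events_noop_status_idx
        ON resource_events (resource_status_id)
     WHERE status = 'noop';

    -- Refresh planner statistics afterwards.
    ANALYZE resource_events;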

I've done a few things to tweak postgresql before this, so it could have been running even longer earlier, when I first noticed the problem.

I increased checkpoint_segments to 32 from the default of 3, checkpoint_completion_target to 0.9 from the default of 0.5, and, to be able to observe what's going on, set stats_command_string to on.
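
For reference, the corresponding postgresql.conf lines look roughly like this (values as above; note that on 8.4 the old stats_command_string setting is spelled track_activities, so adjust to whatever your version actually accepts):

    checkpoint_segments = 32             # default 3
    checkpoint_completion_target = 0.9   # default 0.5
    track_activities = on                # pre-8.3 name: stats_command_string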

Some other details: we have 3400 nodes (the dashboard is only seeing 3290 or so, which is part of why I want this CSV report--to figure out why the number is smaller).  This PostgreSQL instance is also the one supporting PuppetDB, though obviously in a separate database.  The resource_statuses table has 47 million rows right now, and the inner join returns 4.3 million.
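
As a stopgap while the CSV export is unusable, a query like this against the dashboard database should show which nodes it thinks have gone quiet (column names are taken from the 1.x schema and assumed unchanged in 2.0):

    -- Nodes the dashboard has never heard from, or not in the last day.
    SELECT name, reported_at
      FROM nodes
     WHERE reported_at IS NULL
        OR reported_at < now() - interval '1 day'
     ORDER BY reported_at NULLS FIRST;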

I'm curious whether anyone else is running this version on PostgreSQL in a large environment, and whether there are places I ought to be looking to tune this so it runs faster, or whether I need to be doing something to shrink those tables without losing information, etc.  (One option I'm aware of is sketched below.)
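
On shrinking the tables: the 1.x dashboard shipped a reports:prune rake task, and assuming the 2.0 fork still carries it, something like the following would drop reports older than a month (note this does discard the pruned reports' detail, so it only helps if older history is expendable; the install path here is illustrative):

    cd /usr/share/puppet-dashboard        # wherever the dashboard is installed
    RAILS_ENV=production rake reports:prune upto=1 unit=mon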

Thanks

Pete

Pete Hartman
Mar 17, 2014, 5:31:09 PM
to puppet...@googlegroups.com
I also increased bgwriter_lru_maxpages to 500 from the default of 100.

Gav
Dec 19, 2014, 3:48:14 PM
to puppet...@googlegroups.com
Pete, what version of Passenger are you running? I have deployed puppet-dashboard 2.0.0 this week with Passenger 4.0.56 and Ruby 1.9.3, but Passenger is just eating memory.

------ Passenger processes -------
PID    VMSize     Private    Name
----------------------------------
5173   6525.1 MB  3553.0 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
5662   5352.7 MB  4900.8 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
5682   5736.8 MB  5307.1 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
8486   6525.2 MB  4469.5 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
10935  6525.0 MB  3282.3 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
11885  6380.3 MB  3905.9 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
20886  209.8 MB   0.1 MB     PassengerWatchdog
20889  2554.9 MB  7.2 MB     PassengerHelperAgent
20896  208.9 MB   0.0 MB     PassengerLoggingAgent
21245  2602.8 MB  2268.6 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
22912  500.7 MB   115.4 MB   Passenger RackApp: /local/puppet/etc/rack
24873  6505.1 MB  3592.6 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
26226  1944.3 MB  1616.6 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
29012  6525.0 MB  3460.4 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
30564  4072.7 MB  3675.4 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
31060  3526.8 MB  3181.6 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
31733  6505.5 MB  5761.4 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
31740  6525.4 MB  5812.2 MB  Passenger RackApp: /local/puppet/dashboard/dashboard
### Processes: 18
### Total private dirty RSS: 54910.21 MB

Any help would be appreciated.

Cheers,
Gavin

Pete Hartman
Dec 19, 2014, 4:30:44 PM
to puppet...@googlegroups.com

I'm no longer at that position and haven't seen it in 8 months...


Ramin K
Dec 19, 2014, 6:28:21 PM
to puppet...@googlegroups.com
I would trim the number of dashboard processes down to a max of 2-4 and a min of 1, and recycle each process every 10k requests. You can set all of that in the vhost, IIRC; the Passenger docs are pretty good in that regard.
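
Something along these lines in the Apache config, roughly (directive names are from the Passenger 4.x docs; exact values are up to you):

    # Server-wide (httpd.conf): cap the total pool of application processes.
    PassengerMaxPoolSize 4

    # Per-vhost: keep one warm instance, recycle each process after 10k requests.
    PassengerMinInstances 1
    PassengerMaxRequests 10000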

Ramin

Gav
Jan 2, 2015, 6:39:47 AM
to puppet...@googlegroups.com, ramin...@badapple.net
Thanks chaps. It turns out that an internal process was DoS'ing the dashboard with wget requests for nodes.csv.