r1994 - Instead of a request that cannot get a lock falling through to 'Sending VNFS from the data store', return a 503. wwgetvnfs retries correctly until the vnfs is cached. This eliminates huge memory usage both on the wwmaster and backing mysqld as multiple nodes try to grab the same vnfs

com...@warewulf.lbl.gov

Jun 19, 2016, 9:03:03 PM
to warewul...@lbl.gov
Author: bsallen
Date: 2016-06-19 18:03:02 -0700 (Sun, 19 Jun 2016)
New Revision: 1994

Modified:
trunk/provision/cgi-bin/vnfs.pl
Log:
Instead of a request that cannot get a lock falling through to 'Sending VNFS from the data store', return a 503. wwgetvnfs retries correctly until the vnfs is cached. This eliminates huge memory usage both on the wwmaster and backing mysqld as multiple nodes try to grab the same vnfs

Modified: trunk/provision/cgi-bin/vnfs.pl
===================================================================
--- trunk/provision/cgi-bin/vnfs.pl 2016-06-06 18:38:00 UTC (rev 1993)
+++ trunk/provision/cgi-bin/vnfs.pl 2016-06-20 01:03:02 UTC (rev 1994)
@@ -71,6 +71,7 @@
my $vnfs_checksum = $obj->checksum();
my ($vnfs_nocache) = $obj->get("nocache");
my $use_cache;
+ my $cache_in_progress;

#&nprint("Sending VNFS '$vnfs_name' to node '$node_name'\n");
$q->print("Content-Type: application/octet-stream; name=\"vnfs.img\"\r\n");
@@ -93,18 +94,16 @@
if (! -d "$vnfs_cachedir/$vnfs_name") {
mkpath("$vnfs_cachedir/$vnfs_name");
}
-
+
my $lock_file = "$vnfs_cachedir/$vnfs_name/warewulf.cache.lock";
my $lock_fh;
-
+
if (! open($lock_fh, '>', $lock_file) || ! &lock($lock_fh)) {
+ $cache_in_progress = 1;
&eprint("Can't open VNFS cache. Locked by another request.\n");
- $q->print("Content-Type: application/octet-stream\r\n");
- $q->print("Status: 500\r\n");
- $q->print("\r\n");
} else {
&dprint("VNFS cache lock obtained.\n");
-
+
open($cache_fh, "> $vnfs_cachedir/$vnfs_name/image.$vnfs_checksum.$rand");
my $binstore = $db->binstore($obj->get("_id"));

@@ -147,12 +146,17 @@
$q->print("\r\n");
}

- } else {
+ } elsif(! $cache_in_progress) {
&dprint("Sending VNFS from the data store\n");
my $binstore = $db->binstore($obj->get("_id"));
while(my $buffer = $binstore->get_chunk()) {
$q->print($buffer);
}
+ } else {
+ &eprint("VNFS is being cached via a different request, try again.\n");
+ $q->print("Content-Type: application/octet-stream\r\n");
+ $q->print("Status: 503\r\n");
+ $q->print("\r\n");
}

} else {
@@ -187,4 +191,3 @@
}

# vim: filetype=perl:syntax=perl:expandtab:ts=4:sw=4:
-
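
The client side of this change is a retry loop: a 503 means "the VNFS is still being cached, ask again". The following is a minimal sketch of that pattern in shell, not the actual wwgetvnfs code. The function names, MAX_TRIES, and RETRY_DELAY are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: download URL ($1) to file ($2) and print only the
# HTTP status code on stdout.
fetch_status() {
    curl -s -o "$2" -w '%{http_code}' "$1"
}

# Retry while the server answers 503 (VNFS still being cached).
# MAX_TRIES and RETRY_DELAY are illustrative knobs, not Warewulf settings.
fetch_vnfs_with_retry() {
    url=$1
    out=$2
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-30}" ]; do
        status=$(fetch_status "$url" "$out")
        case "$status" in
            200) return 0 ;;
            503) sleep "${RETRY_DELAY:-2}" ;;
            *)   echo "unexpected HTTP status: $status" >&2
                 return 1 ;;
        esac
        tries=$((tries + 1))
    done
    echo "gave up after $tries attempts" >&2
    return 1
}
```

Usage would look something like: fetch_vnfs_with_retry "http://localhost/WW/vnfs?hwaddr=08:00:27:18:b3:e3" /tmp/vnfs.img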

Allen, Benjamin S.

Jun 19, 2016, 9:13:04 PM
to warewul...@lbl.gov
I was getting OOMs on both the wwmaster's httpd and mysqld with as few as 10 nodes grabbing the same VNFS from the DB. Both the wwmaster and mysqld servers have 16GB of RAM. Note this is a RHEL7 master. I didn't use to see this problem on RHEL6, but that's likely because the behavior described in the paragraph below makes cache misses more frequent on RHEL7. There's also a pretty nice speedup when booting a number of nodes if the first request caches the VNFS and the other requests wait, compared to all nodes pulling from the DB.

A side note for those looking at where RHEL7's httpd caches the VNFS: it appears some systemd magic is set up to hide /var/tmp proper from httpd, as the VNFS gets cached here: /var/tmp/systemd-private-89d1a1a1fbae4d76943bdc820c5eeeaa-httpd.service-ejWSOc/tmp/warewulf_cache/rhel7/image.90451abbaae873b465a01759cf271232. On every restart of httpd this systemd-private directory is removed, so on every restart you lose your cached copies. I'm sure there's a way to turn this off, but I haven't looked into it.
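
For reference, the systemd-private path above is what systemd's PrivateTmp= setting produces. Assuming the stock httpd.service unit sets PrivateTmp=true, a drop-in along the following lines is one way it might be turned off. This is an untested sketch: the function name and the drop-in file name privatetmp.conf are hypothetical, and disabling PrivateTmp gives httpd a shared /tmp and /var/tmp again, which is a security trade-off.

```shell
#!/bin/sh
# Write a systemd drop-in that disables PrivateTmp for a service.
# $1: drop-in directory. On a real host this would be
#     /etc/systemd/system/httpd.service.d (requires root).
write_privatetmp_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/privatetmp.conf" <<'EOF'
[Service]
PrivateTmp=false
EOF
}
```

After writing the drop-in to the real directory, systemctl daemon-reload followed by systemctl restart httpd would be needed for it to take effect.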

Any chance we can squeeze this one into 3.7?

Thanks,

Ben

Gregory M. Kurtzer

Jun 19, 2016, 10:30:13 PM
to warewul...@lbl.gov
Fine with me about 3.7, EXCEPT... release tarballs have already been generated. That said, there has been no release announcement yet. And this should be retested to see if it has any adverse effects.

So if we want to do this, I recommend removing the release tarballs and SVN tags ASAP and then retesting.




Allen, Benjamin S.

Jun 20, 2016, 12:43:37 AM
to warewul...@lbl.gov
We could also do a 3.7.1 release, but I'd expect folks will run into this issue with RHEL7 pretty quickly. A good way to test:

systemctl restart httpd

for i in {1..100}; do wget --quiet -O - "http://localhost/WW/vnfs?hwaddr=08:00:27:18:b3:e3" > /dev/null & done

or

for i in {1..100}; do sh -c 'curl -s -L --retry 10 "http://localhost/WW/vnfs?hwaddr=08:00:27:18:b3:e3" > /dev/null; echo "Exit: $?"' & done

Change the hwaddr to a configured node. Don't do this on a production machine or against a production database.
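
To see how many of the parallel requests got a 503 versus a 200, the same test can be wrapped so that it tallies HTTP status codes. This is a sketch only; the function names fire_requests and tally_statuses are hypothetical, and the same warning about production machines applies.

```shell
#!/bin/sh
# Fire N parallel requests at URL and print one HTTP status code per line.
# $1: URL, $2: number of parallel requests
fire_requests() {
    url=$1
    n=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' "$url" &
        i=$((i + 1))
    done
    wait
}

# Summarize status codes read from stdin as "code count" lines.
tally_statuses() {
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example: fire_requests "http://localhost/WW/vnfs?hwaddr=08:00:27:18:b3:e3" 100 | tally_statuses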

I've come up with a faster, even lower-memory, and cleaner vnfs.pl than r1994. Instead of serving the VNFS itself, it redirects to the static cached file served via Apache. With this change I've been able to run 200 parallel wget requests on a VM with 2GB of RAM, without an initially cached VNFS.

Will push this change and the change to the default httpd config shortly.
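
The redirect idea can be illustrated with a small CGI-style sketch. It is not the vnfs.pl change itself, which is not shown in this thread. The function name, the cache path, and the static URL are all hypothetical, and the sketch leaves out the step that actually populates the cache.

```shell
#!/bin/sh
# Print CGI response headers for a VNFS request.
# $1: path to the cached image on disk (hypothetical layout)
# $2: URL under which Apache would serve that file (hypothetical)
vnfs_response_headers() {
    cache_file=$1
    static_url=$2
    if [ -f "$cache_file" ]; then
        # Cached copy exists: redirect so Apache serves the static file.
        printf 'Status: 302\r\nLocation: %s\r\n\r\n' "$static_url"
    else
        # Not cached yet: tell the client to retry later, as in r1994.
        printf 'Status: 503\r\n\r\n'
    fi
}
```

The point of the design is that the CGI process only emits a few header bytes, while the image itself is streamed by Apache as a plain file.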

Ben